intake / akimbo

For when your data won't fit in your dataframe
https://akimbo.readthedocs.io
BSD 3-Clause "New" or "Revised" License

migrate to awkward-dataframe #51

Status: open. martindurant opened this issue 3 months ago

martindurant commented 3 months ago

This captures my conversation with @jpivarski.

Since we now have an integration with polars (#41), an upcoming integration with cuDF (#50), and possibly even daft (for distribution on Ray, where at least map-partition and map-row operations would be simple), the scope of this library has changed, and it should be renamed.

We might need some logic at import time to decide which accessors to register (attach), depending on which dataframe libraries are installed, or else require explicit imports of the relevant submodules.
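One way to do the import-time detection is to check which dataframe libraries are importable and only then run the matching registration step. This is a minimal sketch, assuming a hypothetical mapping from a top-level backend module name to a registration callable; `register_available` and the registrars are illustrative names, not akimbo's actual layout.

```python
import importlib
import importlib.util
from typing import Callable, Dict, List


def register_available(registrars: Dict[str, Callable[[object], None]]) -> List[str]:
    """Call each registrar whose backend module is installed (hypothetical helper).

    `registrars` maps a top-level module name (e.g. "pandas", "polars") to a
    function that attaches the accessor to that library. Returns the names of
    the backends that were actually registered.
    """
    registered = []
    for module_name, register in registrars.items():
        # find_spec returns None for a missing top-level package, so optional
        # dependencies that are not installed are skipped without importing them.
        if importlib.util.find_spec(module_name) is None:
            continue
        module = importlib.import_module(module_name)
        register(module)
        registered.append(module_name)
    return registered


# Usage with stand-in backends: "json" always exists, the second name does not.
seen = []
print(register_available({"json": seen.append, "surely_not_installed_backend": seen.append}))
# prints ['json']
```

The explicit-import alternative avoids this probing entirely: each submodule registers its accessor as a side effect of being imported, at the cost of one extra import line for the user.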

This also suggests that the effort put into the "awkward" dtype and extension array for pandas was unnecessary, and that the only thing that matters is the accessor. After all, any pandas column can be converted to Arrow, and that conversion is implemented for all built-in types (including actual Arrow columns) and probably for any extension types we might care about. This change would also make the code for each integration very similar.

All this would amount to awkward providing the same nested/variable-length API across several dataframe types, along with fast numba-vectorised functionality. It also opens the possibility of attaching interesting "behaviours" (like IP addresses, discussed before, or vectors, or images) to these dataframe libraries.

Finally, it would be nice to attach the accessor to dataframes rather than just series, corresponding to the current pandas to/from columns logic, since a table is exactly the same thing as a record array.

martindurant commented 3 months ago

I should also add that when we first considered making this library broader, the name "aktuate" was mentioned as being more attention-grabbing. Critically, it starts with the characters "ak" used for all the accessors. There are some companies with this name, but no software that I could find. Other possible names were also suggested at the time.