intake / akimbo

For when your data won't fit in your dataframe
https://akimbo.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add basic hooks for dask dataframe #30

Closed. martindurant closed this 7 months ago

martindurant commented 1 year ago

This makes some things work with an awkward series in a dask-dataframe.

The hard-coded guess at a two-element array works for simple operations. I played with attaching a real form to the dtype object, but it rapidly gets complicated and drifts towards what dask-awkward already does. That raises the question of whether we should plumb in dask-awkward, at least when using the accessor, so that we know the true dtype of any output via the typetracer. This applies even more to IO: we should use dask-awkward for reading from parquet/JSON rather than have dask.dataframe make objects that we then turn back into awkward structures.

One rough edge I already found: if the operation being performed returns a different number of elements than the two supplied by the fake series, you get an error. How much do we need to push users to provide their own meta=? Can we build it for them while we still have more information, before passing to make_array_nonempty?