intake / akimbo

For when your data won't fit in your dataframe
https://akimbo.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Support for pandas NAType #44

Status: Open. jpivarski opened this issue 6 months ago.

jpivarski commented 6 months ago

Copied from @xinyuejohn's issue on scikit-hep/awkward#2931:

Description of new feature

Hi, I was creating an Awkward Array from a pandas DataFrame and found that Awkward doesn't support pandas._libs.missing.NAType() (that is, pd.NA).

It would be great if this NAType could be supported.

To replicate:

import awkward as ak
import pandas as pd

# pandas._libs.missing.NAType() returns the same singleton as pd.NA
a = ["a", "b", pd.NA]
ak.Array(a)  # raises ValueError

Traceback:

ValueError: cannot convert <NA> (type NAType) to an array element

(https://github.com/scikit-hep/awkward/blob/awkward-cpp-26/awkward-cpp/src/python/content.cpp#L191)

This error occurred while calling

    ak.to_layout(
        ['a', 'b', <NA>]
        allow_record = False
        regulararray = False
        primitive_policy = 'error'
    )

See scikit-hep/awkward#2931 for more information, in particular

martindurant commented 6 months ago

> I was creating an awkward array using pandas dataframe

So we should intercept the data before conversion to ak.

Question: does Arrow support this NA type? I suspect not; they always use masks, so we're only really concerned with 1-D regular Series.