mongodb-labs / mongo-arrow

MongoDB integrations for Apache Arrow. Export MongoDB documents to NumPy arrays, Parquet files, and pandas DataFrames in one line of code.
https://mongo-arrow.readthedocs.io
Apache License 2.0
92 stars, 14 forks

ObjectId in nested field raises when using aggregate_polars_all #219

Closed: sibbiii closed this issue 1 month ago

sibbiii commented 5 months ago

Hi,

I really appreciate the support for polars, but

```python
import bson
import pymongoarrow.api

collection.insert_one({'obj': {'data_to_test': bson.ObjectId()}})  # collection: a pymongo Collection
pymongoarrow.api.aggregate_polars_all(collection, [],
                                      schema=pymongoarrow.api.Schema({'obj': {'data_to_test': bson.ObjectId}}))
```

raises `polars.exceptions.ComputeError: cannot create series from Extension("pymongoarrow.objectid", FixedSizeBinary(12), Some(""))`, because the cast of the FixedSizeBinary extension type is not applied to nested fields.

P.S.: For non-nested fields it works fine, and loading the data as an Arrow table also works. So this is not a showstopper, but it prevents using the `aggregate_polars_all` convenience function.

See #220 for a fix

ShaneHarvey commented 2 months ago

Should this be closed now that #220 is merged?

lazargugleta commented 1 month ago

@ShaneHarvey yes