mongodb-labs / mongo-arrow

MongoDB integrations for Apache Arrow. Export MongoDB documents to NumPy arrays, Parquet files, and pandas DataFrames in one line of code.
https://mongo-arrow.readthedocs.io
Apache License 2.0

List in schema raises #222

Closed: lazargugleta closed this issue 4 months ago

lazargugleta commented 5 months ago

Hey,

When a struct type is placed inside a list in the schema:

import bson
import pyarrow
import pymongoarrow.api
schema = pymongoarrow.api.Schema({'_id': bson.ObjectId, 'list': [pyarrow.struct([('a', pyarrow.int32()), ('b', pyarrow.string())])]})

it raises

>       raise ValueError(msg)
E       ValueError: Unsupported type identifier <class 'pyarrow.lib.StructType'> for field 0

Since the function _normalize_typeid already has a branch for lists, this schema looks like it is meant to be supported, so the error appears to be a bug.

Expected behavior:

<Schema {'_id': ObjectIdType(FixedSizeBinaryType(fixed_size_binary[12])), 'list': ListType(list<item: struct<a: int32, b: string>>)}>

See #223 for a fix.

Jibola commented 5 months ago

Great find! I've attached the details of this issue to this ongoing JIRA Ticket and have also linked your PR: #223