mongodb-labs / mongo-arrow

MongoDB integrations for Apache Arrow. Export MongoDB documents to numpy array, parquet files, and pandas dataframes in one line of code.
https://mongo-arrow.readthedocs.io
Apache License 2.0

Setting partial Schema to find_arrow_all and find_pandas_all #243

Open frbelotto opened 1 month ago

frbelotto commented 1 month ago

Hello guys, I would like to discuss setting the Schema for find_arrow_all or find_pandas_all. I have a database with several columns, two of which are ObjectIds that are crashing my code (I've reported that here), so I am trying to import all of my table columns while setting only those two columns to be imported as strings:

schema = Schema({'_id': pa.string(), 'referenciaConversao': pa.string()})
pd_confirmacao_conversao = find_pandas_all(pd_confirmacao_conversao, {'estadoContabilizacaoEvento': {'$lt': 100}}, schema=schema)

My issue here is that, as I only set the schema for those two columns, only those columns are being imported from the dataset! Is there any way to improve it?
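One possible workaround, sketched under assumptions rather than confirmed by the maintainers: skip the partial schema and convert the two ObjectId fields to strings on the server with an aggregation pipeline, so every other field still comes through. The helper names `build_string_id_pipeline` and `load_all_columns` are hypothetical. The sketch assumes that `aggregate_pandas_all` from `pymongoarrow.api` infers a schema when none is passed, and that the server supports `$toString` (MongoDB 4.0 or later). A document that lacks one of the listed fields would get that field added with a null value.

```python
from typing import Any


def build_string_id_pipeline(
    query: dict[str, Any], string_fields: list[str]
) -> list[dict[str, Any]]:
    """Filter with `query`, then convert the named fields to strings
    server-side. All other fields pass through unchanged."""
    conversions = {name: {"$toString": f"${name}"} for name in string_fields}
    return [
        {"$match": query},
        {"$addFields": conversions},
    ]


def load_all_columns(collection, query, string_fields):
    """Hypothetical helper: run the pipeline through PyMongoArrow.

    Assumption: with no schema argument, the column types are inferred
    from the returned documents.
    """
    # Imported lazily so the pipeline builder works without PyMongoArrow.
    from pymongoarrow.api import aggregate_pandas_all

    pipeline = build_string_id_pipeline(query, string_fields)
    return aggregate_pandas_all(collection, pipeline)


# Pipeline for the query in this issue; no database connection is needed here.
pipeline = build_string_id_pipeline(
    {"estadoContabilizacaoEvento": {"$lt": 100}},
    ["_id", "referenciaConversao"],
)
```

Calling `load_all_columns(collection, query, ["_id", "referenciaConversao"])` on a real collection would then return a DataFrame with all fields, provided the remaining field types can be inferred.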

aclark4life commented 1 month ago

Hi @frbelotto, is this related to #242 or a separate question?

frbelotto commented 1 month ago

Hi @aclark4life, there I am reporting an issue/bug that I am facing. Here I am proposing an improvement/discussion about partial schema definition (which, in my case, is a workaround I've found for my issue!)

aclark4life commented 1 month ago

@frbelotto Ah! OK thanks, we'll track both in INTPYTHON-256