Open K-to-the-D opened 3 months ago
Hi, when using pymongoarrow.api.aggregate_arrow_all() it seems to omit columns that would contain only null values.
Field "email" with None only

```python
data = [
    {"name": "Charlie", "email": None},
    {"name": "Eve", "email": None},
]
```
PyMongoArrow result: [{'_id': ObjectId('66a36acc11ce1209ca0bfcf8'), 'name': 'Charlie'}, {'_id': ObjectId('66a36acc11ce1209ca0bfcf9'), 'name': 'Eve'}]
PyMongo result: [{'_id': ObjectId('66a36acc11ce1209ca0bfcf8'), 'name': 'Charlie', 'email': None}, {'_id': ObjectId('66a36acc11ce1209ca0bfcf9'), 'name': 'Eve', 'email': None}]
The PyMongoArrow result contains the field 'name' but is missing the field 'email'.
Field "email" with None and empty string

```python
data = [
    {"name": "Charlie", "email": None},
    {"name": "Eve", "email": ""},
]
```
PyMongoArrow result: [{'_id': ObjectId('66a3689f75fbe1b2bef04931'), 'name': 'Charlie', 'email': None}, {'_id': ObjectId('66a3689f75fbe1b2bef04932'), 'name': 'Eve', 'email': ''}]
PyMongo result: [{'_id': ObjectId('66a3689f75fbe1b2bef04931'), 'name': 'Charlie', 'email': None}, {'_id': ObjectId('66a3689f75fbe1b2bef04932'), 'name': 'Eve', 'email': ''}]
The PyMongoArrow result contains both the 'name' and 'email' fields.
Code used for this example:

```python
from pymongo import MongoClient
from pymongoarrow.api import aggregate_arrow_all

data = [
    {"name": "Charlie", "email": None},
    {"name": "Eve", "email": None},
]

# Insert data
client = MongoClient("mongodb://localhost:27017/")
db = client["my_dummy_database"]
collection = db["my_dummy_collection"]
collection.insert_many(data)

# Retrieve results
pipeline = [{"$match": {"email": {"$exists": True}}}]
result_arrow = aggregate_arrow_all(collection, pipeline)
result_regular = collection.aggregate(pipeline)

print("PyMongoArrow result:\n", result_arrow.to_pylist())
print("PyMongo result:\n", list(result_regular))
```
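To check for the discrepancy programmatically rather than by eye, something like the following could be used. It is only a sketch: `missing_fields` is a hypothetical helper, not part of PyMongo or PyMongoArrow, and the rows are copied from the "None only" output above with `_id` left out so the snippet runs without a database.

```python
def missing_fields(reference_rows, candidate_rows):
    """Return field names found in reference_rows but in no row of candidate_rows."""
    reference_keys = set().union(*(row.keys() for row in reference_rows))
    candidate_keys = set().union(*(row.keys() for row in candidate_rows))
    return sorted(reference_keys - candidate_keys)


# Rows copied from the "None only" case above (_id omitted for brevity)
pymongo_rows = [
    {"name": "Charlie", "email": None},
    {"name": "Eve", "email": None},
]
arrow_rows = [
    {"name": "Charlie"},
    {"name": "Eve"},
]

print(missing_fields(pymongo_rows, arrow_rows))
```

In the script itself, the same helper would presumably take `list(result_regular)` and `result_arrow.to_pylist()`. The cursor returned by `collection.aggregate()` can only be consumed once, so the list would need to be stored before printing it.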
Thanks for reporting this bug, @K-to-the-D! This has to do with the auto schema, and is hopefully straightforward to fix given Arrow's null type.
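For readers following along, here is a rough pure-Python model of how an inferred schema could end up dropping an all-null field, and how a null type would keep it. This is an assumption about the mechanism, not PyMongoArrow's actual implementation: `infer_schema`, `to_rows`, `keep_all_null` and `NULL_TYPE` are hypothetical names, and the string `"null"` only stands in for Arrow's null type.

```python
# Simplified model of schema inference; NOT PyMongoArrow's real code.
NULL_TYPE = "null"  # stand-in for Arrow's null type


def infer_schema(docs, keep_all_null=False):
    """Map each field name to the type name of its first non-None value."""
    schema = {}
    seen = []  # field names in first-seen order
    for doc in docs:
        for key, value in doc.items():
            if key not in seen:
                seen.append(key)
            if value is not None and key not in schema:
                schema[key] = type(value).__name__
    if keep_all_null:
        # Fields that never had a concrete value get an explicit null type
        for key in seen:
            schema.setdefault(key, NULL_TYPE)
    return schema


def to_rows(docs, schema):
    """Project documents onto the schema; absent values become None."""
    return [{key: doc.get(key) for key in schema} for doc in docs]


docs = [
    {"name": "Charlie", "email": None},
    {"name": "Eve", "email": None},
]

dropped = to_rows(docs, infer_schema(docs))
kept = to_rows(docs, infer_schema(docs, keep_all_null=True))
print(dropped)  # "email" is absent, as in the report
print(kept)     # "email" is present with None values
```

In this model a field only enters the schema once a non-None value is seen, which would also explain why the second case in the report (None plus an empty string) keeps the `email` column.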