mongodb-labs / python-bsonjs

A fast BSON to MongoDB Extended JSON converter for Python - This Repository is NOT a supported MongoDB product
Apache License 2.0

Help using aggregate_raw_results #56

Open martyzz1 opened 7 months ago

martyzz1 commented 7 months ago

I've been going around in circles a bit trying to understand if this library can be used to speed up decoding an aggregation query.

Or whether, after recent pymongo updates, it's needed at all.

from bson import decode_all

documents = []

# Each raw batch is a bytes object holding one or more BSON documents.
cursor = collection.aggregate_raw_batches(pipeline=aggregation_query)
for batch in cursor:
    documents.extend(decode_all(batch))

How would I use bsonjs.dumps instead?

ShaneHarvey commented 7 months ago

This library is only useful for converting raw BSON data (e.g., RawBSONDocument) to MongoDB Extended JSON. If you need the documents decoded into Python dicts, this library will not help.
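To make the raw BSON to Extended JSON path concrete, here is a minimal sketch. It relies on the BSON format's rule that every document starts with a 4-byte little-endian length, so a raw batch can be split into single documents without decoding them. The helper names iter_raw_documents and batch_to_json are hypothetical (they are not part of bsonjs or pymongo), and the sketch assumes bsonjs.dumps is handed one document's bytes at a time.

```python
import struct
from typing import Callable, Iterator, List


def iter_raw_documents(batch: bytes) -> Iterator[bytes]:
    """Yield the bytes of each BSON document found in one raw batch.

    Hypothetical helper: it only reads the int32 length prefix of each
    document and never decodes the contents.
    """
    offset = 0
    while offset < len(batch):
        if len(batch) - offset < 5:  # 5 bytes is the smallest valid document
            raise ValueError("truncated BSON document header")
        (size,) = struct.unpack_from("<i", batch, offset)
        if size < 5 or offset + size > len(batch):
            raise ValueError("invalid BSON document length")
        yield batch[offset:offset + size]
        offset += size


def batch_to_json(batch: bytes, dumps: Callable[[bytes], str]) -> List[str]:
    """Convert every document in a raw batch with the given dumps function."""
    return [dumps(raw) for raw in iter_raw_documents(batch)]


# Intended use (assumes pymongo and bsonjs are installed and that
# `collection` and `pipeline` already exist in your application):
#
#     import bsonjs
#     for batch in collection.aggregate_raw_batches(pipeline):
#         for json_text in batch_to_json(batch, bsonjs.dumps):
#             ...  # write json_text to a file, socket, HTTP response, etc.
```

This only pays off when the application wants JSON text as its output. If the end goal is Python dicts, the splitting step is wasted work.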

Also, aggregate_raw_batches is only useful when the app needs a stream of raw BSON data. If you're going to call decode_all on every batch anyway, it will be more efficient to use a regular aggregate:

documents = list(collection.aggregate(pipeline))

ShaneHarvey commented 7 months ago

For help speeding up your application I suggest posting here: https://www.mongodb.com/community/forums/tag/python

It would help to include more information: the size of the result set (how big is documents?), how long the query takes compared with the decoding, what happens to documents afterwards, and whether it would be faster to process the documents individually.
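One rough way to gather the "query vs decoding" numbers is to time the two phases separately on the client. The sketch below is only an illustration: time_fetch_and_decode is a hypothetical helper, and the "fetch" figure is whatever the client spends waiting for each batch, so it mixes server execution time with network transfer.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_fetch_and_decode(
    batches: Iterable[bytes],
    decode: Callable[[bytes], list],
) -> Tuple[List, float, float]:
    """Return (documents, seconds waiting for batches, seconds decoding)."""
    documents: List = []
    fetch_seconds = 0.0
    decode_seconds = 0.0
    iterator = iter(batches)
    while True:
        # Time spent waiting for the next batch to arrive.
        start = time.perf_counter()
        batch = next(iterator, None)
        fetch_seconds += time.perf_counter() - start
        if batch is None:
            break
        # Time spent turning raw bytes into Python objects.
        start = time.perf_counter()
        documents.extend(decode(batch))
        decode_seconds += time.perf_counter() - start
    return documents, fetch_seconds, decode_seconds


# Intended use (assumes pymongo is installed and that `collection` and
# `pipeline` already exist in your application):
#
#     from bson import decode_all
#     docs, fetch_s, decode_s = time_fetch_and_decode(
#         collection.aggregate_raw_batches(pipeline), decode_all)
#     print(len(docs), fetch_s, decode_s)
```

If most of the time goes to waiting for batches, changing the decoder will not help much and the pipeline or indexes are the better place to look.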