mongodb-labs / python-bsonjs

A fast BSON to MongoDB Extended JSON converter for Python - This Repository is NOT a supported MongoDB product
Apache License 2.0

New PyPi package release request #29

Closed zyzil closed 2 years ago

zyzil commented 3 years ago

Any chance of getting a new release with the updated libbson that fixes known issues? I've found this library to be necessary for large ELT processes using pymongo and would love to see it continue.

ShaneHarvey commented 3 years ago

Hi @zyzil, thanks for opening this discussion. Are there any issues in particular that are causing problems for you?

My initial plan for the next release in order of importance is to:

By the by, could you briefly describe how you're using python-bsonjs in your ELT process?

zyzil commented 3 years ago

@ShaneHarvey - Sorry for the delay.

As far as issues causing problems, I was specifically looking for a release with an updated libbson which I am hoping will resolve issue #13. I need to ensure full documents from MongoDB are serialized in a compatible way to load them back later.

I would also be interested in the possible switch to MongoDB Extended JSON 2.0. Any improvement that ensures serialization/deserialization compatibility is a win for me.

As far as how I'm using python-bsonjs: I have built an internal Singer tap using pymongo to handle some very specific replication use cases for our data architecture. After the initial implementation, I found that the tap was spending the vast majority of its time serializing documents to JSON (more time than actually fetching data). When I profiled the application, I found that the issue was pymongo's approach of preserving SON key order by deep-copying. I was able to reduce CPU utilization in our Kubernetes cluster by an order of magnitude by switching to python-bsonjs.

Thanks!
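For readers who want to try the approach described in the comment above, here is a minimal sketch of the idea: keep documents as undecoded BSON bytes and let a converter turn each one straight into a JSON line, so no Python dicts or SON objects are built along the way. The function names `write_json_lines` and `export_collection`, and the newline-delimited output format, are assumptions for illustration and not the commenter's actual tap code. `export_collection` assumes pymongo and python-bsonjs are installed and a server is reachable.

```python
from typing import Callable, Iterable, TextIO


def write_json_lines(raw_docs: Iterable[bytes],
                     to_json: Callable[[bytes], str],
                     out: TextIO) -> int:
    """Write one JSON document per line and return how many were written.

    raw_docs yields undecoded BSON byte strings; to_json converts one of
    them to a JSON string (bsonjs.dumps in real use).
    """
    count = 0
    for raw in raw_docs:
        out.write(to_json(raw))
        out.write("\n")
        count += 1
    return count


def export_collection(uri: str, db_name: str, coll_name: str, out: TextIO) -> int:
    """Hypothetical wiring against a live server; all names are illustrative."""
    # Imported lazily so write_json_lines works without these packages.
    import bsonjs
    from bson.raw_bson import RawBSONDocument
    from pymongo import MongoClient

    # document_class=RawBSONDocument makes pymongo return undecoded BSON,
    # exposed as bytes through the .raw attribute of each document.
    client = MongoClient(uri, document_class=RawBSONDocument)
    try:
        cursor = client[db_name][coll_name].find()
        return write_json_lines((doc.raw for doc in cursor), bsonjs.dumps, out)
    finally:
        client.close()
```

Passing the converter in as a parameter keeps the loop testable without a database; in a real run, `export_collection("mongodb://localhost:27017", "mydb", "mycoll", sys.stdout)` would be one way to call it, with the URI and names replaced by your own.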

ShaneHarvey commented 2 years ago

@juliusgeo just released a new version yesterday: https://pypi.org/project/python-bsonjs/0.3.0/

You can see what's changed here: https://github.com/mongodb-labs/python-bsonjs/blob/0.3.0/CHANGELOG.rst#030

Note that while we did update libbson and add support for Extended JSON 2.0, we were unable to resolve the JSON list parsing issue in #13. We did, however, document one workaround, in which you wrap the top-level list in a dictionary: https://github.com/mongodb-labs/python-bsonjs/blob/0.3.0/README.rst#top-level-arrays
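The wrapping workaround mentioned above could look roughly like the sketch below. It uses only the standard library and works on the JSON text itself, so the array is not re-parsed. The helper name `wrap_top_level_array` and the default key `"data"` are assumptions for illustration; the linked README section may phrase the workaround differently.

```python
import json


def wrap_top_level_array(json_text: str, key: str = "data") -> str:
    """Return json_text wrapped as {key: [...]} if it is a top-level array.

    Text that does not start with '[' (ignoring leading whitespace) is
    returned unchanged. The key name is a hypothetical choice.
    """
    if json_text.lstrip().startswith("["):
        # json.dumps(key) quotes and escapes the key as a valid JSON string.
        return "{%s: %s}" % (json.dumps(key), json_text)
    return json_text
```

The wrapped string is a JSON object, so it can be handed to `bsonjs.loads`; whoever reads the resulting document back then needs to look under the same key to recover the list.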