I am using msgspec for its high-performance JSON decoding capabilities and have encountered a challenge when integrating it with data retrieved from MongoDB using PyMongo.
As you may know, PyMongo natively returns query results as Python dictionaries, which isn't ideal for performance when converting to classes with msgspec. To maintain high performance, I want to avoid the overhead of PyMongo's dict conversion and directly use msgspec to decode my data into classes.
I've been experimenting with using RawBSONDocument to bypass the automatic dict conversion, but I'm unsure if this is the best approach. Here's an example of the current process I'm using:
from pymongo import MongoClient
from bson.raw_bson import RawBSONDocument
from bson import json_util
import msgspec


class DBResults(msgspec.Struct):
    # Define the expected structure here
    pass


# Ask PyMongo for raw BSON documents instead of decoded dicts
client = MongoClient(document_class=RawBSONDocument)
db = client["test"]
new_results = []

for doc in db.test.find({}):
    # Convert RawBSONDocument to JSON string
    json_str = json_util.dumps(doc)
    # Encode JSON string to bytes
    json_bytes = json_str.encode("utf-8")
    # Decode JSON bytes to DBResults class instance
    new_results.append(msgspec.json.decode(json_bytes, type=DBResults))
While the above works, it involves converting BSON to a JSON string and then encoding this to bytes, which feels like an unnecessary step and could be a performance bottleneck.
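There are other challenges too: after json_util.dumps, the _id field is no longer a plain value but a dict (the Extended JSON form, e.g. {"$oid": "..."}), and datetime fields are wrapped the same way (e.g. {"$date": ...}), so neither maps directly onto a simple struct field.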
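To make the _id and datetime problem concrete, here is a minimal sketch (standard library only) of the extra unwrapping step this seems to require. It assumes json_util emits Extended JSON wrappers of the form {"$oid": ...} and {"$date": ...}; the helper name unwrap_extended_json and the sample document are hypothetical, made up purely for illustration, and other wrapper types are not handled.

```python
import json
from datetime import datetime, timezone


def unwrap_extended_json(value):
    """Recursively replace {"$oid": ...} and {"$date": ...} wrappers with plain values."""
    if isinstance(value, list):
        return [unwrap_extended_json(item) for item in value]
    if isinstance(value, dict):
        keys = set(value)
        if keys == {"$oid"}:
            # Keep the ObjectId as its 24-character hex string
            return value["$oid"]
        if keys == {"$date"}:
            raw = value["$date"]
            if isinstance(raw, str):
                # Assumed relaxed form: ISO-8601 string ending in "Z"
                return datetime.fromisoformat(raw.replace("Z", "+00:00"))
            if isinstance(raw, dict):
                # Assumed canonical form: {"$numberLong": "<milliseconds>"}
                raw = int(raw["$numberLong"])
            # Assumed legacy form: integer milliseconds since the Unix epoch
            return datetime.fromtimestamp(raw / 1000, tz=timezone.utc)
        return {key: unwrap_extended_json(item) for key, item in value.items()}
    return value


# Hypothetical document, shaped like json_util output
sample = (
    '{"_id": {"$oid": "65a1f0c2e4b0a1b2c3d4e5f6"}, '
    '"created": {"$date": "2024-01-13T10:00:00Z"}, "n": 1}'
)
flat = unwrap_extended_json(json.loads(sample))
```

The flattened dict could then be handed on to msgspec, but that adds yet another pass over every document on top of the BSON to JSON to bytes round trip, which is exactly the overhead I am hoping to avoid.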
Could you recommend a more efficient way to handle this scenario with msgspec? Is there a direct path from BSON to a msgspec class instance that I might be missing?
Thank you for your time and the excellent work on msgspec.
Best regards,