mongodb-labs / python-bsonjs

A fast BSON to MongoDB Extended JSON converter for Python - This Repository is NOT a supported MongoDB product
Apache License 2.0

Converting Audit bson file to JSON #27

Open myuconnect opened 3 years ago

myuconnect commented 3 years ago

Hi Shane,

We have a requirement to process all the audit BSON files of our MongoDB database and store them in a centralized location for reporting. In our current process, we scan all the audit BSON files, convert them to JSON, and send the JSON files to be persisted at the centralized location via a REST API call. Following is a snippet of the code:

from bson.json_util import loads, dumps, DEFAULT_JSON_OPTIONS
from bson import decode_all

if not self.util.isFileExists(auditFile):
    return self.util.buildResponse(self.Globals.unsuccess, f"file {auditFile} is missing ")

myAuditFileSize = self.util.getFileSizeBytes(auditFile) / (1024*1024)

if myAuditFileSize > self.BSON_FILE_SIZE_LIMIT_MB:
    print(f"Audit bson file '{auditFile}' size is larger than {self.BSON_FILE_SIZE_LIMIT_MB}")
    return self.util.buildResponse(self.Globals.unsuccess, f"Audit bson file '{auditFile}' size is larger than {self.BSON_FILE_SIZE_LIMIT_MB}MB ")

# 3. processing - converting bson to json
try:
    if self.util.getFileExtn(auditFile).lower() == "json":
        myMongoAuditData = self.util.readJsonFile(auditFile)
    else:
        with open(auditFile, 'rb') as file:
            # decode_all() reads the whole file into memory and decodes every document at once
            myMongoAuditData = decode_all(file.read())

    return myMongoAuditData
# (except clause not included in the original snippet)

We are running into problems processing larger BSON files, so we currently restrict the size of the audit BSON files that get processed. I need your help using the "bsonjs" module to convert the audit BSON file into a JSON file (ideally producing a smaller JSON file).

Please assist.

Thanks,

Anil Kumar

ShaneHarvey commented 3 years ago

> We are facing issue on processing larger bson file thus restricting the size of audit bson file which will be processed. I need your help to use "bsonjs" module to process the audit bson file to generate the json file (will be better to generate smaller json file).

Can you describe the issue you're facing? What does the "audit bson file" look like? Does it contain many small documents, many large documents, or a single large document?

Have you tried using bson.decode_file_iter() from pymongo? This method decodes a bson stream file without needing to read the entire file at once.

import bson  # the bson package that ships with pymongo

with open(auditFile, 'rb') as file:
    for doc in bson.decode_file_iter(file):   # Iterate over all the documents in the file, one at a time
        print(doc)
myuconnect commented 3 years ago

Shane,

Thanks for your response. Our audit BSON file is huge, around 2 GB. I was wondering if there is a way to use bsonjs to iterate over the BSON documents while converting them to JSON, as we can with bson.decode_file_iter.

Thanks,

Anil

ShaneHarvey commented 3 years ago

There is no decode_file_iter equivalent in bsonjs yet. We could add one or you could implement it yourself with some reading of the BSON format (see http://bsonspec.org/spec.html). Check out the decode_file_iter source from pymongo: https://github.com/mongodb/mongo-python-driver/blob/3.11.3/bson/__init__.py#L1135-L1161

Or you could try using bson.decode_file_iter() from pymongo instead of using bsonjs.