Open myuconnect opened 3 years ago
We are facing an issue processing larger BSON files, so we currently restrict the size of the audit BSON files that will be processed. I need your help using the "bsonjs" module to process the audit BSON file and generate the JSON file (it would be better to generate smaller JSON files).
Can you describe the issue you're facing? What does the "audit bson file" look like? Does it contain many small documents, many large documents, or a single large document?
Have you tried using bson.decode_file_iter() from pymongo? This method decodes a bson stream file without needing to read the entire file at once.
import bson

with open(auditFile, 'rb') as file:
    # Iterate over all the documents in the file, one at a time
    for doc in bson.decode_file_iter(file):
        print(doc)
Shane,
Thanks for your response. Our audit BSON file is huge, around 2 GB. I was wondering if there is a way to use bsonjs to iterate over the BSON documents while converting them to JSON, as we can with bson.decode_file_iter.
Thanks,
Anil
There is no decode_file_iter equivalent in bsonjs yet. We could add one or you could implement it yourself with some reading of the BSON format (see http://bsonspec.org/spec.html). Check out the decode_file_iter source from pymongo: https://github.com/mongodb/mongo-python-driver/blob/3.11.3/bson/__init__.py#L1135-L1161
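As a rough illustration of the "implement it yourself" route, here is a minimal sketch that relies only on the BSON framing rule from the spec: every document starts with a little-endian int32 holding its total size, including those 4 bytes. The function name `iter_raw_bson_documents` and the file name `audit.bson` are made up for this example and are not part of bsonjs or pymongo. It yields each document as raw bytes, which could then be passed to `bsonjs.dumps()`.

```python
import struct


def iter_raw_bson_documents(stream):
    """Yield each BSON document in `stream` as raw bytes, one at a time.

    `stream` is any binary file-like object. Only one document is held
    in memory at once, so the file size does not matter.
    """
    while True:
        size_bytes = stream.read(4)
        if not size_bytes:
            return  # clean end of file
        if len(size_bytes) != 4:
            raise ValueError("truncated BSON length prefix")
        # The length prefix is a little-endian int32 that counts itself.
        (size,) = struct.unpack("<i", size_bytes)
        if size < 5:  # smallest valid document: 4-byte size + 1 terminator
            raise ValueError(f"invalid BSON document size: {size}")
        rest = stream.read(size - 4)
        if len(rest) != size - 4:
            raise ValueError("truncated BSON document")
        yield size_bytes + rest


# Hypothetical usage with bsonjs (requires `pip install python-bsonjs`):
#
# import bsonjs
# with open("audit.bson", "rb") as f:
#     for raw in iter_raw_bson_documents(f):
#         json_text = bsonjs.dumps(raw)  # one JSON string per document
```

This sketch does not validate the document contents; it only splits the stream on the length prefixes and leaves decoding to whatever consumes the bytes.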
Or you could try using bson.decode_file_iter() from pymongo instead of using bsonjs.
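If the goal is smaller JSON files, one possible approach is to stream the documents and start a new output file every N documents. The sketch below is an assumption about how that might look, not an existing API: `write_json_chunks`, the `audit_NNNNN.jsonl` naming, and the default of 10000 documents per file are all invented for illustration. It uses only the standard library; for BSON input you would pass `bson.decode_file_iter(file)` as `docs` and `bson.json_util.dumps` as `to_json`.

```python
import json
import os
from itertools import islice


def write_json_chunks(docs, out_dir, docs_per_file=10000, to_json=json.dumps):
    """Write documents to numbered JSON Lines files and return their paths.

    `docs` can be any iterable (including a lazy iterator), so at most
    `docs_per_file` documents are held in memory at a time.
    """
    os.makedirs(out_dir, exist_ok=True)
    iterator = iter(docs)
    paths = []
    while True:
        batch = list(islice(iterator, docs_per_file))
        if not batch:
            return paths
        path = os.path.join(out_dir, f"audit_{len(paths):05d}.jsonl")
        with open(path, "w", encoding="utf-8") as out:
            for doc in batch:
                out.write(to_json(doc) + "\n")  # one JSON document per line
        paths.append(path)


# Hypothetical usage with pymongo's bson package:
#
# import bson
# from bson import json_util
# with open(auditFile, "rb") as file:
#     write_json_chunks(bson.decode_file_iter(file), "audit_json",
#                       to_json=json_util.dumps)
```

Each resulting file could then be sent in its own REST call, which keeps both memory use and request size bounded.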
Hi Shane,
We have a requirement to process all audit BSON files of a MongoDB database and store them at a centralized location for reporting. In our current process, we scan all the audit BSON files, convert them to JSON, and send the JSON to be persisted at the centralized location via a REST API call. The following is a snippet of the code...
from bson.json_util import loads, dumps, DEFAULT_JSON_OPTIONS
from bson import decode_all

if not self.util.isFileExists(auditFile):
    return self.util.buildResponse(self.Globals.unsuccess, f"file {auditFile} is missing ")

myAuditFileSize = self.util.getFileSizeBytes(auditFile) / (1024*1024)

if myAuditFileSize > self.BSON_FILE_SIZE_LIMIT_MB:
    print(f"Audit bson file '{auditFile}' size is larger than {self.BSON_FILE_SIZE_LIMIT_MB}")
    return self.util.buildResponse(self.Globals.unsuccess, f"Audit bson file '{auditFile}' size is larger than {self.BSON_FILE_SIZE_LIMIT_MB}MB ")

# 3. processing - converting bson to json
try:
    if self.util.getFileExtn(auditFile).lower() == "json":
        myMongoAuditData = self.util.readJsonFile(auditFile)
    else:
        with open(auditFile, 'rb') as file:
            # decode_all reads the whole file into memory at once
            myMongoAuditData = decode_all(file.read())

return myMongoAuditData
We are facing an issue processing larger BSON files, so we currently restrict the size of the audit BSON files that will be processed. I need your help using the "bsonjs" module to process the audit BSON file and generate the JSON file (it would be better to generate smaller JSON files).
Please assist.
Thanks,
Anil Kumar