jaraco / jaraco.mongodb

MongoDB Utilities including an oplog replay tool
MIT License

Add support for differential document updates #24

Open jaraco opened 5 years ago

jaraco commented 5 years ago

This enhancement is designed to facilitate differential updates of documents from MongoDB. With this feature, documents loaded from MongoDB are instrumented to detect changes, and a function applied to the instrumented object produces an update object that enacts only those changes to the document. Example:

doc = db.coll.find_one(..., document_class=Differencing)
doc['num'] += 1
doc['new'] = 'new value'
del doc['removed']
instruction = gen_update(doc)
db.coll.update({'_id': doc['_id']}, instruction)

This technique could be used in situations where the document size is very large compared to the updates applied to it. Instead of replacing the entire document, only the differential updates are applied, limiting the pressure on the oplog for such operations.
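To make the idea concrete, here is a minimal sketch of what such an instrumented dict and `gen_update` might look like. It is not the implementation from the branch: the tracking attributes `_changed` and `_deleted` are hypothetical, only top-level keys are tracked, and the document is built directly from a plain dict rather than loaded through pymongo.

```python
class Differencing(dict):
    """Hypothetical dict that records top-level changes made after construction."""

    def __init__(self, *args, **kwargs):
        # dict.__init__ does not route through the overridden __setitem__,
        # so the initial contents are not recorded as changes.
        super().__init__(*args, **kwargs)
        self._changed = set()
        self._deleted = set()

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._changed.add(key)
        self._deleted.discard(key)

    def __delitem__(self, key):
        super().__delitem__(key)
        self._deleted.add(key)
        self._changed.discard(key)


def gen_update(doc):
    """Build a MongoDB update document from the recorded changes."""
    update = {}
    if doc._changed:
        update['$set'] = {key: doc[key] for key in doc._changed}
    if doc._deleted:
        # $unset ignores the value; an empty string is the conventional filler.
        update['$unset'] = {key: '' for key in doc._deleted}
    return update


# Stand-in for a document loaded from the database.
doc = Differencing({'_id': 1, 'num': 5, 'removed': 'old'})
doc['num'] += 1
doc['new'] = 'new value'
del doc['removed']
instruction = gen_update(doc)
```

In this sketch `instruction` contains a `$set` for `num` and `new` and a `$unset` for `removed`. Methods such as `update()`, `pop()`, and `setdefault()` bypass the overridden hooks, and changes inside nested documents are not detected, so a real implementation would need to cover those cases.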

jaraco commented 5 years ago

I've started work in the feature/differencing-dict branch, but I've stumbled on a problem. It's not possible to pass the object as the document_class because pymongo constructs the documents with an init/append model, i.e.:

    doc = dict()
    for key, value in parsed:
        doc[key] = value

As a result, if the dict type is a custom type, it always looks like it was constructed empty and had keys added. There's no opportunity after the initial load from bson to indicate that the document is now complete.
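The problem can be reproduced without pymongo. The sketch below imitates the init/append construction described above with a hypothetical `Tracking` class; `parsed` is stand-in data, and `mark_loaded` is an assumed helper that the caller would have to invoke by hand, since the decoder gives no signal that loading has finished.

```python
class Tracking(dict):
    """Hypothetical dict that records every key assigned through __setitem__."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.changed = set()

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.changed.add(key)

    def mark_loaded(self):
        # Assumed helper: forget everything recorded during the initial load.
        self.changed.clear()


# Stand-in for the key/value pairs decoded from BSON.
parsed = [('_id', 1), ('num', 5)]

# Imitate the init/append construction: start empty, then assign each key.
doc = Tracking()
for key, value in parsed:
    doc[key] = value

# Every loaded key looks like a modification at this point.
changed_after_load = set(doc.changed)

# Only an explicit call from outside can mark the document as complete.
doc.mark_loaded()
doc['num'] += 1
changed_after_edit = set(doc.changed)
```

Here `changed_after_load` contains both `_id` and `num` even though nothing was edited, while `changed_after_edit` contains only `num`. Requiring callers to reset the state manually after every load is easy to forget, which is why the lack of a completion hook is a real obstacle.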