This pull request adds support for batching sequential INSERTs when doing tailing, speeding up tailing under certain conditions while not being slower than the current state. See also issue #47.
r? @nelhage
cc @snoble
The basic strategy is to batch consecutive inserts together per namespace. The batch gets flushed whenever:

- An update or delete is done to the same namespace as the batched inserts.
- After streaming (up to) 1000 updates from the oplog, more than 5 seconds have passed since the last flush.
- More than a threshold of inserts have been batched in this namespace.
- The program is exiting or streaming stops.
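As a rough illustration of those flush rules, here is a minimal sketch of a per-namespace insert batcher. The class and method names (`InsertBatcher`, `before_update_or_delete`, etc.) are hypothetical, not mosql's actual API, and the thresholds are just the numbers from the description above:

```ruby
# Hypothetical sketch of the per-namespace insert batching described above.
class InsertBatcher
  FLUSH_INTERVAL = 5      # seconds since last flush (assumed threshold)
  MAX_BATCH_SIZE = 1000   # max buffered inserts per namespace (assumed)

  def initialize(&flush_proc)
    @batches = Hash.new { |h, ns| h[ns] = [] }
    @last_flush = Time.now
    @flush_proc = flush_proc  # e.g. issues one multi-row INSERT per batch
  end

  # Buffer an insert; flush if the namespace's batch grows too large.
  def insert(ns, row)
    @batches[ns] << row
    flush(ns) if @batches[ns].size >= MAX_BATCH_SIZE
  end

  # Updates/deletes to a namespace must flush its pending inserts first,
  # so operations are applied in oplog order.
  def before_update_or_delete(ns)
    flush(ns)
  end

  # Called periodically from the tailing loop.
  def maybe_flush_all
    flush_all if Time.now - @last_flush > FLUSH_INTERVAL
  end

  # Also called on exit / when streaming stops.
  def flush_all
    @batches.keys.each { |ns| flush(ns) }
    @last_flush = Time.now
  end

  private

  def flush(ns)
    rows = @batches.delete(ns)
    @flush_proc.call(ns, rows) if rows && !rows.empty?
  end
end
```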
Some handwavy measurements for tailing 20000 oplog entries:

- Alternating inserts and updates: roughly the same speed as current master (~350s on my local machine).
- 10 inserts per update: ~4.6x faster (76s on my local machine)
- 20 inserts per update: ~7.4x faster (47s)
- 50 inserts per update: ~11x faster (32s)
- 1000 inserts per update: ~31x faster (~11.1s, though this is probably running into measurement overhead)
Notes on potential future work (that I may or may not be working on soonish):
The next "low hanging" performance fruit after this would be optimizing updates, though that wouldn't have as large an effect.
Some ideas on how this could be done: `$set` entries in the oplog can be translated directly into Postgres queries that update only the mentioned columns. Updates without `$set` can replace the current row in Postgres with the data from the oplog entry. The tricky part is figuring out if/how this applies to TokuMX even after mongoriver does oplog entry translation (whether they support any other `$` operations), and handling `$unset`.
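A sketch of the `$set` idea, to make it concrete. The function name and the assumption that the primary key column is `_id` are both hypothetical; real column mapping and quoting in mosql are more involved:

```ruby
# Hypothetical: translate the $set portion of an oplog update entry into
# a parameterized UPDATE touching only the mentioned columns.
# Assumes a one-to-one field -> column mapping and an `_id` key column.
def set_to_update_sql(table, id, set_fields)
  assignments = set_fields.keys.map { |col| %("#{col}" = ?) }.join(", ")
  sql = %(UPDATE "#{table}" SET #{assignments} WHERE _id = ?)
  [sql, set_fields.values + [id]]
end
```

So an oplog entry like `{ "$set" => { "name" => "bob", "age" => 4 } }` for document `"abc"` becomes one UPDATE over exactly those two columns, instead of rewriting the whole row.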
Another performance improvement would be to run multiple tailers in separate threads or processes, partitioned by namespace. This would however require keeping multiple tailing states in the database (one per namespace), and I'm not quite sure what the performance implications are for Mongo of querying the same oplog (with filters?) from multiple processes.
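One shape this could take, avoiding multiple filtered oplog cursors entirely: a single oplog reader fanning ops out to one worker thread per namespace. Everything here is a hypothetical sketch of that idea, not anything mosql currently does:

```ruby
# Hypothetical: one worker thread per namespace, fed from a single oplog
# reader via per-namespace queues, so only one process tails the oplog.
# Each worker would also persist its own per-namespace tailing state.
def start_namespace_workers(namespaces, &apply)
  queues = {}
  threads = namespaces.map do |ns|
    q = queues[ns] = Queue.new
    Thread.new do
      # A nil op is the shutdown signal.
      while (op = q.pop)
        apply.call(ns, op)
      end
    end
  end
  [queues, threads]
end
```

The single-reader fan-out sidesteps the "same oplog queried from multiple processes" question, at the cost of the reader thread becoming the bottleneck.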