stripe-archive / mosql

MongoDB → PostgreSQL streaming replication
MIT License
1.63k stars 225 forks source link

Oplog tailing is *significantly* slower than importing. #47

Open braedenp-msft opened 10 years ago

braedenp-msft commented 10 years ago

I have a dozen MongoDB collections that are updated periodically, causing ~80,000 documents to be removed and replaced in each of them.

When I have tailing enabled, this change takes forever (about 3.5 hours) to replicate into PostgreSQL, because tailing occurs at a rate of ~100 documents/sec on the machine I'm running it on.

For comparison, doing --reimport on exactly the same collections runs at ~3000 documents/sec on the same machine. This process finishes in less than 2 minutes.

I'm aware that processing oplog entries introduces some overhead, but it surely shouldn't reduce throughput by a factor of ten. Am I doing something wrong, or is this simply a limitation that exists in the current mosql implementation?

braedenp-msft commented 10 years ago

I finally had some time to look through the source, and the bottleneck is quite obvious:

When doing the initial import, COPY is used to pull the records in through STDIN, which is significantly faster than running multiple INSERTs, or even inserting multiple records in a single INSERT, especially when doing large imports.

When tailing, however, an upsert is executed for each record, incurring 1-2 separate queries for each one.

I put together a patch for my project that consolidates sequential inserts into a COPY operation, if possible (with fall-back to upserts in the case of constraint violations or other errors). For environments with mostly updates, this change doesn't affect performance. For environments where large inserts are common, however, this change makes a huge difference.

If anybody is interested, I can clean it up and put together some tests for merging.

ColmHally commented 10 years ago

Hey @BraedenP I'd be very interested in seeing your changes to insert multiple records in a single INSERT, even if it's rough!

kungfoox commented 9 years ago

Definitely interested. Any thoughts on merging this into mosql?

barretod commented 8 years ago

What happened with this? We really need this optimization, mosql can no longer keep up in our environtment.