ksprojects / zkcopy

ZooKeeper copy utililty
Apache License 2.0
240 stars 96 forks source link

Transaction support #14

Closed tub closed 6 years ago

tub commented 6 years ago

This PR adds transaction support to ZK copy, allowing the user to specify the number of operations in the transaction with -b/--batchSize.

We've tested this with up to 10,000 operations per transaction with 100,000 nodes and seen around a 6x write speedup.

Another optimization for users wanting to perform a copy multiple times is the -m/--mtime flag, which allows zkcopy to ignore nodes older than the specified mtime, skipping any writes that may have been performed by a previous run of zkcopy. To aid this, zkcopy logs the maximum mtime of the copied nodes at runtime.

This PR also refactors the CLI arg parsing to use picocil, removing a lot of manual parsing and type-checking.

kshchepanovskyi commented 6 years ago

Thanks, looks awesome! I'd like to write few tests for current version before merging PR to make sure nothing is broken, using https://www.testcontainers.org. Probably will do it on a weekend.

tub commented 6 years ago

I've added a bunch of JUnit unit-tests around the code we've changed. Not yet looked into the testcontainers setup, but it looks interesting. The unit tests required a bit of refactoring in order to pass dependencies in.

We (myself and @ronin13) have also worked on cleaning up the zookeeper server object once zkcopy is finished, and raised the session timeout so that large transactions and reads don't fail.

tub commented 6 years ago

I've just added an initial testcontainers integration test. Really nice library, thanks for the headsup @kshchepanovskyi The new itest highlighted that the connections to the source ZK were still not being closed so I've scrapped the finalize() implementation and exposed a way of closing the ZooKeepers from the ThreadFactory.