nebtex / vault-migrator

migrate vault data between different physical backends
Apache License 2.0

Refactor using parallel workers for get/put operations #5

Open EXPEddrewery opened 6 years ago

criloz commented 6 years ago

hi @EXPEddrewery, thanks for taking the time to make this pull request. Unfortunately I don't believe I can approve it, because there is not much to gain from adding Go goroutines to the script; the task is bounded by network I/O. But if you have a strong argument in favour of adopting goroutines, I can reconsider. thanks!!

EXPEddrewery commented 6 years ago

Thanks for replying. I closed this because it wasn't quite ready, and I haven't yet proven it works 100%.

Our use case is a very large Vault backend (900K objects) that we are moving from S3 to DynamoDB.

It's true that network I/O is one restriction, as is the provisioned throughput on DynamoDB.

In my initial tests with the original code, it was going to take hours to transfer that many objects.

I did a test with 10 workers and it came down to about an hour, and then another test with 64 workers, which took about 15 minutes.

I had to ramp the DynamoDB throughput up to over 5,000 write capacity units to do this, and even then there was throttling, but it certainly shortened the changeover time for our use case. I also used a C5 instance for maximum network bandwidth.
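The thread doesn't say how the throttled writes were handled, but for illustration, here is a minimal retry-with-exponential-backoff sketch; the `ErrThrottled` sentinel and `putWithBackoff` helper are hypothetical, not part of the PR:

```go
package migrator

import (
	"errors"
	"time"
)

// ErrThrottled is a placeholder for whatever error the destination
// backend surfaces when DynamoDB rejects a write for lack of capacity.
var ErrThrottled = errors.New("throttled")

// putWithBackoff retries a throttled put, doubling the wait between
// attempts, and gives up after maxRetries throttled attempts.
func putWithBackoff(put func() error, maxRetries int) error {
	delay := 50 * time.Millisecond
	for attempt := 0; attempt < maxRetries; attempt++ {
		if err := put(); err != ErrThrottled {
			return err // success (nil) or a non-throttling failure
		}
		time.Sleep(delay)
		delay *= 2
	}
	return ErrThrottled
}
```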

Once I finish testing and I'm happy with it, I'll push the final code and you can decide whether it's worth including.

I've made the worker count default to 1, which should give the same level of performance as the current code.
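For context, a minimal sketch of the worker-pool pattern being discussed, with a hypothetical `Backend` interface standing in for vault-migrator's real backend abstraction; this is illustrative, not the PR's actual code:

```go
package migrator

import (
	"log"
	"sync"
)

// Backend is a hypothetical stand-in for vault-migrator's physical
// backend abstraction; the real interface may differ.
type Backend interface {
	Get(key string) ([]byte, error)
	Put(key string, value []byte) error
}

// migrate fans the key list out to a fixed pool of goroutines, each
// copying entries from src to dst. workers = 1 reproduces the original
// sequential behaviour.
func migrate(src, dst Backend, keys []string, workers int) {
	jobs := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range jobs {
				value, err := src.Get(key)
				if err != nil {
					log.Printf("get %s: %v", key, err)
					continue
				}
				if err := dst.Put(key, value); err != nil {
					log.Printf("put %s: %v", key, err)
				}
			}
		}()
	}

	for _, key := range keys {
		jobs <- key // blocks until a worker is free
	}
	close(jobs)
	wg.Wait()
}
```

Sharing one unbuffered channel keeps memory usage flat regardless of the number of keys; throughput is tuned purely by the worker count.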

criloz commented 6 years ago

@EXPEddrewery those numbers look very nice, I think it is 100% worth it. I'd be ready to test it myself.

EXPEddrewery commented 6 years ago

Hi again. I've tested it a few times now with my S3 to DynamoDB migration and the numbers seem reasonably good.

I don't have any logs to provide, but essentially I've left your original logging unchanged unless you set the new debug configuration parameter to true.
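A hypothetical sketch of how the two new parameters might be wired; the struct and field names here are illustrative, and the real PR's configuration may look different:

```go
package migrator

// Config sketches the two new knobs described above.
type Config struct {
	Workers int  // parallel copy goroutines; 1 keeps the original behaviour
	Debug   bool // when true, log every key as it is migrated
}

// applyDefaults keeps the tool backwards-compatible when the new
// fields are omitted from an existing configuration.
func (c *Config) applyDefaults() {
	if c.Workers < 1 {
		c.Workers = 1
	}
}
```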

Below is a screenshot of the metrics from the migration test I did today with 10K write capacity on DynamoDB and 64 workers, against our Vault cluster that has approximately 950K objects in S3.

[screenshot: 10k_test — DynamoDB consumed write capacity during the migration]

It took just over 15 minutes to run on a c5.large and, as you can see from the graphs, it was throttled pretty heavily and never reached the 10K maximum. We'll be going with a 5K max and 64 workers for our migration once we plan some downtime.

Let me know if you would like me to add some documentation for the configuration changes.