influxdata / influx-cli

CLI for managing resources in InfluxDB v2
MIT License

Speed up backup process by downloading multiple Shard Groups in parallel #365

Open TwentyFiveSoftware opened 2 years ago

TwentyFiveSoftware commented 2 years ago

Currently, the backup process downloads one shard at a time from the Influx API and writes it to the file system. This tends to be very slow on larger databases, because sequential downloads leave most of the available IO capacity unused, and using it could speed up the backup tremendously.

This PR introduces a pool of workers that download several shards in parallel. The work is split at the shard group level: in the Influx OSS version a shard group holds only a single shard, so parallelizing within a shard group would gain nothing.

In my benchmark, run in a VM on my machine with limited IO capacity, the parallelized backup was already 2 to 3 times faster; the speedup is probably even larger on a beefier system.

TwentyFiveSoftware commented 2 years ago

Closes #366