xelalexv / dregsy

Keep container registries in sync
Apache License 2.0

Support parallel tasks #64

Open sjthespian opened 2 years ago

sjthespian commented 2 years ago

I'm working on a solution to synchronize images to a set of remote registries, and having the ability to run tasks in parallel would be a huge help. Right now I'm building n different configs and running m instances of dregsy. It would be much cleaner to have a single config with all of my tasks defined and have dregsy handle the parallelism.

Something along the lines of:

relay: skopeo

skopeo:
  binary: skopeo
  mode: copy
  # Number of tasks to run in parallel
  parallel_tasks: 4

xelalexv commented 2 years ago

Running tasks in parallel can be advantageous when the system running dregsy has a significantly faster network connection than any of the involved source and/or target registries. In the opposite case, we may not gain much of a speedup, since the parallel tasks would compete for the slow network connection. The same may be observed if there's just one slow source and one slow target. At any rate, having the option to add parallelism to tasks is definitely a good idea.

Implementation thoughts:

sjthespian commented 2 years ago

That would be exactly why I'm doing it -- I'm syncing to registries that are only available via satellite links. My uplink bandwidth is roughly 10x the bandwidth of each individual registry.

I have a poor man's working implementation of this in the meantime: I'm running this in k8s, so I just spin up n pods, each with a single task. We don't add new registries often, so I'm not creating too much tech debt.

Validation could be tricky. While it isn't in my use case, I could see someone wanting to have parallel syncs running to the same registry, with each sync being in a separate namespace.
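One possible shape for such a check, as a rough Go sketch: treat two tasks as conflicting only when they write to the same registry and the same path, so parallel syncs into different namespaces of one registry still pass. The `Target` type and `conflicts` function are hypothetical and do not reflect dregsy's config model. The check compares exact paths only, so overlapping prefixes are not detected.

```go
package main

import "fmt"

// Target is a hypothetical description of where a task writes.
type Target struct {
	Task     string // name of the task that owns this mapping
	Registry string // target registry host
	Path     string // target namespace/repository path
}

// conflicts reports pairs of different tasks that write to the same
// registry and path. The same registry with different paths is allowed.
func conflicts(targets []Target) []string {
	seen := map[string]string{} // registry/path -> first task seen
	var out []string
	for _, t := range targets {
		key := t.Registry + "/" + t.Path
		if prev, ok := seen[key]; ok && prev != t.Task {
			out = append(out, fmt.Sprintf("%s and %s both write to %s", prev, t.Task, key))
			continue
		}
		seen[key] = t.Task
	}
	return out
}

func main() {
	// Registry and path names below are placeholders.
	targets := []Target{
		{Task: "sync-a", Registry: "remote.example.com", Path: "team-a/app"},
		{Task: "sync-b", Registry: "remote.example.com", Path: "team-b/app"},
		{Task: "sync-c", Registry: "remote.example.com", Path: "team-a/app"},
	}
	for _, c := range conflicts(targets) {
		fmt.Println(c)
	}
}
```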