king-jam opened this issue 5 years ago
Thanks for logging this @king-jam. We think we may be able to up the number of workers running for each controller in the restic daemonset and get the desired parallelism here without any material code changes - we'll definitely queue this up, as we're planning on doing a bunch of work with the restic integration over the next release or two.
If I'm reading the code correctly, we would have to up the number of workers, but that alone won't resolve the issue.
I believe the restic daemonset would be able to do parallel backups, but the itemBackup code still executes sequentially, so the PVs would still be processed one at a time.
I think the solution is to make the itemBackup code concurrent (across the # of workers) AND the code for PVs concurrent. That handles both the case of multiple pods with a single PV attached to each, and the case of a single pod with many PVs attached.
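The suggestion above can be sketched as a fixed-size worker pool that drains a channel of PVs. This is only an illustration of the proposed concurrency shape, not Velero's actual code; `backupPV`, the function and type names, and the slice of PV names are all hypothetical stand-ins for the real itemBackup/restic calls.

```go
package main

import (
	"fmt"
	"sync"
)

// backupPV stands in for the per-volume restic backup call; the real
// logic lives in Velero's item backup code (this name is hypothetical).
func backupPV(name string) string {
	return "backed up " + name
}

// backupPVsConcurrently fans the volumes out to a fixed pool of workers,
// mirroring the "# of workers" idea from the comment above.
func backupPVsConcurrently(pvs []string, workers int) []string {
	jobs := make(chan string)
	results := make(chan string, len(pvs))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for pv := range jobs {
				results <- backupPV(pv)
			}
		}()
	}

	for _, pv := range pvs {
		jobs <- pv
	}
	close(jobs)
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	out := backupPVsConcurrently([]string{"pv-a", "pv-b", "pv-c"}, 2)
	fmt.Println(len(out)) // 3
}
```

With this shape, both cases in the comment are covered: many pods each with one PV, and one pod with many PVs, since the pool doesn't care which pod a volume belongs to.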
Are there any timelines as to when these improvements to restic will be implemented?
+1
> Thanks for logging this @king-jam. We think we may be able to up the number of workers running for each controller in the restic daemonset and get the desired parallelism here without any material code changes - we'll definitely queue this up, as we're planning on doing a bunch of work with the restic integration over the next release or two.
@skriss How do we increase the number of workers running for each controller in the restic daemonset? Are there any arguments for it?
@duyanghao the # of workers is set at https://github.com/vmware-tanzu/velero/blob/master/pkg/cmd/cli/restic/server.go#L174 and https://github.com/vmware-tanzu/velero/blob/master/pkg/cmd/cli/restic/server.go#L191, but as @king-jam noted, this would only get us parallelism across multiple volumes within a single pod, not parallelism across pods.
Any news on this? I think we are having scaling issues because of the sequential restic backups.
@skriss based on a review of some of the issues, my feeling is that this may need to be linked with #1653 - what are your thoughts?
@stephbman I think this would be more around improving the performance of a single backup, since it would involve parallelizing the operations within a Velero backup.
Re: technical design, we could consider using a worker pods-like approach here but I'm not sure it's actually necessary; the existing restic daemonset can probably already handle running multiple operations simultaneously, so it'd just be a matter of having Velero trigger them in parallel rather than sequentially.
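The "trigger them in parallel rather than sequentially" idea above can be sketched with plain goroutines. This is an assumption-laden illustration: `triggerPodVolumeBackup` is a hypothetical stand-in for Velero creating the per-pod restic backup request that a daemonset pod then picks up.

```go
package main

import (
	"fmt"
	"sync"
)

// triggerPodVolumeBackup stands in for Velero kicking off a restic backup
// for one pod's volumes (hypothetical helper, not a real Velero function).
func triggerPodVolumeBackup(pod string) error {
	fmt.Println("triggered backup for", pod)
	return nil
}

// triggerAll starts every pod's volume backup at once and waits for all
// of them, instead of looping over pods sequentially.
func triggerAll(pods []string) []error {
	errs := make([]error, len(pods))
	var wg sync.WaitGroup
	for i, p := range pods {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			errs[i] = triggerPodVolumeBackup(p)
		}(i, p)
	}
	wg.Wait()
	return errs
}

func main() {
	errs := triggerAll([]string{"pod-a", "pod-b"})
	fmt.Println(len(errs)) // 2
}
```

Because the restic daemonset pods do the actual work, the Velero server side only needs to fan out the triggers and collect errors, which is why no worker-pod infrastructure may be necessary.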
> the code for PVs concurrent
See #4242.
Hello,
Any news on the feature for running multiple jobs at the same time?
@eleanor-millman any update on this feature request?
Waiting for an update.
@eleanor-millman any update on restic concurrency? Looking forward to it!
@jiangfoxi there is already a design PR: https://github.com/vmware-tanzu/velero/pull/5510
**Describe the solution you'd like**

When a single request is made to Velero to back up multiple applications/pods (i.e., backing up an entire namespace), resources within the backup job are backed up sequentially, rather than all resources being backed up in parallel (concurrently). This is an issue when the list of resources contains large PVs, because the backup job takes longer than desired. We want the job to execute with workers if possible.
Environment:

- Velero version (use `velero version`): master/1.0.0
- Kubernetes version (use `kubectl version`): master/v1.14
- OS (e.g. from `/etc/os-release`): Ubuntu/CentOS