Open jbmassicotte opened 4 years ago
We figured out our problem: our cluster is composed of 3 nodepools, the default, plus let’s say pool A and B. We have 2 applications, let’s say X and Y, and use ‘tolerations’ to force app X on nodepool A, and app Y on nodepool B. Because restic uses no toleration, it runs only on the default nodepool and fails to back up volumes from applications running on pools A and B.
To fix the problem (temporarily), I used kubectl edit daemonset/restic -n velero
to add the needed toleration, which forced restic to run on all cluster nodes. Subsequent backups worked.
Questions to the Velero experts: I need to make these changes permanent. How can I provide these changes to the ‘velero install’ command? Is there a way to provide a daemonset-restic.yaml file to ‘velero install’, and if so, where can I find the default file which I will use to add the toleration config?
I ended up writing a script to capture the daemonset yaml config, to add the toleration to this config via a sequence of sed updates, and to invoke ‘kubectl replace’ with the updated config. It does the trick but I find that somewhat cheesy. Any solution deemed more elegant and reliable would be appreciated.
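One possible alternative to the sed-based script is `kubectl patch`, which changes only the fields named in the patch. The following is a minimal sketch, not an official Velero mechanism: the function names `toleration_patch` and `apply_restic_toleration` are made up for illustration, and the key, value, and effect passed in must match the taint on your own nodepools. Note that a merge patch replaces the whole tolerations list, so any tolerations already on the daemonset would need to be included as well.

```shell
#!/bin/sh
# Print a JSON merge patch that sets one toleration on a DaemonSet's
# pod template. Arguments: taint key, taint value, taint effect.
# (toleration_patch is a hypothetical helper name, not part of kubectl or velero.)
toleration_patch() {
  key=$1
  value=$2
  effect=$3
  printf '{"spec":{"template":{"spec":{"tolerations":[{"key":"%s","operator":"Equal","value":"%s","effect":"%s"}]}}}}' \
    "$key" "$value" "$effect"
}

# Apply the patch to the restic daemonset in the velero namespace.
# A merge patch overwrites the existing tolerations list rather than appending to it.
# (apply_restic_toleration is also a hypothetical helper name.)
apply_restic_toleration() {
  kubectl patch daemonset restic -n velero --type merge \
    -p "$(toleration_patch "$1" "$2" "$3")"
}
```

With these functions loaded, something like `apply_restic_toleration pool A NoSchedule` could be run after each `velero install`, where `pool`, `A`, and `NoSchedule` are placeholders for the real taint. The change still lives outside `velero install`, so it has to be reapplied whenever the daemonset is recreated.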
@jbmassicotte In case you can use the velero helm chart instead, it is possible to specify tolerations for the daemonset there https://github.com/vmware-tanzu/helm-charts/blob/main/charts/velero/values.yaml#L269.
This tripped me up a bit. When a restic backup targets a pod running on a node where no restic daemon is running, I think the backup should raise an error, or at least show a warning in the velero logs.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Should this be unstaled, as it already has been marked as valuable?
Backup using restic gets stuck in InProgress status without any clue. velero install has no option for tolerations.
Thanks to @jbmassicotte, manual editing works:
kubectl edit daemonset/restic -n velero
E.g.
tolerations:
- key: cpu
  operator: Equal
  value: mydb
  effect: NoSchedule
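After editing the daemonset, one way to confirm that restic now covers every node is to compare the node count with the number of ready restic pods. This is a rough sketch under the assumption that every node visible to `kubectl get nodes` is supposed to run restic; the function names `coverage_ok` and `check_restic_coverage` are made up for illustration.

```shell
#!/bin/sh
# Succeed (exit status 0) when the two counts are equal.
# Arguments: number of nodes, number of ready restic pods.
# (coverage_ok is a hypothetical helper name.)
coverage_ok() {
  nodes=$1
  ready=$2
  [ "$nodes" -eq "$ready" ]
}

# Query the cluster and report whether restic is ready on every node.
# Assumes the daemonset is named "restic" in the "velero" namespace,
# as in the kubectl edit command above.
check_restic_coverage() {
  # tr strips the padding that some wc implementations add.
  nodes=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
  ready=$(kubectl get daemonset restic -n velero \
    -o jsonpath='{.status.numberReady}')
  if coverage_ok "$nodes" "$ready"; then
    echo "restic is ready on all $nodes nodes"
  else
    echo "restic is ready on $ready of $nodes nodes" >&2
    return 1
  fi
}
```

Running `check_restic_coverage` before starting a backup would at least flag the case where a tainted nodepool has no restic pod, which is the situation that leaves backups stuck in InProgress.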
Edited this earlier post of mine given the more recent info I’ve gathered.
What I did
velero install \
  --provider azure \
  --plugins velero/velero-plugin-for-microsoft-azure:v1.1.0 \
  --bucket $BLOB_CONTAINER \
  --secret-file $CREDENTIAL_FILE \
  --backup-location-config resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP,storageAccount=$AZURE_STORAGE_ACCOUNT \
  --snapshot-location-config apiTimeout=$API_TIMEOUT,resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP \
  --use-restic
$ kubectl version --short
Client Version: v1.15.10
Server Version: v1.17.9
$ velero client config get features
features:
$ velero version
Client:
  Version: v1.4.2
  Git commit: 56a08a4d695d893f0863f697c2f926e27d70c0c5
Server:
  Version: v1.4.2