Open navilg opened 2 years ago
Currently, Velero restores resources in the following order:
Custom Resource Definitions
Namespaces
StorageClasses
VolumeSnapshotClass
VolumeSnapshotContents
VolumeSnapshots
PersistentVolumes
PersistentVolumeClaims
Secrets
ConfigMaps
ServiceAccounts
LimitRanges
Pods
ReplicaSets
Clusters
ClusterResourceSets
Any resources that are not in the list are restored in alphabetical order.
Therefore, in the current case, the Service resource is restored after the Ingress resources, which ultimately caused the restore to fail.
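To make the ordering rule concrete, here is a minimal Python sketch of the behavior described above: kinds on the priority list come first, in list order, and everything else follows alphabetically. This is only an illustration, not Velero's actual code. The function name `restore_order`, the sample backup contents, and the idea of appending `Services` to the list are assumptions made for the example.

```python
# Illustrative sketch only; this is NOT Velero's implementation.
# The priority list mirrors the order quoted in this thread.
DEFAULT_PRIORITIES = [
    "CustomResourceDefinitions", "Namespaces", "StorageClasses",
    "VolumeSnapshotClass", "VolumeSnapshotContents", "VolumeSnapshots",
    "PersistentVolumes", "PersistentVolumeClaims", "Secrets", "ConfigMaps",
    "ServiceAccounts", "LimitRanges", "Pods", "ReplicaSets",
    "Clusters", "ClusterResourceSets",
]


def restore_order(resources, priorities=DEFAULT_PRIORITIES):
    """Return kinds in restore order: prioritized first, the rest alphabetically."""
    present = set(resources)
    # Kinds on the priority list keep the list's order.
    prioritized = [kind for kind in priorities if kind in present]
    # Everything else is sorted alphabetically.
    remaining = sorted(present - set(priorities))
    return prioritized + remaining


# Hypothetical backup contents, chosen for illustration.
backup = ["Services", "Ingresses", "Secrets", "Namespaces", "Deployments"]

# With the default list, "Ingresses" sorts before "Services".
default_order = restore_order(backup)

# Assumption: a custom list that also names Services moves them ahead of Ingresses.
custom_order = restore_order(backup, DEFAULT_PRIORITIES + ["Services"])

print(default_order)
print(custom_order)
```

In the default case, Ingresses land ahead of Services purely because of alphabetical sorting, which matches the failure reported in this issue.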
There is one way to change the order: you can specify the --restoreResourcePriorities
option on the Velero server and customize the entire order. For more information, refer here.
This is not guaranteed to succeed because the Service itself may have dependencies. However, it looks like the only workaround with the current Velero, so it is worth trying.
The ultimate solution is to introduce dependency management into Velero restore. However, that is not easy: Velero doesn't know what the application controllers do, so it is hard for Velero to determine the dependencies. Anyway, we will put some effort into investigating it.
Thanks @Lyndon-Li.
There is one way to change the order: you can specify the --restoreResourcePriorities option on the Velero server and customize the entire order. For more information, refer here. This is not guaranteed to succeed because the Service itself may have dependencies. However, it looks like the only workaround with the current Velero, so it is worth trying.
I will give this a try. Can we add other resources to --restoreResourcePriorities
that are not mentioned in the list above? E.g., can we add Service,Ingress to this argument even though I don't see them in the ordered list you mentioned?
The ultimate solution is to introduce dependency management into Velero restore. However, that is not easy: Velero doesn't know what the application controllers do, so it is hard for Velero to determine the dependencies. Anyway, we will put some effort into investigating it.
Appreciate it. It would be great if we could dig into this and come up with a permanent solution. Since Nginx Ingress is used in most clusters, this affects the restoration of any namespace that has an Ingress.
There is an existing issue asking to document the limitations of Velero restore with admission webhooks (#4847), and also a proposal to solve this kind of problem (#4572).
Just adding the above information here for reference.
What steps did you take and what happened: (A clear and concise description of what the bug is, and what commands you ran.)
I accidentally deleted an entire namespace. When I tried to restore the namespace from backup, all the resources were restored except Ingress. I am using Nginx Ingress v0.47.0.
Command run to restore:
velero create restore restore-v1.8.1 --from-backup v1.8.1 --exclude-namespaces=velero,kube-system --include-cluster-resources
Error logs:
What did you expect to happen:
Ingress should have been restored properly.
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use
velero debug --backup <backupname> --restore <restorename>
to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help
If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
velero backup describe <backupname>
or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename>
or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
From the logs, I noticed that Ingress is being restored before Services. The Nginx Ingress admission controller's validatingwebhookconfiguration ingress-nginx-admission tries to validate the Ingress. To validate the Ingress, it makes a POST call to https://ingress-nginx-controller-admission.ethan.svc:443, which is the Service DNS name. Since the Service is not yet restored when the Ingresses are being restored, the call cannot reach that Service DNS name and the restore fails.
Environment:
velero version: 1.8.1
velero client config get features: features:
kubectl version: v1.20.13
/etc/os-release: Ubuntu 20.04 LTS
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.