Closed Sediket closed 3 months ago
Duplicate of #4707 (ownerReference support). This may also be solved by #7481.
Hello!
I'm new to Velero and am just testing that I can do backups and restores, so forgive me if I'm not familiar with the inner workings.
I was looking at https://github.com/vmware-tanzu/velero/issues/4707 and I believe it's not the same issue. In #4707 the problem is that the ownerReference was not carried over during the restore. The problem I'm reporting is that the ownerReference is present, and because of it the restore does not work.
Specifically, during the restore the PVC is automatically deleted by Kubernetes because its owner is not present. The owner here is the StatefulSet, and the PVCs are restored before the StatefulSet.
As for https://github.com/vmware-tanzu/velero/pull/7481, it looks like it restores only a specified PV and PVC to a dummy resource, then detaches and re-attaches them to the running resource, so I'm not sure how that would help.
Thanks!
@Sediket
I'm a little confused.
Velero removes the ownerReferences of an object before it's created: https://github.com/vmware-tanzu/velero/blob/main/pkg/restore/restore.go#L1299
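To illustrate the step being described, here is a minimal Python sketch of dropping ownerReferences from a manifest before it is created. This is not Velero's code (Velero is written in Go); the function name strip_owner_references and the sample PVC dict are made up for the example, with the object names and UID taken from the events reported in this issue.

```python
import copy


def strip_owner_references(manifest: dict) -> dict:
    """Return a copy of a manifest without metadata.ownerReferences."""
    cleaned = copy.deepcopy(manifest)  # leave the backed-up object untouched
    cleaned.get("metadata", {}).pop("ownerReferences", None)
    return cleaned


# Hypothetical backed-up PVC that still points at its StatefulSet owner.
backed_up_pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {
        "name": "data-loki-backend-0",
        "namespace": "restore-monitoring",
        "ownerReferences": [
            {
                "apiVersion": "apps/v1",
                "kind": "StatefulSet",
                "name": "loki-backend",
                "uid": "5f892cfc-93b3-46e7-9ea1-e7c2532be0a9",
            }
        ],
    },
}

to_create = strip_owner_references(backed_up_pvc)
print(to_create["metadata"])
```

An object created from to_create has no owner, so the garbage collector has no dangling reference to act on.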
The error is about restore-monitoring/data-loki-backend-0, but the warning you mentioned in the description is about persistentvolumeclaim/data-loki-backend-1.
Could you double check why restore-monitoring/data-loki-backend-0 is not available?
It would be helpful if you could reproduce the problem and collect the debug bundle via velero debug
@reasonerjt Hello! These are the errors I'm getting in the namespace, during the restore:
2m39s Normal Provisioning persistentvolumeclaim/data-loki-backend-1 External provisioner is provisioning volume for claim "restore-monitoring/data-loki-backend-1"
66s Warning OwnerRefInvalidNamespace persistentvolumeclaim/data-loki-backend-1 ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
66s Normal ExternalProvisioning persistentvolumeclaim/data-loki-backend-1 waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
66s Normal Provisioning persistentvolumeclaim/data-loki-backend-1 External provisioner is provisioning volume for claim "restore-monitoring/data-loki-backend-1"
4m21s Warning OwnerRefInvalidNamespace persistentvolumeclaim/data-loki-backend-2 ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
4m21s Normal ExternalProvisioning persistentvolumeclaim/data-loki-backend-2 waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
4m21s Normal Provisioning persistentvolumeclaim/data-loki-backend-2 External provisioner is provisioning volume for claim "restore-monitoring/data-loki-backend-2"
2m41s Warning OwnerRefInvalidNamespace persistentvolumeclaim/data-loki-backend-2 ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
68s Warning OwnerRefInvalidNamespace persistentvolumeclaim/data-loki-backend-2 ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
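As a rough aid for reading event output like the above, the following Python sketch counts OwnerRefInvalidNamespace warnings per object. It assumes each event line has the whitespace-separated layout shown here (age, type, reason, object, message); the function name owner_ref_warnings is made up, and the sample lines are copied from the events above.

```python
from collections import Counter

EVENTS = """\
66s Warning OwnerRefInvalidNamespace persistentvolumeclaim/data-loki-backend-1 ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
66s Normal ExternalProvisioning persistentvolumeclaim/data-loki-backend-1 waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
4m21s Warning OwnerRefInvalidNamespace persistentvolumeclaim/data-loki-backend-2 ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
2m41s Warning OwnerRefInvalidNamespace persistentvolumeclaim/data-loki-backend-2 ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
"""


def owner_ref_warnings(events_text: str) -> Counter:
    """Count OwnerRefInvalidNamespace warnings per involved object."""
    counts = Counter()
    for line in events_text.splitlines():
        # Assumed layout: AGE TYPE REASON OBJECT MESSAGE...
        fields = line.split(None, 4)
        if len(fields) < 4:
            continue
        _age, event_type, reason, obj = fields[:4]
        if event_type == "Warning" and reason == "OwnerRefInvalidNamespace":
            counts[obj] += 1
    return counts


counts = owner_ref_warnings(EVENTS)
for obj, n in sorted(counts.items()):
    print(obj, n)
```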
I'm also using the ConfigMap to map them to a new storage class (https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/docs/storageclass-mapping.md), which is working: you can see the PVCs trying to be created, but they are removed because the owner is not present. Later the StatefulSet is created, and because the restored PVC failed to be created, a new, empty one is created on the default storage class.
And all my PVCs that don't have an ownerReference are restored correctly.
Is it because the metadata isn't filtered out when the change-storage-class-config is used?
Some more data from watching the PVCs during a restore: the data-loki-backend PVCs are the ones with the ownerReference, and the data-loki-write PVCs don't have an ownerReference.
The data-loki-backend PVCs keep getting terminated because the owner isn't present, while the others are created just fine. At the end, the data-loki-backend PVCs are created dynamically with the default storage class as the StatefulSet is deployed, and they are empty:
data-loki-backend-2 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Terminating spectro-storage-class-bind-immediate 0s
data-loki-write-1 Pending spectro-storage-class-bind-immediate 0s
data-loki-write-1 Pending spectro-storage-class-bind-immediate 0s
data-loki-write-1 Pending pvc-19af38e0-3a52-4bb7-b86f-4aeea262ae0a 0 spectro-storage-class-bind-immediate 0s
data-loki-write-1 Bound pvc-19af38e0-3a52-4bb7-b86f-4aeea262ae0a 10Gi RWO spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Terminating spectro-storage-class-bind-immediate 0s
data-loki-write-2 Pending spectro-storage-class-bind-immediate 0s
data-loki-write-2 Pending spectro-storage-class-bind-immediate 0s
data-loki-write-2 Pending pvc-dcc2e856-adff-4ddf-9b5b-502d55b0e4d2 0 spectro-storage-class-bind-immediate 0s
data-loki-write-2 Bound pvc-dcc2e856-adff-4ddf-9b5b-502d55b0e4d2 10Gi RWO spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-2 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Pending spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-1 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Pending spectro-storage-class 0s
data-loki-backend-0 Pending spectro-storage-class 0s
data-loki-backend-1 Pending spectro-storage-class 0s
data-loki-backend-1 Pending spectro-storage-class 0s
data-loki-backend-2 Pending spectro-storage-class 0s
data-loki-backend-2 Pending spectro-storage-class 0s
data-loki-backend-0 Pending spectro-storage-class 1s
data-loki-backend-1 Pending spectro-storage-class 1s
data-loki-backend-2 Pending spectro-storage-class 1s
data-loki-backend-0 Pending spectro-storage-class 1s
data-loki-backend-1 Pending spectro-storage-class 1s
data-loki-backend-2 Pending spectro-storage-class 1s
data-loki-backend-1 Pending pvc-91cab97a-729e-4503-9ee8-6e9f2dbb5edd 0 spectro-storage-class 2s
data-loki-backend-1 Bound pvc-91cab97a-729e-4503-9ee8-6e9f2dbb5edd 10Gi RWO spectro-storage-class 2s
data-loki-backend-0 Pending pvc-cbe33462-06db-4ea9-8dc4-3f0f15ccce36 0 spectro-storage-class 2s
data-loki-backend-0 Bound pvc-cbe33462-06db-4ea9-8dc4-3f0f15ccce36 10Gi RWO spectro-storage-class 2s
data-loki-backend-2 Pending pvc-58823323-8f82-4fa6-98bd-b9a594a7aebc 0 spectro-storage-class 2s
data-loki-backend-2 Bound pvc-58823323-8f82-4fa6-98bd-b9a594a7aebc 10Gi RWO spectro-storage-class 2s
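To make the outcome of the watch output easier to see, here is a small Python sketch that extracts the storage class each PVC ended up bound with. It assumes the rows follow the usual kubectl get pvc column order (NAME, STATUS, VOLUME, CAPACITY, ACCESS MODES, STORAGECLASS, AGE), which is an assumption because the column header is not part of the pasted output; the function name bound_storage_classes is made up, and the sample rows are copied from the output above.

```python
WATCH = """\
data-loki-write-1 Bound pvc-19af38e0-3a52-4bb7-b86f-4aeea262ae0a 10Gi RWO spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Terminating spectro-storage-class-bind-immediate 0s
data-loki-backend-0 Pending spectro-storage-class 0s
data-loki-backend-0 Bound pvc-cbe33462-06db-4ea9-8dc4-3f0f15ccce36 10Gi RWO spectro-storage-class 2s
"""


def bound_storage_classes(watch_text: str) -> dict:
    """Map each PVC name to the storage class of its last Bound row."""
    bound = {}
    for line in watch_text.splitlines():
        fields = line.split()
        # Assumed Bound row: NAME STATUS VOLUME CAPACITY ACCESS-MODES STORAGECLASS AGE
        if len(fields) == 7 and fields[1] == "Bound":
            bound[fields[0]] = fields[5]
    return bound


result = bound_storage_classes(WATCH)
for name, storage_class in result.items():
    print(name, "->", storage_class)
```

In these sample rows, data-loki-write-1 is bound on the mapped class spectro-storage-class-bind-immediate, while data-loki-backend-0 is only ever bound on spectro-storage-class.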
It looks like data-loki-backend is deleted after the Velero restore creates it. The PVC is deleted because the object referred to by its ownerReference doesn't exist, as the StatefulSet has not been restored yet.
By design, Velero removes an object's ownerReference before restoring the object.
Here the restore is done by the vSphere-plugin, which creates an intermediate PVC. That PVC is not created from the PVC object provided by Velero but from the snapshot status saved by the vSphere-plugin itself.
Therefore, this is a vSphere-plugin specific problem.
@Sediket Please create a ticket in the vSphere-plugin GitHub repo. You can link the current issue so that the details are included.
@Sediket
So the expected behavior from Velero is that the ownerReference will be removed after the restore. Please check whether this works in your case. If it doesn't, please leave a comment in #4707 and explain why the ownerReference is required.
Closing this issue as transferred to vSphere-plugin.
What steps did you take and what happened:
I backed up a Helm install of Grafana's Loki (https://github.com/grafana/loki/tree/main/production/helm/loki) and noticed that the PVCs have an ownerReference set to the loki-backend StatefulSet. On restore the PVC is not created, with the warning referencing the ownerReference. The same happens for all the PVCs in the StatefulSet:
The restore reports errors saying that the PVC does not exist:
The PVC is terminated and then dynamically re-created later when the StatefulSet is restored, but with an empty PV and PVC. PVCs that don't have the ownerReference set are restored successfully.
I also verified this by creating PVCs with and without ownerReferences: they are not created if an ownerReference is set and the referenced owner resource does not exist.
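A test along these lines could be set up with manifests like the ones generated below. This is only a sketch, not the exact manifests used for the test above: the PVC names, the owner name missing-statefulset, the random UID, and the 1Gi size are all hypothetical, and the script only prints JSON without contacting a cluster.

```python
import json
import uuid


def make_pvc(name, owner=None):
    """Build a PVC manifest, optionally owned by a StatefulSet that does not exist."""
    metadata = {"name": name}
    if owner is not None:
        metadata["ownerReferences"] = [{
            "apiVersion": "apps/v1",
            "kind": "StatefulSet",
            "name": owner,
            "uid": str(uuid.uuid4()),  # random UID, so no real object matches it
        }]
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": metadata,
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": "1Gi"}},
        },
    }


with_owner = make_pvc("test-pvc-with-owner", owner="missing-statefulset")
without_owner = make_pvc("test-pvc-without-owner")
manifest = {"apiVersion": "v1", "kind": "List", "items": [with_owner, without_owner]}
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl apply -f in a scratch namespace to compare how the two claims behave.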
Events from the namespace during a restore:
Watching PVCs during a restore:
What did you expect to happen: successful restore with all PV contents
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help
Can't attach due to security, but I can attach the failed restore log.
Anything else you would like to add:
Environment:
velero version: v1.13.2
velero client config get features: Not Set
kubectl version: 1.27.2
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.