vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Restore fails for PVCs that have an ownerReference #7862

Closed Sediket closed 3 months ago

Sediket commented 3 months ago

What steps did you take and what happened:

I backed up a Helm install of Grafana's Loki (https://github.com/grafana/loki/tree/main/production/helm/loki) and noticed that the PVCs have an ownerReference set to the loki-backend StatefulSet. On restore, the PVC is not created, and the warning points at the ownerReference. The same happens for all the PVCs in the StatefulSet:

22m         Warning   OwnerRefInvalidNamespace   persistentvolumeclaim/data-loki-backend-1                         ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"

The restore also reports errors saying that the PVC does not exist:

    restore-monitoring:  error preparing persistentvolumeclaims/restore-monitoring/data-loki-backend-0: rpc error: code = Unknown desc = Failed to create a CloneFromSnapshot CR: Phase=Failed, err=cloneFromSnapshot: Failed at calling SnapshotManager CreateVolumeFromSnapshotWithMetadata with peId pvc:restore-monitoring/data-loki-backend-0, err: persistentvolumeclaims "data-loki-backend-0" not found

The PVC is terminated and later re-created dynamically when the StatefulSet is restored, but the new PV and PVC are empty. PVCs that don't have an ownerReference set are restored successfully.

I also verified this by creating PVCs with and without ownerReferences: a PVC is not created if an ownerReference is set and the referenced owner does not exist.
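
For anyone who wants to repeat that check, here is a minimal sketch that builds the two test manifests. The PVC names, the placeholder UID, and the 10Gi size are assumptions made for illustration; only the namespace and the owner name come from the events in this report.

```python
import json

def pvc_manifest(name, namespace, owner=None):
    """Build a minimal PVC manifest; `owner` is an optional ownerReference dict."""
    metadata = {"name": name, "namespace": namespace}
    if owner is not None:
        metadata["ownerReferences"] = [owner]
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": metadata,
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": "10Gi"}},
        },
    }

# Hypothetical owner that does NOT exist in the target namespace.
missing_owner = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "name": "loki-backend",
    "uid": "00000000-0000-0000-0000-000000000000",  # placeholder UID (assumption)
}

with_owner = pvc_manifest("test-with-owner", "restore-monitoring", missing_owner)
without_owner = pvc_manifest("test-without-owner", "restore-monitoring")

if __name__ == "__main__":
    # Emit both PVCs as a single v1 List in JSON.
    bundle = {"apiVersion": "v1", "kind": "List", "items": [with_owner, without_owner]}
    print(json.dumps(bundle, indent=2))
```

kubectl accepts JSON as well as YAML, so the printed output can be piped to `kubectl apply -f -` on a test cluster. The two manifests differ only in `metadata.ownerReferences`.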

Events from the namespace during a restore:

2m39s       Normal    Provisioning               persistentvolumeclaim/data-loki-backend-1   External provisioner is provisioning volume for claim "restore-monitoring/data-loki-backend-1"
66s         Warning   OwnerRefInvalidNamespace   persistentvolumeclaim/data-loki-backend-1   ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
66s         Normal    ExternalProvisioning       persistentvolumeclaim/data-loki-backend-1   waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
66s         Normal    Provisioning               persistentvolumeclaim/data-loki-backend-1   External provisioner is provisioning volume for claim "restore-monitoring/data-loki-backend-1"
4m21s       Warning   OwnerRefInvalidNamespace   persistentvolumeclaim/data-loki-backend-2   ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
4m21s       Normal    ExternalProvisioning       persistentvolumeclaim/data-loki-backend-2   waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
4m21s       Normal    Provisioning               persistentvolumeclaim/data-loki-backend-2   External provisioner is provisioning volume for claim "restore-monitoring/data-loki-backend-2"
2m41s       Warning   OwnerRefInvalidNamespace   persistentvolumeclaim/data-loki-backend-2   ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
68s         Warning   OwnerRefInvalidNamespace   persistentvolumeclaim/data-loki-backend-2   ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"

Watching PVCs during a restore:

data-loki-backend-2   Pending                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Pending                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-write-1     Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-write-1     Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-write-1     Pending       pvc-19af38e0-3a52-4bb7-b86f-4aeea262ae0a   0                         spectro-storage-class-bind-immediate   0s
data-loki-write-1     Bound         pvc-19af38e0-3a52-4bb7-b86f-4aeea262ae0a   10Gi       RWO            spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-write-2     Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-write-2     Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-write-2     Pending       pvc-dcc2e856-adff-4ddf-9b5b-502d55b0e4d2   0                         spectro-storage-class-bind-immediate   0s
data-loki-write-2     Bound         pvc-dcc2e856-adff-4ddf-9b5b-502d55b0e4d2   10Gi       RWO            spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-2   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Pending                                                                            spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-1   Terminating                                                                        spectro-storage-class-bind-immediate   0s
data-loki-backend-0   Pending                                                                            spectro-storage-class                  0s
data-loki-backend-0   Pending                                                                            spectro-storage-class                  0s
data-loki-backend-1   Pending                                                                            spectro-storage-class                  0s
data-loki-backend-1   Pending                                                                            spectro-storage-class                  0s
data-loki-backend-2   Pending                                                                            spectro-storage-class                  0s
data-loki-backend-2   Pending                                                                            spectro-storage-class                  0s
data-loki-backend-0   Pending                                                                            spectro-storage-class                  1s
data-loki-backend-1   Pending                                                                            spectro-storage-class                  1s
data-loki-backend-2   Pending                                                                            spectro-storage-class                  1s
data-loki-backend-0   Pending                                                                            spectro-storage-class                  1s
data-loki-backend-1   Pending                                                                            spectro-storage-class                  1s
data-loki-backend-2   Pending                                                                            spectro-storage-class                  1s
data-loki-backend-1   Pending       pvc-91cab97a-729e-4503-9ee8-6e9f2dbb5edd   0                         spectro-storage-class                  2s
data-loki-backend-1   Bound         pvc-91cab97a-729e-4503-9ee8-6e9f2dbb5edd   10Gi       RWO            spectro-storage-class                  2s
data-loki-backend-0   Pending       pvc-cbe33462-06db-4ea9-8dc4-3f0f15ccce36   0                         spectro-storage-class                  2s
data-loki-backend-0   Bound         pvc-cbe33462-06db-4ea9-8dc4-3f0f15ccce36   10Gi       RWO            spectro-storage-class                  2s
data-loki-backend-2   Pending       pvc-58823323-8f82-4fa6-98bd-b9a594a7aebc   0                         spectro-storage-class                  2s
data-loki-backend-2   Bound         pvc-58823323-8f82-4fa6-98bd-b9a594a7aebc   10Gi       RWO            spectro-storage-class                  2s

What did you expect to happen: successful restore with all PV contents

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help.

Can't attach due to security, but I can attach the failed restore log.

Anything else you would like to add:

Environment:

Lyndon-Li commented 3 months ago

Duplicate of #4707 (ownerReference support). This may also be solved by #7481.

Sediket commented 3 months ago

Hello!

I'm new to Velero and just trying to test that I can do backups and restores, so forgive me if I'm not very familiar with the inner workings.

I was looking at https://github.com/vmware-tanzu/velero/issues/4707 and I believe it's not the same issue. In #4707, the problem is that the ownerReference was not carried over during the restore. The problem I'm reporting is that the ownerReference is present and, because of it, the restore does not work.

Specifically, during the restore the PVC is automatically deleted by Kubernetes because its owner is not present. In this case the owner is the StatefulSet, and the PVCs are restored before the StatefulSet.
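
To make the ordering problem concrete, here is a deliberately simplified model of it. This is not Kubernetes garbage collector code. It assumes an object is collected when it carries ownerReferences and none of the referenced UIDs exist at the moment it is created, and it ignores finalizers, timing, and multiple namespaces. The function names and UIDs are hypothetical.

```python
def is_dangling(obj, live_uids):
    """True if obj has ownerReferences and none of the owners currently exist."""
    owners = obj.get("metadata", {}).get("ownerReferences", [])
    return bool(owners) and all(o["uid"] not in live_uids for o in owners)

def simulate_restore(objects):
    """Create objects in order; return the names this model would collect."""
    live_uids, collected = set(), []
    for obj in objects:
        if is_dangling(obj, live_uids):
            # Owner is missing at creation time, so the object is removed again.
            collected.append(obj["metadata"]["name"])
            continue
        live_uids.add(obj["metadata"]["uid"])
    return collected

# Hypothetical objects: a StatefulSet and a PVC that names it as owner.
statefulset = {"metadata": {"name": "loki-backend", "uid": "sts-uid"}}
pvc = {
    "metadata": {
        "name": "data-loki-backend-0",
        "uid": "pvc-uid",
        "ownerReferences": [{"kind": "StatefulSet", "name": "loki-backend", "uid": "sts-uid"}],
    }
}

if __name__ == "__main__":
    print(simulate_restore([pvc, statefulset]))  # PVC first, as in this restore
    print(simulate_restore([statefulset, pvc]))  # owner first
```

In this model the PVC is collected only when it is created before its owner, which matches the Pending/Terminating loop seen in the watch output.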

As for https://github.com/vmware-tanzu/velero/pull/7481, it looks like it restores only a specified PV and PVC to a dummy resource, then detaches and re-attaches them to the running resource. I'm not sure how that would help.

Thanks!

reasonerjt commented 3 months ago

@Sediket I'm a little confused. Velero removes the ownerReferences of an object before it's created: https://github.com/vmware-tanzu/velero/blob/main/pkg/restore/restore.go#L1299

The error is about restore-monitoring/data-loki-backend-0, but the warning you mentioned in the description is about persistentvolumeclaim/data-loki-backend-1.

Could you double check why restore-monitoring/data-loki-backend-0 is not available?

It would be helpful if you could reproduce the problem and collect the debug bundle via velero debug.

Sediket commented 3 months ago

@reasonerjt Hello! These are the errors I'm getting in the namespace, during the restore:

2m39s       Normal    Provisioning               persistentvolumeclaim/data-loki-backend-1   External provisioner is provisioning volume for claim "restore-monitoring/data-loki-backend-1"
66s         Warning   OwnerRefInvalidNamespace   persistentvolumeclaim/data-loki-backend-1   ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
66s         Normal    ExternalProvisioning       persistentvolumeclaim/data-loki-backend-1   waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
66s         Normal    Provisioning               persistentvolumeclaim/data-loki-backend-1   External provisioner is provisioning volume for claim "restore-monitoring/data-loki-backend-1"
4m21s       Warning   OwnerRefInvalidNamespace   persistentvolumeclaim/data-loki-backend-2   ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
4m21s       Normal    ExternalProvisioning       persistentvolumeclaim/data-loki-backend-2   waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
4m21s       Normal    Provisioning               persistentvolumeclaim/data-loki-backend-2   External provisioner is provisioning volume for claim "restore-monitoring/data-loki-backend-2"
2m41s       Warning   OwnerRefInvalidNamespace   persistentvolumeclaim/data-loki-backend-2   ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"
68s         Warning   OwnerRefInvalidNamespace   persistentvolumeclaim/data-loki-backend-2   ownerRef [apps/v1/StatefulSet, namespace: restore-monitoring, name: loki-backend, uid: 5f892cfc-93b3-46e7-9ea1-e7c2532be0a9] does not exist in namespace "restore-monitoring"

I'm also using the ConfigMap to map them to a new storage class (https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/docs/storageclass-mapping.md), which is working: you can see the PVCs trying to be created, but they are removed because the owner is not present. Later the StatefulSet is created; since the restored PVC failed to be created, a new one is created on the default storage class, and it is empty.

And all my PVCs that don't have an ownerReference are restored correctly.

Sediket commented 3 months ago

Is it because when using the change-storage-class-config it doesn't filter out the metadata?:

Sediket commented 3 months ago

Some more data from watching the PVCs during a restore: the data-loki-backend PVCs are the ones with the ownerReference, and the data-loki-write PVCs don't have an ownerReference.

The data-loki-backend PVCs keep getting terminated because the owner isn't present, while the others are created just fine. At the end, the PVCs for data-loki-backend are created dynamically with the default storage class as the StatefulSet is deployed, and they are empty:

(Watch output identical to the "Watching PVCs during a restore" listing in the issue description above.)

Lyndon-Li commented 3 months ago

It looks like the data-loki-backend PVC is deleted after the Velero restore creates it. The PVC is deleted because the object referred to by its ownerReference doesn't exist, as the StatefulSet has not been restored yet.

By design, Velero removes an object's ownerReference before restoring the object. Here, however, the restore is done by the vSphere plugin, which creates an intermediate PVC. That PVC is created not from the PVC object provided by Velero but from the snapshot status saved by the vSphere plugin itself.

Therefore, this is a vSphere-plugin specific problem.
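
As an illustration of the behavior described above (ownerReferences dropped before the object is created), here is a minimal sketch. It is not Velero's or the vSphere plugin's actual code, and the function name is hypothetical. It only shows the kind of metadata cleanup that an intermediate PVC built from saved snapshot status would not go through.

```python
import copy

def strip_owner_references(obj):
    """Return a copy of obj without metadata.ownerReferences (input is untouched)."""
    cleaned = copy.deepcopy(obj)
    cleaned.get("metadata", {}).pop("ownerReferences", None)
    return cleaned

# Hypothetical backed-up PVC that still carries its StatefulSet owner.
backed_up_pvc = {
    "kind": "PersistentVolumeClaim",
    "metadata": {
        "name": "data-loki-backend-0",
        "namespace": "restore-monitoring",
        "ownerReferences": [
            {"apiVersion": "apps/v1", "kind": "StatefulSet", "name": "loki-backend"}
        ],
    },
}

if __name__ == "__main__":
    print(strip_owner_references(backed_up_pvc)["metadata"])
```

A PVC created from the cleaned copy has no owner, so nothing can delete it for lack of one.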

Lyndon-Li commented 3 months ago

@Sediket Please create a ticket in the vSphere-plugin GitHub repo. You can link the current issue so that the details are included.

Lyndon-Li commented 3 months ago

@Sediket The expected behavior of Velero is that the ownerReference is removed after the restore. Please check whether this works in your case; if it doesn't, please leave a comment in #4707 and explain why the ownerReference is required.

Lyndon-Li commented 3 months ago

Closing this issue as transferred to vSphere-plugin.