Can you provide the output of `kubectl -n velero get podvolumebackups -l velero.io/backup-name=aks-daily-backup-itg-20191028152145 -o yaml`?
FYI, the restic integration relies on dynamic provisioning to restore volumes -- so during a restore, it's expected behavior to get a new, dynamically-provisioned PV that the backed-up data should be restored into.
Thanks for looking at this @skriss !
```
apiVersion: v1
items:
```
This issue is still tagged as 'Waiting for info'... do you need any further information from me? Is there anything I need to do on my end to progress this issue? (Apologies, this is my first submitted issue and I'm not sure of the protocol.)
@reddog335 apologies for the delayed response, we're working on finalizing v1.2 at the moment :)
The YAML you sent me indicates that velero/restic didn't find any files in any of the volumes (message: `volume was empty so no snapshot was taken`). Can you confirm that there was in fact data in each of those volumes at the time of backup?
@skriss No worries, I appreciate the help. There was definitely data in the volumes. It's a text file that all three replicas write to simultaneously. After the backup, I removed the text file from the Azure File share to test the restore. I can run another test if you'd like.
Hmm, not sure why restic wouldn't be finding the file.
The way the restic backups work is that the velero/restic daemonset uses a hostPath mount of `/var/lib/kubelet/pods`, which is the directory on each node in the cluster where pod volumes are mounted. If you look in the YAML above, you'll see a backup path of e.g. `/host_pods/7ba88096-f996-11e9-ae16-260815deaa94/volumes/kubernetes.io~azure-file/pvc-e495741c-f995-11e9-ae16-260815deaa94` (`/host_pods` is the location in the daemonset pod where the `/var/lib/kubelet/pods` directory is mounted).
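For reference, here's a minimal sketch of that wiring (abridged and illustrative; not the exact manifest the Velero installer generates):

```yaml
# Sketch of the hostPath plumbing in the velero/restic daemonset described
# above. Only the volume wiring is shown; names and image tag are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: restic
  namespace: velero
spec:
  selector:
    matchLabels:
      name: restic
  template:
    metadata:
      labels:
        name: restic
    spec:
      containers:
        - name: restic
          image: velero/velero:v1.1.0        # illustrative image reference
          volumeMounts:
            - name: host-pods
              mountPath: /host_pods          # pod volumes are visible here inside the container
      volumes:
        - name: host-pods
          hostPath:
            path: /var/lib/kubelet/pods      # where the kubelet mounts pod volumes on each node
```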
It'd be really helpful if you could run another test, and prior to backup, do the following:

- from the restic pod, run `ls -la` on the volume directory: `/host_pods/<your-workload-pod-uid>/volumes/kubernetes.io~azure-file/<pv-name>`
- from the node, run `ls -la` on the same directory: `/var/lib/kubelet/pods/<your-workload-pod-uid>/volumes/kubernetes.io~azure-file/<pv-name>`
@skriss I received the same errors with this test and the newly created Azure File Share was empty after the restore. Below are the details:
From the restic pod:

```
$ kubectl exec -it restic-mj29r -n velero -- bash
root@restic-mj29r:/# ls -rlt /host_pods/edb00905-fe4c-11e9-9d9f-2ef301709b43/volumes/kubernetes.io~azure-file/pvc-a092dd45-fe4b-11e9-9d9f-2ef301709b43
total 9
-rwxr-xr-x 1 1000 1000 9080 Nov  3 15:11 gpfs_test.txt
root@restic-mj29r:/# tail -10 /host_pods/edb00905-fe4c-11e9-9d9f-2ef301709b43/volumes/kubernetes.io~azure-file/pvc-a092dd45-fe4b-11e9-9d9f-2ef301709b43/gpfs_test.txt
azure-files-test-itg-6bfb6cbd5d-n8kt7:76
azure-files-test-itg-6bfb6cbd5d-jl566:75
azure-files-test-itg-6bfb6cbd5d-vgnpr:74
azure-files-test-itg-6bfb6cbd5d-n8kt7:77
azure-files-test-itg-6bfb6cbd5d-jl566:76
```

From the node:

```
root@aks-nodepool1-26580627-vmss000002:~# tail -5 /var/lib/kubelet/pods/edb00905-fe4c-11e9-9d9f-2ef301709b43/volumes/kubernetes.io~azure-file/pvc-a092dd45-fe4b-11e9-9d9f-2ef301709b43/gpfs_test.txt
azure-files-test-itg-6bfb6cbd5d-n8kt7:76
azure-files-test-itg-6bfb6cbd5d-jl566:75
azure-files-test-itg-6bfb6cbd5d-vgnpr:74
azure-files-test-itg-6bfb6cbd5d-n8kt7:77
azure-files-test-itg-6bfb6cbd5d-jl566:76
```
Backup, delete, and restore:

```
$ velero backup create manual-backup --exclude-namespaces velero,default --snapshot-volumes=true

$ velero get backup
NAME                                  STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
aks-daily-backup-itg-20191103150531   Completed   2019-11-03 09:05:31 -0600 CST   29d       default

$ kubectl get pvc
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
vcs-itg-azure-file   Bound    pvc-a092dd45-fe4b-11e9-9d9f-2ef301709b43   5Gi        RWX            azure-file-std-grs   30m

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS         REASON   AGE
pvc-a092dd45-fe4b-11e9-9d9f-2ef301709b43   5Gi        RWX            Retain           Bound    vcs-itg/vcs-itg-azure-file   azure-file-std-grs            30m

$ kubectl delete ns vcs-itg
namespace "vcs-itg" deleted

$ kubectl get pvc
No resources found.

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                        STORAGECLASS         REASON   AGE
pvc-a092dd45-fe4b-11e9-9d9f-2ef301709b43   5Gi        RWX            Retain           Released   vcs-itg/vcs-itg-azure-file   azure-file-std-grs            32m

$ velero restore create --from-backup manual-backup
Restore request "manual-backup-20191103094041" submitted successfully.
Run `velero restore describe manual-backup-20191103094041` or `velero restore logs manual-backup-20191103094041` for more details.
```
https://gist.github.com/reddog335/f41e2c7eaff51f236041a2637a38cea2
After the restore:

```
$ kubectl get pvc
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
vcs-itg-azure-file   Bound    pvc-4c582f0e-fe50-11e9-9d9f-2ef301709b43   5Gi        RWX            azure-file-std-grs   81s

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                        STORAGECLASS         REASON   AGE
pvc-4c582f0e-fe50-11e9-9d9f-2ef301709b43   5Gi        RWX            Retain           Bound      vcs-itg/vcs-itg-azure-file   azure-file-std-grs            84s
pvc-a092dd45-fe4b-11e9-9d9f-2ef301709b43   5Gi        RWX            Retain           Released   vcs-itg/vcs-itg-azure-file   azure-file-std-grs            34m

$ kubectl get pod
NAME                                               READY   STATUS     RESTARTS   AGE
aks-helloworld-context-path-itg-59b59b89b4-l5rsx   1/1     Running    0          87s
aks-helloworld-context-path-itg-59b59b89b4-s7k7m   1/1     Running    0          87s
aks-helloworld-context-path-itg-59b59b89b4-sdl5m   1/1     Running    0          86s
aks-helloworld-itg-79fc55b998-67tss                1/1     Running    0          86s
aks-helloworld-itg-79fc55b998-gn2vn                1/1     Running    0          86s
aks-helloworld-itg-79fc55b998-qvn8s                1/1     Running    0          86s
azure-files-test-itg-65bbd9f965-55gt5              0/1     Init:0/1   0          85s
azure-files-test-itg-65bbd9f965-cdjdk              0/1     Init:0/1   0          85s
azure-files-test-itg-65bbd9f965-s8l48              0/1     Init:0/1   0          85s
```
```
$ kubectl describe pod azure-files-test-itg-65bbd9f965-55gt5
Name:               azure-files-test-itg-65bbd9f965-55gt5
Namespace:          vcs-itg
Priority:           0
PriorityClassName:
  Normal  Scheduled  2m20s  default-scheduler                           Successfully assigned vcs-itg/azure-files-test-itg-65bbd9f965-55gt5 to aks-nodepool1-26580627-vmss000002
  Normal  Pulled     2m18s  kubelet, aks-nodepool1-26580627-vmss000002  Container image "gcr.io/heptio-images/velero-restic-restore-helper:v1.1.0" already present on machine
  Normal  Created    2m18s  kubelet, aks-nodepool1-26580627-vmss000002  Created container
  Normal  Started    2m18s  kubelet, aks-nodepool1-26580627-vmss000002  Started container
```
```
$ kubectl delete pod azure-files-test-itg-65bbd9f965-55gt5
pod "azure-files-test-itg-65bbd9f965-55gt5" deleted

$ kubectl get pod
NAME                                               READY   STATUS     RESTARTS   AGE
aks-helloworld-context-path-itg-59b59b89b4-l5rsx   1/1     Running    0          4m26s
aks-helloworld-context-path-itg-59b59b89b4-s7k7m   1/1     Running    0          4m26s
aks-helloworld-context-path-itg-59b59b89b4-sdl5m   1/1     Running    0          4m25s
aks-helloworld-itg-79fc55b998-67tss                1/1     Running    0          4m25s
aks-helloworld-itg-79fc55b998-gn2vn                1/1     Running    0          4m25s
aks-helloworld-itg-79fc55b998-qvn8s                1/1     Running    0          4m25s
azure-files-test-itg-65bbd9f965-9cjnc              1/1     Running    0          21s
azure-files-test-itg-65bbd9f965-cdjdk              0/1     Init:0/1   0          4m24s
azure-files-test-itg-65bbd9f965-s8l48              0/1     Init:0/1   0          4m24s
```
```
$ kubectl exec -it azure-files-test-itg-65bbd9f965-9cjnc -- ls -lrt /data
total 0
```
```
$ kubectl exec -it restic-mj29r -n velero -- bash
root@restic-mj29r:/# ls -rlt /host_pods
total 28
drwxr-x--- 5 root root 4096 Oct 17 12:32 148bc1bf-f0da-11e9-ae16-260815deaa94
drwxr-x--- 5 root root 4096 Oct 25 11:25 156deeb7-f71a-11e9-ae16-260815deaa94
drwxr-x--- 5 root root 4096 Oct 28 15:14 9b39efad-f995-11e9-ae16-260815deaa94
drwxr-xr-x 5 root root 4096 Nov  3 15:04 4ae9d4bb-fe4b-11e9-9d9f-2ef301709b43
drwxr-x--- 5 root root 4096 Nov  3 15:40 532f5f7d-fe50-11e9-9d9f-2ef301709b43
drwxr-x--- 5 root root 4096 Nov  3 15:40 538acc95-fe50-11e9-9d9f-2ef301709b43
drwxr-x--- 5 root root 4096 Nov  3 15:45 e52d0ba8-fe50-11e9-9d9f-2ef301709b43
root@restic-mj29r:/# ls -rlt /host_pods/532f5f7d-fe50-11e9-9d9f-2ef301709b43/volumes
total 4
drwxr-xr-x 3 root root 4096 Nov  3 15:40 kubernetes.io~secret
root@restic-mj29r:/# ls -rlt /host_pods/538acc95-fe50-11e9-9d9f-2ef301709b43/volumes
total 4
drwxr-xr-x 3 root root 4096 Nov  3 15:40 kubernetes.io~secret
root@restic-mj29r:/# ls -rlt /host_pods/e52d0ba8-fe50-11e9-9d9f-2ef301709b43/volumes
total 8
drwxr-xr-x 3 root root 4096 Nov  3 15:45 kubernetes.io~secret
drwx------ 3 root root 4096 Nov  3 15:45 kubernetes.io~azure-file
root@restic-mj29r:/# ls -rlt /host_pods/e52d0ba8-fe50-11e9-9d9f-2ef301709b43/volumes/kubernetes.io~azure-file
total 0
drwxr-xr-x 2 1000 1000 0 Nov  3 15:40 pvc-4c582f0e-fe50-11e9-9d9f-2ef301709b43
root@restic-mj29r:/# ls -rlt /host_pods/e52d0ba8-fe50-11e9-9d9f-2ef301709b43/volumes/kubernetes.io~azure-file/pvc-4c582f0e-fe50-11e9-9d9f-2ef301709b43
total 0
```
Can you additionally provide `kubectl -n velero get podvolumebackups -l velero.io/backup-name=manual-backup -o yaml`?
It looks like the backup(s) are coming back empty again - I'm really not sure why that's the case, given the files are clearly visible via the restic pod.
@skriss Below is the output of the command:

```
$ kubectl -n velero get podvolumebackups -l velero.io/backup-name=manual-backup -o yaml
apiVersion: v1
items:
```
@reddog335 I came across this old issue: https://github.com/vmware-tanzu/velero/issues/887, which I think may be coming into play here. Could you first show the YAML for your Azure file storage class, then make the change described in that issue, and try again?
You can take a look at the documentation I put up here: https://github.com/vmware-tanzu/velero/pull/2054/files for details.
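For anyone hitting this later, here's a sketch of the kind of storage class change being referenced, assuming (as the linked docs describe) the fix is adding the `nouser_xattr` mount option so restic can read files on the CIFS mount; all names and parameters here are placeholders based on this thread:

```yaml
# Hypothetical Azure Files storage class illustrating the change from
# issue #887: the mountOptions entry is the relevant addition.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-file-std-grs
provisioner: kubernetes.io/azure-file
mountOptions:
  - nouser_xattr        # assumed fix per the linked docs; disables user extended attributes
parameters:
  skuName: Standard_GRS
```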
BOOM!!! That worked like a champ. @skriss , you sir, are a steely-eyed missile man! Thank you so much for the help, I truly appreciate it!!!
Hi @skriss, I'm hitting the same error, but our storage is not AWS; it's an NFS server running on ppc64le.

```
completionTimestamp: '2022-02-24T09:52:19Z'
message: volume was empty so no snapshot was taken
path: >-
  /host_pods/1950f55b-1e43-48c8-b049-f8292ed4cb2d/volumes/kubernetes.io~nfs/pvc-e0075e9e-22ed-43dd-bb48-11ad24252f8f
phase: Completed
progress: {}
startTimestamp: '2022-02-24T09:52:18Z'
```

I checked the restic daemonset pod, and it has this volume, which is not empty:

```
sh-4.4# ls -lar
total 4
drwxrwxrwx. 3 nobody nobody 4096 Feb 24 08:42 pvc-e0075e9e-22ed-43dd-bb48-11ad24252f8f
drwxr-x---. 6 root   root    121 Feb 24 09:26 ..
drwxr-x---. 3 root   root     54 Feb 24 09:26 .
```

What can I do next? Thanks a lot.
What steps did you take and what happened:
We are using Azure Files in AKS for our persistent storage solution. I have a deployment using Azure Files for the persistent volume claim and persistent volume. I have installed velero with restic. The backups appear to back up the pvc and pv just fine; however, when I perform the restore it creates a new pv with a different name and a second Azure File share in the storage account with no files in it.
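For reference, here's a minimal sketch of the PVC involved, reconstructed from the `kubectl` output below (the actual manifest isn't included in this report, so treat this as illustrative):

```yaml
# Illustrative PVC matching the names/sizes shown in the output below;
# not the exact manifest from the cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ops-itg-azure-file
  namespace: ops-itg
spec:
  accessModes:
    - ReadWriteMany              # RWX, as shown in the kubectl output
  storageClassName: azure-file-std-grs
  resources:
    requests:
      storage: 5Gi
```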
Before I removed the ops-itg namespace containing the pv and pvc:
```
$ kubectl get pv,pvc
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS         REASON   AGE
persistentvolume/pvc-e495741c-f995-11e9-ae16-260815deaa94   5Gi        RWX            Retain           Bound    ops-itg/ops-itg-azure-file   azure-file-std-grs            8m14s

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
persistentvolumeclaim/ops-itg-azure-file   Bound    pvc-e495741c-f995-11e9-ae16-260815deaa94   5Gi        RWX            azure-file-std-grs   8m14s
```
Restore commands:

```
$ velero get backups
NAME                                  STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
aks-daily-backup-itg-20191028152145   Completed   2019-10-28 10:21:45 -0500 CDT   29d       default

$ velero restore create --from-backup aks-daily-backup-itg-20191028152145
```
After the restore completed:

```
$ kubectl get pv,pvc
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS         REASON   AGE
persistentvolume/pvc-27106be3-f999-11e9-ae16-260815deaa94   5Gi        RWX            Retain           Bound    ops-itg/ops-itg-azure-file   azure-file-std-grs            15m

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
persistentvolumeclaim/ops-itg-azure-file   Bound    pvc-27106be3-f999-11e9-ae16-260815deaa94   5Gi        RWX            azure-file-std-grs   15m
```
What did you expect to happen:
I expected the pv to be recreated with the same name and the file in the Azure File share to be mounted to the pods in the deployment.
The output of the following commands will help us better understand what's going on: https://gist.github.com/reddog335/6865b38fbdfa4e6caa71cc7f83b8d10b
Environment:

- Velero version (use `velero version`):
  ```
  Client:
      Version: v1.1.0
      Git commit: a357f21aec6b39a8244dd23e469cc4519f1fe608
  Server:
      Version: v1.1.0
  ```
- Velero features (use `velero client config get features`):
  ```
  features:
  ```
- Kubernetes version (use `kubectl version`):
  ```
  Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:37:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.10", GitCommit:"37d169313237cb4ceb2cc4bef300f2ae3053c1a2", GitTreeState:"clean", BuildDate:"2019-08-19T10:44:49Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}
  ```
- Kubernetes installer & version: Azure ARM template, Kubernetes 1.13.10
- Cloud provider or hardware configuration: Azure AKS
- OS (e.g. from `/etc/os-release`): Ubuntu 16.04.6 LTS, kernel 4.15.0-1061-azure