vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

backup pv error #5567

Closed 92ppl closed 1 year ago

92ppl commented 1 year ago

What steps did you take and what happened: I ran a backup with `--default-volumes-to-restic` and backing up the PV failed:

time="2022-11-08T08:53:16Z" level=info msg="1 errors encountered backup up item" backup=velero/zqb-backup logSource="pkg/backup/backup.go:413" name=mysql-0
time="2022-11-08T08:53:16Z" level=error msg="Error backing up item" backup=velero/zqb-backup error="pod volume backup failed: building Restic command: getting volume directory name: no matches for kind \"CSIDriver\" in version \"storage.k8s.io/v1\"" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:199" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:417" name=mysql-0
time="2022-11-08T08:53:16Z" level=info msg="Backed up 3 items out of an estimated total of 49 (estimate 

What did you expect to happen:

The following information will help us better understand what's going on:

Please provide the output of the following commands (pasting long output into a GitHub gist or other pastebin is fine):

- `velero backup describe <backupname>`

Phase: InProgress

Errors: 0 Warnings: 0

Namespaces: Included: zhongqingbao Excluded:

Resources: Included: * Excluded: Cluster-scoped: auto

Label selector:

Storage Location: default

Velero-Native Snapshot PVs: auto

TTL: 720h0m0s

Hooks:

Backup Format Version: 1.1.0

Started: 2022-11-08 16:53:10 +0800 CST Completed: <n/a>

Expiration: 2022-12-08 16:53:10 +0800 CST

Estimated total items to be backed up: 49 Items backed up so far: 4

Velero-Native Snapshots:

Restic Backups (specify --details for more information): Failed: 1 New: 1

- `velero backup logs <backupname>`

➜ ~ velero backup logs zqb-backup
Logs for backup "zqb-backup" are not available until it's finished processing. Please wait until the backup has a phase of Completed or Failed and try again.


**Anything else you would like to add:**

**Environment:**

- Velero version (use `velero version`): 

➜ ~ velero version
Client:
  Version: v1.6.2
  Git commit: -
Server:
  Version: v1.9.0

- Velero features (use `velero client config get features`): 

➜ ~ velero client config get features
features:

- Kubernetes version (use `kubectl version`):

➜ ~ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:23:52Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.4-12.8d683d9", GitCommit:"8d683d982b20a8f28a62ad502db0f352e50f621c", GitTreeState:"clean", BuildDate:"2019-12-30T09:24:27Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

- Kubernetes installer & version:
- Cloud provider or hardware configuration: minio

- OS (e.g. from `/etc/os-release`):

**Vote on this issue!**

This is an invitation to the Velero community to vote on issues; you can see the project's [top voted issues listed here](https://github.com/vmware-tanzu/velero/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions-%2B1-desc).  
Use the "reaction smiley face" up to the right of this comment to vote.

- :+1: for "I would like to see this bug fixed as soon as possible"
- :-1: for "There are more important bugs to focus on right now"
blackpiglet commented 1 year ago

@92ppl Could you post the content of the PV corresponding to the failed PVC `mysql-0`? For example: `kubectl get pv <pv-name> -o yaml`. We need to check whether it's a CSI volume. If it is, please update the Velero and Restic DaemonSet version to v1.9.1; it contains a bug fix related to Restic backing up CSI volumes: https://github.com/vmware-tanzu/velero/pull/5186
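For anyone making the same check, here is a quick way to tell the two apart from the PV manifest (a sketch only, not Velero's actual detection logic; the `is_csi_pv` helper and sample YAML below are illustrative):

```shell
#!/bin/sh
# Sketch: a CSI-provisioned PV carries a `csi:` volume source under .spec,
# while an in-tree NFS PV carries an `nfs:` source instead -- which is what
# `kubectl get pv <pv-name> -o yaml` lets you inspect.
is_csi_pv() {
  # $1: PV manifest text
  printf '%s\n' "$1" | grep -qE '^[[:space:]]+csi:'
}

# Hypothetical in-tree NFS PV spec, mirroring the provisioner in this issue.
NFS_PV='spec:
  nfs:
    path: /exports/data
    server: 172.20.96.12'

if is_csi_pv "$NFS_PV"; then
  echo "CSI volume"
else
  echo "in-tree volume source (e.g. nfs)"
fi
```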

92ppl commented 1 year ago

I tried the latest version; the redis backup succeeds, but the etcd one is PartiallyFailed. They use the same NFS.

blackpiglet commented 1 year ago

@92ppl IMO, backing up etcd by copying files from the filesystem is not a best practice. I suggest using the etcd snapshot command instead: https://etcd.io/docs/v3.3/op-guide/recovery/ https://docs.vmware.com/en/VMware-Application-Catalog/services/tutorials/GUID-backup-restore-data-etcd-kubernetes-index.html
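As a sketch of the snapshot-based approach from the docs linked above (the endpoint and certificate paths are assumptions for a typical kubeadm control-plane layout, not taken from this cluster; the script only prints the command, which you would run on a control-plane node with `ETCDCTL_API=3` set and `etcdctl` installed):

```shell
#!/bin/sh
# Sketch only: snapshot-based etcd backup. Endpoint and certificate paths
# below are assumptions for a typical kubeadm control-plane node --
# adjust for your cluster before running.
ENDPOINT="https://127.0.0.1:2379"
CERT_DIR="/etc/kubernetes/pki/etcd"
SNAP="/backup/etcd-$(date +%Y%m%d).db"

# Build and print the backup command (this script does not execute it):
CMD="etcdctl --endpoints=$ENDPOINT --cacert=$CERT_DIR/ca.crt --cert=$CERT_DIR/server.crt --key=$CERT_DIR/server.key snapshot save $SNAP"
echo "$CMD"
```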

92ppl commented 1 year ago

@blackpiglet I tested a few cases and found the failure is caused not by etcd, but by NFS.

The failure message is 'building Restic command: getting volume directory name: no matches for kind "CSIDriver" in version "storage.k8s.io/v1"'.

I created 3 namespaces, each with a redis deployment, using nfs-v4, nfs-v3, and jd-ssd as the StorageClass.

1. Using jd-ssd:
    
    ➜  examples git:(release-1.9) ✗ velero backup create redis-jd-ssd-backup-data --include-namespaces redis-jd-ssd --default-volumes-to-restic
    Backup request "redis-jd-ssd-backup-data" submitted successfully.
    Run `velero backup describe redis-jd-ssd-backup-data` or `velero backup logs redis-jd-ssd-backup-data` for more details.

➜ examples git:(release-1.9) ✗ velero backup describe redis-jd-ssd-backup-data
Name:         redis-jd-ssd-backup-data
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.16.4-12.8d683d9
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=16+

Phase: Completed

Errors: 0 Warnings: 0

Namespaces: Included: redis-jd-ssd Excluded:

Resources: Included: * Excluded: Cluster-scoped: auto

Label selector:

Storage Location: default

Velero-Native Snapshot PVs: auto

TTL: 720h0m0s

Hooks:

Backup Format Version: 1.1.0

Started: 2022-11-14 17:24:13 +0800 CST Completed: 2022-11-14 17:24:22 +0800 CST

Expiration: 2022-12-14 17:24:13 +0800 CST

Total items to be backed up: 31 Items backed up: 31

Velero-Native Snapshots:

Restic Backups (specify --details for more information): Completed: 3

2. Using nfs-v4/v3:

➜ examples git:(release-1.9) ✗ velero backup create redis-nfs-v4-backup-data --include-namespaces redis-nfs-v4 --default-volumes-to-restic
Backup request "redis-nfs-v4-backup-data" submitted successfully.
Run `velero backup describe redis-nfs-v4-backup-data` or `velero backup logs redis-nfs-v4-backup-data` for more details.

➜ examples git:(release-1.9) ✗ velero backup describe redis-nfs-v4-backup-data
Name:         redis-nfs-v4-backup-data
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.16.4-12.8d683d9
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=16+

Phase: PartiallyFailed (run velero backup logs redis-nfs-v4-backup-data for more information)

Errors: 1 Warnings: 0

Namespaces: Included: redis-nfs-v4 Excluded:

Resources: Included: * Excluded: Cluster-scoped: auto

Label selector:

Storage Location: default

Velero-Native Snapshot PVs: auto

TTL: 720h0m0s

Hooks:

Backup Format Version: 1.1.0

Started: 2022-11-14 17:26:49 +0800 CST Completed: 2022-11-14 17:26:56 +0800 CST

Expiration: 2022-12-14 17:26:49 +0800 CST

Total items to be backed up: 19 Items backed up: 19

Velero-Native Snapshots:

Restic Backups (specify --details for more information): Completed: 2 Failed: 1

➜ examples git:(release-1.9) ✗ velero backup get
NAME                       STATUS            ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
redis-jd-ssd-backup-data   Completed         0        0          2022-11-14 17:24:13 +0800 CST   29d       default
redis-nfs-v3-backup-data   PartiallyFailed   1        0          2022-11-14 17:25:52 +0800 CST   29d       default
redis-nfs-v4-backup-data   PartiallyFailed   1        0          2022-11-14 17:26:49 +0800 CST   29d       default

➜ examples git:(release-1.9) ✗ kubectl get podvolumebackup -n velero -o wide
NAME                             AGE
redis-jd-ssd-backup-data-jbr79   5m22s
redis-jd-ssd-backup-data-m5dsx   5m22s
redis-jd-ssd-backup-data-w2442   5m22s
redis-nfs-v3-backup-data-f7hbk   3m43s
redis-nfs-v3-backup-data-vxknf   3m43s
redis-nfs-v3-backup-data-vzhpv   3m43s
redis-nfs-v4-backup-data-j5c95   2m45s
redis-nfs-v4-backup-data-k9hf7   2m45s
redis-nfs-v4-backup-data-sr7jx   2m45s

➜ examples git:(release-1.9) ✗ kubectl describe podvolumebackup redis-nfs-v4-backup-data-k9hf7 -n velero
Name:         redis-nfs-v4-backup-data-k9hf7
Namespace:    velero
Labels:       velero.io/backup-name=redis-nfs-v4-backup-data
              velero.io/backup-uid=c1c2cd93-d037-4bd5-a213-2697e5446718
              velero.io/pvc-uid=421b708e-ddb0-45ba-ad41-a800ca75e513
Annotations:  velero.io/pvc-name: redis-data-redis-master-0
API Version:  velero.io/v1
Kind:         PodVolumeBackup
Metadata:
  Creation Timestamp:  2022-11-14T09:26:54Z
  Generate Name:       redis-nfs-v4-backup-data-
  Generation:          3
  Owner References:
    API Version:     velero.io/v1
    Controller:      true
    Kind:            Backup
    Name:            redis-nfs-v4-backup-data
    UID:             c1c2cd93-d037-4bd5-a213-2697e5446718
  Resource Version:  776389374
  Self Link:         /apis/velero.io/v1/namespaces/velero/podvolumebackups/redis-nfs-v4-backup-data-k9hf7
  UID:               1412e5c0-fc2d-45c4-889a-19c8cfbcbabf
Spec:
  Backup Storage Location:  default
  Node:                     k8s-node-vm2i0v-9pc0c2imx2
  Pod:
    Kind:       Pod
    Name:       redis-master-0
    Namespace:  redis-nfs-v4
    UID:        daa8edfe-5b84-45df-b472-277c960a13db
  Repo Identifier:  s3:http://172.20.96.3:32487/velero-cluster-crawl/crawl/restic/redis-nfs-v4
  Tags:
    Backup:        redis-nfs-v4-backup-data
    Backup - UID:  c1c2cd93-d037-4bd5-a213-2697e5446718
    Ns:            redis-nfs-v4
    Pod:           redis-master-0
    Pod - UID:     daa8edfe-5b84-45df-b472-277c960a13db
    Pvc - UID:     421b708e-ddb0-45ba-ad41-a800ca75e513
    Volume:        redis-data
  Volume:  redis-data
Status:
  Completion Timestamp:  2022-11-14T09:26:54Z
  Message:               building Restic command: getting volume directory name: no matches for kind "CSIDriver" in version "storage.k8s.io/v1"
  Phase:                 Failed
  Progress:
  Start Timestamp:  2022-11-14T09:26:54Z
Events:

92ppl commented 1 year ago
➜  examples git:(release-1.9) ✗ kubectl get persistentvolumeclaim -n redis-nfs-v4
NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
redis-data-redis-master-0   Bound    pvc-421b708e-ddb0-45ba-ad41-a800ca75e513   8Gi        RWO            nfs-v4         117m
➜  examples git:(release-1.9) ✗ kubectl get pv pvc-421b708e-ddb0-45ba-ad41-a800ca75e513 -n redis-nfs-v4
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
pvc-421b708e-ddb0-45ba-ad41-a800ca75e513   8Gi        RWO            Delete           Bound    redis-nfs-v4/redis-data-redis-master-0   nfs-v4                  118m
➜  examples git:(release-1.9) ✗ kubectl get pv pvc-421b708e-ddb0-45ba-ad41-a800ca75e513 -n redis-nfs-v4 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: nfs-nfs-v4
  creationTimestamp: "2022-11-14T07:59:03Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-421b708e-ddb0-45ba-ad41-a800ca75e513
  resourceVersion: "776323233"
  selfLink: /api/v1/persistentvolumes/pvc-421b708e-ddb0-45ba-ad41-a800ca75e513
  uid: eb135c46-d7bf-4205-a46d-d786c3838640
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 8Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: redis-data-redis-master-0
    namespace: redis-nfs-v4
    resourceVersion: "776323221"
    uid: 421b708e-ddb0-45ba-ad41-a800ca75e513
  mountOptions:
  - vers=4
  - noresvport
  nfs:
    path: /cfs/redis-nfs-v4-redis-data-redis-master-0-pvc-421b708e-ddb0-45ba-ad41-a800ca75e513
    server: 172.20.96.12
  persistentVolumeReclaimPolicy: Delete
  storageClassName: nfs-v4
  volumeMode: Filesystem
status:
  phase: Bound
blackpiglet commented 1 year ago

@92ppl Could you run the `velero debug` command to collect the debug bundle for deeper investigation?

92ppl commented 1 year ago

There is no NFS CSIDriver in the cluster. I provision NFS with nfs-subdir-external-provisioner, not csi-driver-nfs. Does Velero only support CSI volumes?

➜  examples git:(release-1.9) ✗ kubectl get csidriver
NAME                  CREATED AT
zbs.csi.jdcloud.com   2021-02-26T09:48:07Z
blackpiglet commented 1 year ago

This is due to the Kubernetes version in use. We recently updated the Velero and Kubernetes version compatibility matrix: Velero v1.9 requires Kubernetes v1.18 or newer. https://github.com/vmware-tanzu/velero#velero-compatibility-matrix
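Concretely, the failing lookup asks for the `CSIDriver` kind at `storage.k8s.io/v1`, and that API version was only introduced in Kubernetes v1.18, so the reported v1.16 server cannot serve it. A minimal sketch of that version gate (the helper name is illustrative, not Velero code; it takes the `Minor` field from `kubectl version`):

```shell
#!/bin/sh
# Sketch: the storage.k8s.io/v1 CSIDriver API exists from Kubernetes v1.18,
# which is why the lookup fails against this v1.16 cluster.
minor_supports_csidriver_v1() {
  # $1: the Minor field from `kubectl version`, e.g. "16+" or "19"
  m=$(printf '%s' "$1" | tr -d '+')   # strip the "+" suffix some distros add
  [ "$m" -ge 18 ]
}

if minor_supports_csidriver_v1 "16+"; then
  echo "storage.k8s.io/v1 CSIDriver available"
else
  echo "storage.k8s.io/v1 CSIDriver NOT available: restic backup fails here"
fi
```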