vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Not able to restore the data #5815

Closed hemanthjakkoju closed 1 year ago

hemanthjakkoju commented 1 year ago

What steps did you take and what happened:

I am trying to restore the data and I am getting this error: could not restore, CustomResourceDefinition "authorizationpolicies.security.istio.io" already exists. Warning: the in-cluster version is different than the backed-up version. I need some input to resolve this issue.

Screen Shot 2023-02-01 at 4 28 31 PM
blackpiglet commented 1 year ago

This means that during the restore, Velero found the resource to be restored already existed in the cluster. Velero will not overwrite existing resources during restoration, so you can safely ignore these warnings.

The in-cluster version is different than the backed-up version: this part means Velero found that the backed-up Kubernetes resource version differs from the same resource in the cluster. Usually this happens when the backup was created a while ago and the cluster has since gone through some changes, so the cluster's resource versions are newer than those in the backup.
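If you want to see exactly which resources produced warnings, the restore's details and logs can be inspected with the Velero CLI, for example:

velero restore describe <restore-name> --details
velero restore logs <restore-name>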

hemanthjakkoju commented 1 year ago

Hi @blackpiglet, Thank you for your reply.

I am now facing another problem. I can restore all the PVCs, services, and pods to the cluster, but I cannot restore the MongoDB data. When I run the restore it says "Completed," but I am not able to see the data in MongoDB Compass. Please give me some input.

Screen Shot 2023-02-02 at 3 48 22 PM Screen Shot 2023-02-02 at 3 48 39 PM
blackpiglet commented 1 year ago

Could you share the Velero deployment's YAML configuration here? We also need the detailed information of the backup that the restore used. The following are example commands to get that information.

kubectl -n velero get deployment velero -o yaml

kubectl -n velero get backup <backup-name> -o yaml
hemanthjakkoju commented 1 year ago

@blackpiglet, here are the details.

kubectl -n velero get deployment velero -o yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "4"
    meta.helm.sh/release-name: velero
    meta.helm.sh/release-namespace: velero
  creationTimestamp: "2023-02-01T08:38:30Z"
  generation: 4
  labels:
    app.kubernetes.io/instance: velero
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: velero
    component: velero
    helm.sh/chart: velero-2.32.6
  name: velero
  namespace: velero
  resourceVersion: "3214770"
  uid: a9158d8b-00ba-4bf7-88f2-9784208a5244
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: velero
      app.kubernetes.io/name: velero
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8085"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: velero
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: velero
        helm.sh/chart: velero-2.32.6
        name: velero
    spec:
      containers:
      - args:
        - server
        command:
        - /velero
        env:
        - name: VELERO_SCRATCH_DIR
          value: /scratch
        - name: VELERO_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: LD_LIBRARY_PATH
          value: /plugins
        - name: AWS_SHARED_CREDENTIALS_FILE
          value: /credentials/cloud
        image: velero/velero:v1.9.4
        imagePullPolicy: IfNotPresent
        name: velero
        ports:
        - containerPort: 8085
          name: http-monitoring
          protocol: TCP
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /plugins
          name: plugins
        - mountPath: /credentials
          name: cloud-credentials
        - mountPath: /scratch
          name: scratch
      dnsPolicy: ClusterFirst
      initContainers:
      - image: velero/velero-plugin-for-aws:v1.1.0
        imagePullPolicy: IfNotPresent
        name: velero-plugin-for-aws
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /target
          name: plugins
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: velero-server
      serviceAccountName: velero-server
      terminationGracePeriodSeconds: 3600
      volumes:
      - name: cloud-credentials
        secret:
          defaultMode: 420
          secretName: velero
      - emptyDir: {}
        name: plugins
      - emptyDir: {}
        name: scratch
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-02-01T08:38:30Z"
    lastUpdateTime: "2023-02-02T11:09:24Z"
    message: ReplicaSet "velero-6667d9b66f" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2023-02-02T11:15:34Z"
    lastUpdateTime: "2023-02-02T11:15:34Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 4
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
hemanthjakkoju commented 1 year ago
kubectl -n velero get backup <backup-name> -o yaml

apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/source-cluster-k8s-gitversion: v1.23.14-eks-ffeb93d
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: 23+
  creationTimestamp: "2023-02-02T11:16:51Z"
  generation: 12
  labels:
    velero.io/storage-location: aws-velero
  name: "9999"
  namespace: velero
  resourceVersion: "3215228"
  uid: b69b335b-b8a7-4175-a87c-7da616df5287
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToRestic: false
  hooks: {}
  includedNamespaces:
  - '*'
  metadata: {}
  storageLocation: aws-velero
  ttl: 720h0m0s
  volumeSnapshotLocations:
  - aws-velero
status:
  completionTimestamp: "2023-02-02T11:17:03Z"
  errors: 5
  expiration: "2023-03-04T11:16:51Z"
  formatVersion: 1.1.0
  phase: PartiallyFailed
  progress:
    itemsBackedUp: 1030
    totalItems: 1030
  startTimestamp: "2023-02-02T11:16:51Z"
  version: 1
  volumeSnapshotsAttempted: 5
  warnings: 1
blackpiglet commented 1 year ago

I think the root cause is in the backup. We can see the backup ended in phase: PartiallyFailed, and it has five errors and one warning. Please use velero debug to collect the debug bundle and upload it here to help debug further.
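For reference, the debug bundle can be collected with the Velero CLI (assuming a CLI version that includes the debug command, v1.8+), e.g. for the backup named 9999 above:

velero debug --backup 9999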

hemanthjakkoju commented 1 year ago

@blackpiglet , here it is

bundle-2023-02-01-17-31-29.tar.gz

hemanthjakkoju commented 1 year ago

@blackpiglet, when I use this command to back up only the pods: velero backup create new44 --selector database=hits-mongodb --storage-location aws-velero

I am able to take the backup.

Screen Shot 2023-02-02 at 5 22 32 PM

Velero-Native Snapshots:

➜  AWSKEY git:(main) ✗ kubectl -n velero get backup new44 -o yaml        
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/source-cluster-k8s-gitversion: v1.23.14-eks-ffeb93d
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: 23+
  creationTimestamp: "2023-02-02T11:50:55Z"
  generation: 5
  labels:
    velero.io/storage-location: aws-velero
  name: new44
  namespace: velero
  resourceVersion: "3224774"
  uid: 52d5c0ce-9626-4b67-9c34-645fe6e6e7f8
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToRestic: false
  hooks: {}
  includedNamespaces:
  - '*'
  labelSelector:
    matchLabels:
      database: hits-mongodb
  metadata: {}
  storageLocation: aws-velero
  ttl: 720h0m0s
  volumeSnapshotLocations:
  - aws-velero
status:
  completionTimestamp: "2023-02-02T11:50:58Z"
  expiration: "2023-03-04T11:50:55Z"
  formatVersion: 1.1.0
  phase: Completed
  progress:
    itemsBackedUp: 11
    totalItems: 11
  startTimestamp: "2023-02-02T11:50:55Z"
  version: 1
  warnings: 1
hemanthjakkoju commented 1 year ago

@blackpiglet, this is my helm.tf code to install Velero:

provider "helm" { kubernetes { host = var.eks_host cluster_ca_certificate = base64decode(var.eks_ca_certificate) exec { api_version = "client.authentication.k8s.io/v1beta1" args = ["eks", "get-token", "--cluster-name", var.eks_id] command = "aws" } } }

resource "helm_release" "velero" { repository = "https://vmware-tanzu.github.io/helm-charts" chart = "velero" name = "velero" namespace = "velero" version = "2.32.6" create_namespace = true

set { name = "configuration.provider" value = "aws" } set { name = "configuration.backupStorageLocation.name" value = "aws-velero" }

set { name = "configuration.backupStorageLocation.config.region" value = "us-east-1" } set { name = "configuration.backupStorageLocation.bucket" value = aws_s3_bucket.s3_logging.bucket }

set { name = "configuration.volumeSnapshotLocation.name" value = "aws-velero" } set { name = "configuration.volumeSnapshotLocation.config.region" value = "us-east-1" }

set { name = "initContainers[0].name" value = "velero-plugin-for-aws" }

set { name = "initContainers[0].image" value = "velero/velero-plugin-for-aws:v1.1.0" } set { name = "initContainers[0].volumeMounts[0].mountPath" value = "/target" } set { name = "initContainers[0].volumeMounts[0].name" value = "plugins" }

blackpiglet commented 1 year ago

I found some errors like the following in the Velero backup log. The Velero AWS plugin failed to take snapshots of the volumes due to a permission error.

time="2023-02-01T08:39:52Z" level=error msg="Error backing up item" backup=velero/1 error="error taking snapshot of volume: rpc error: code = Unknown desc = UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: URQ15Q0N5gk5Rh83BKa5-p-N-2WZzJb2YmwE3or1cMe88sjhact3z2Mut10Wl6NGsslS37Iw01nsGKRJ91HieQb6A9EKfcUyahaYCTY9gxnnRoGlJG9wJpXCjBcLYqbCtovaTYdIn3yCGhf4aOeIWk_IZUnf_9zGAs7zdoywe7TZOtuqY4YgmUujPPHDCDHX_qa1TtwI8uZokOvT1OWNwvLqhftl7WIJUpADjTKh_GJDsl_SznhZBRMaUNh_X-bJs-HoFafP0VKR2A_g8aX88Pdx8uQFkGhlR15NHp_rMgDan5ZGlFJqGG98jPAdhJ_fHeoy8jDDyewYiHXmcYcQerpZ-WJr6pOm0454jrzH7Pov3_drNoMSw7EwfOFcUU0_qfvxF0oly5QBYfFrRrpJvF1t2U3ikDBTn4eqPU7DEacgSBtucPbawKJsUAcIAD3Q3Wk8sRfyLJKGpEQXdxYbHJG99eabvbYClifGb2AS12pdnrauIM-52au_1zCPIcQBNximCMnTPOXuKk53Hcgw5Vj2NUbs3M3qhNCMOzYOv4zS6PJm--SVrAkayv1PCXELZTx0rrtNVHMY0wh8_sxdPVn_Ve7WtW6qTSrLZoLA\n\tstatus code: 403, request id: 4e4f5884-b23d-4e38-827d-92c44876f62b" logSource="pkg/backup/backup.go:417" name=data-auth-mysql-0

And there are five volumes included in the backup. I think the 5 errors are all related to this snapshot-creation failure.

Velero-Native Snapshots:
  pvc-6b918563-27d5-425e-b696-d1fd09e16a30:
    Snapshot ID:        
    Type:               gp2
    Availability Zone:  us-east-1a
    IOPS:               <N/A>
  pvc-c784f0c4-b8c4-4a13-a854-01e8d8d27079:
    Snapshot ID:        
    Type:               gp2
    Availability Zone:  us-east-1a
    IOPS:               <N/A>
  pvc-c803201b-0670-4fe5-8ead-80e1f709cc67:
    Snapshot ID:        
    Type:               gp2
    Availability Zone:  us-east-1b
    IOPS:               <N/A>
  pvc-3f6a243a-132e-4661-949c-578b46580907:
    Snapshot ID:        
    Type:               gp2
    Availability Zone:  us-east-1a
    IOPS:               <N/A>
  pvc-a4afccef-df24-42fa-925a-dee891414e17:
    Snapshot ID:        
    Type:               gp2
    Availability Zone:  us-east-1a
    IOPS:               <N/A>

Please check your IAM settings.
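For reference, the EC2 permissions the AWS plugin needs for native volume snapshots look roughly like the following policy statement (adapted from the velero-plugin-for-aws setup guide; treat it as a sketch and verify against the guide for your plugin version):

{
  "Effect": "Allow",
  "Action": [
    "ec2:DescribeVolumes",
    "ec2:DescribeSnapshots",
    "ec2:CreateTags",
    "ec2:CreateVolume",
    "ec2:CreateSnapshot",
    "ec2:DeleteSnapshot"
  ],
  "Resource": "*"
}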

When you limit the backup's included resources with a label selector, the volumes are not involved, so no error is reported.

hemanthjakkoju commented 1 year ago

Hi @blackpiglet, can Velero take backups of PVC and PV data?

blackpiglet commented 1 year ago

@hemanthjakkoju Sure. By default, Velero uses the cloud provider's snapshot API to back up the data in the volumes referenced by the PVs. That requires permission to call the AWS snapshot API. Please check the secret used in the Velero installation; it holds an AWS user's access key/secret key. Please make sure that user has the correct permissions. This is a reference: https://github.com/vmware-tanzu/velero-plugin-for-aws#set-permissions-for-velero
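For example, the credentials Velero is actually using can be read back from the secret mounted into the deployment shown earlier (secret name velero, key cloud), assuming the default chart layout:

kubectl -n velero get secret velero -o jsonpath='{.data.cloud}' | base64 -d

The access key ID printed there identifies the IAM user that needs the snapshot permissions from the linked guide.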

hemanthjakkoju commented 1 year ago

@blackpiglet ,

I have followed what you shared with me. I have granted all the permissions needed to access the cluster; I can take backups, but I am not able to restore the MongoDB data. Please give me some input.

hemanthjakkoju commented 1 year ago

@blackpiglet ,

If possible, can we set up a meeting so I can show you my issue?

blackpiglet commented 1 year ago

It would be better to provide the debug bundle to help debug. Please run the velero debug command to collect the information and upload it here.

hemanthjakkoju commented 1 year ago

Hi @blackpiglet, bundle-2023-02-16-19-27-15.tar.gz

As requested, I have shared the debug file with you.

blackpiglet commented 1 year ago

@hemanthjakkoju I checked the information in the bundle. The backups finished successfully. There are some error logs, but IMO most of them are not that important.

I think only this one may be worth noticing. The PVC data-hits-mysql-0 in the default namespace is not found in the metadata of the backup used by the restore. Could you check why this PVC is not included in the backup? (One way to check is noted after the log excerpt below.)

level=warning msg="unable to restore additional item" additionalResource=persistentvolumeclaims additionalResourceName=data-hits-mysql-0 additionalResourceNamespace=default error="stat /tmp/923444653/resources/persistentvolumeclaims/namespaces/default/data-hits-mysql-0.json: no such file or directory"
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:65:time="2023-02-14T19:37:53Z" level=error msg="No backup storage locations found, at least one is required" controller=backup-sync error="no backup storage locations found" error.file="/go/src/github.com/vmware-tanzu/velero/internal/storage/storagelocation.go:93" error.function=github.com/vmware-tanzu/velero/internal/storage.ListBackupStorageLocations logSource="pkg/controller/backup_sync_controller.go:136"
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:74:time="2023-02-14T19:37:54Z" level=error msg="Current BackupStorageLocations available/unavailable/unknown: 0/0/1)" controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:173"
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:4255:time="2023-02-14T19:43:04Z" level=warning msg="unable to restore additional item" additionalResource=persistentvolumeclaims additionalResourceName=data-hits-mysql-0 additionalResourceNamespace=default error="stat /tmp/1653206923/resources/persistentvolumeclaims/namespaces/default/data-hits-mysql-0.json: no such file or directory" logSource="pkg/restore/restore.go:1184" restore=velero/full-20230215011236
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:6469:time="2023-02-14T19:43:39Z" level=info msg="Service default/hits-mongo exists, ignore the provided port is already allocated error" logSource="pkg/restore/restore.go:1486" restore=velero/full-20230215011236
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:7889:time="2023-02-14T19:48:25Z" level=warning msg="unable to restore additional item" additionalResource=persistentvolumeclaims additionalResourceName=data-hits-mysql-0 additionalResourceNamespace=default error="stat /tmp/1498473737/resources/persistentvolumeclaims/namespaces/default/data-hits-mysql-0.json: no such file or directory" logSource="pkg/restore/restore.go:1184" restore=velero/full-20230215011753
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:10115:time="2023-02-14T19:49:00Z" level=info msg="Service default/hits-mongo exists, ignore the provided port is already allocated error" logSource="pkg/restore/restore.go:1486" restore=velero/full-20230215011753
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:10357:time="2023-02-14T19:54:42Z" level=error msg="Error updating download request" controller=download-request downloadRequest=velero/full-20230215011236-9375510b-46df-402f-afe4-9a97a5fe2b57 error="downloadrequests.velero.io \"full-20230215011236-9375510b-46df-402f-afe4-9a97a5fe2b57\" not found" logSource="pkg/controller/download_request_controller.go:74"
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:10377:time="2023-02-14T20:00:03Z" level=error msg="Error updating download request" controller=download-request downloadRequest=velero/full-20230215011753-7a48512c-20d8-4978-8e32-282de408c0e4 error="downloadrequests.velero.io \"full-20230215011753-7a48512c-20d8-4978-8e32-282de408c0e4\" not found" logSource="pkg/controller/download_request_controller.go:74"
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:18736:time="2023-02-15T18:06:24Z" level=warning msg="unable to restore additional item" additionalResource=persistentvolumeclaims additionalResourceName=data-hits-mysql-0 additionalResourceNamespace=default error="stat /tmp/923444653/resources/persistentvolumeclaims/namespaces/default/data-hits-mysql-0.json: no such file or directory" logSource="pkg/restore/restore.go:1184" restore=velero/end-20230215233556
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:20992:time="2023-02-15T18:07:01Z" level=info msg="Service default/hits-mongo exists, ignore the provided port is already allocated error" logSource="pkg/restore/restore.go:1486" restore=velero/end-20230215233556
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:21263:time="2023-02-15T18:18:03Z" level=error msg="Error updating download request" controller=download-request downloadRequest=velero/end-20230215233556-cee5b74e-9360-4e91-9869-0b0e77a3ecc0 error="downloadrequests.velero.io \"end-20230215233556-cee5b74e-9360-4e91-9869-0b0e77a3ecc0\" not found" logSource="pkg/controller/download_request_controller.go:74"
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:23447:time="2023-02-15T21:26:15Z" level=warning msg="unable to restore additional item" additionalResource=persistentvolumeclaims additionalResourceName=data-hits-mysql-0 additionalResourceNamespace=default error="stat /tmp/3528135328/resources/persistentvolumeclaims/namespaces/default/data-hits-mysql-0.json: no such file or directory" logSource="pkg/restore/restore.go:1184" restore=velero/end-20230216025545
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:25334:time="2023-02-15T21:26:49Z" level=info msg="Service default/hits-mongo exists, ignore the provided port is already allocated error" logSource="pkg/restore/restore.go:1486" restore=velero/end-20230216025545
kubecapture/core_v1/velero/velero-f4f67969f-5hslq/velero/velero.log:25609:time="2023-02-15T21:37:49Z" level=error msg="Error updating download request" controller=download-request downloadRequest=velero/end-20230216025545-fbcd3be9-38bf-426e-9343-f1c2f73d7a39 error="downloadrequests.velero.io \"end-20230216025545-fbcd3be9-38bf-426e-9343-f1c2f73d7a39\" not found" logSource="pkg/controller/download_request_controller.go:74"
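To check why data-hits-mysql-0 is missing, a quick way is to list what the backup actually contains and review its log, for example:

velero backup describe <backup-name> --details
velero backup logs <backup-name>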
hemanthjakkoju commented 1 year ago

Hi @blackpiglet, Thanks for replying.

Actually, my issue is with the MongoDB pod data backup, not with data-hits-mysql-0.

I am trying to back up and restore the MongoDB data, but I am not able to restore the data.

I stored some data in MongoDB, took a full cluster backup, and then deleted the pod's deployment (command: kubectl delete deployment hits-mongo) and the pod's PVC (all the data is stored in EFS). When I try to restore, the deployment and PVC come back, but all the data in the container is gone (it is not restored). I don't understand what I am missing. I will place all the screenshots below for you to look over.

Please give me input or if you want we can setup a meeting to discuss my issue.

hemanthjakkoju commented 1 year ago

Pod name( hits-mongo)

Screen Shot 2023-02-17 at 10 37 31 PM

hemanthjakkoju commented 1 year ago

Data I am trying to back up and restore

Screen Shot 2023-02-17 at 10 40 08 PM

hemanthjakkoju commented 1 year ago

Code I am using to install Velero:

provider "helm" {
  kubernetes {
    host                   = var.eks_host
    cluster_ca_certificate = base64decode(var.eks_ca_certificate)
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      args        = ["eks", "get-token", "--cluster-name", var.eks_id]
      command     = "aws"
    }
  }
}

resource "helm_release" "velero" {
  repository       = "https://vmware-tanzu.github.io/helm-charts"
  chart            = "velero"
  name             = "velero"
  namespace        = "velero"
  version          = "2.32.3"
  create_namespace = true

  set {
    name  = "configuration.provider"
    value = "aws"
  }
  set {
    name  = "configuration.backupStorageLocation.name"
    value = "aws-velero"
  }

  set {
    name  = "configuration.backupStorageLocation.config.region"
    value = "us-east-1"
  }
  set {
    name  = "configuration.backupStorageLocation.bucket"
    value = aws_s3_bucket.s3_logging.bucket
  }

  set {
    name  = "configuration.volumeSnapshotLocation.name"
    value = "aws-velero"
  }
  set {
    name  = "configuration.volumeSnapshotLocation.config.region"
    value = "us-east-1"
  }

  set {
    name  = "initContainers[0].name"
    value = "velero-plugin-for-aws"
  }

  set {
    name  = "initContainers[0].image"
    value = "velero/velero-plugin-for-aws:v1.1.0"
  }
  set {
    name  = "initContainers[0].volumeMounts[0].mountPath"
    value = "/target"
  }
  set {
    name  = "initContainers[0].volumeMounts[0].name"
    value = "plugins"
  }

  set {
    name  = "restic.privileged"
    value = "true"
  }

}
blackpiglet commented 1 year ago

@hemanthjakkoju I just noticed that the volume's provisioner is EFS. To my understanding, Velero's AWS plugin doesn't support EFS. You can try the Restic uploader instead: https://velero.io/docs/v1.9/restic/.
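With the Velero 1.9 Restic integration, volumes are opted in either per pod via an annotation or for the whole backup with a flag. A minimal sketch, assuming the Restic daemonset is deployed and using hypothetical pod/volume names (take the real ones from the Mongo pod spec):

kubectl -n default annotate pod <hits-mongo-pod-name> backup.velero.io/backup-volumes=<mongo-data-volume-name>

or, to back up all volumes with Restic:

velero backup create mongo-restic --include-namespaces default --default-volumes-to-restic --storage-location aws-velero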

hemanthjakkoju commented 1 year ago

Hi @blackpiglet ,

I have already installed Restic, but it also does not work.


  set {
    name  = "restic.privileged"
    value = "true"
  }
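Note: restic.privileged only sets the privileged flag on the Restic pods. In the 2.32.x chart the Restic daemonset itself is installed only when deployRestic is enabled (an assumption based on that chart's values; please verify against the chart version you installed), for example:

  set {
    name  = "deployRestic"
    value = "true"
  }

Once the daemonset is running (kubectl -n velero get pods should show restic pods), the volumes still need to be opted in to Restic as described above.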
stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

blackpiglet commented 1 year ago

@hemanthjakkoju Did you resolve the issue you encountered with Restic?

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

blackpiglet commented 1 year ago

Closing for now. Feel free to reopen if you still have unresolved issues.