vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Velero partially failed backup #5795

Closed. Lakshmi-r21 closed this issue 1 year ago.

Lakshmi-r21 commented 1 year ago

Hi team,

Velero backups are partially failing. I have attached the logs, but they don't show any specific issue. Could you please help us with this?

Lakshmi-r21 commented 1 year ago

Below is the configuration:

```yaml
  backupStorageLocation:
    # name is the name of the backup storage location where backups should be stored. If a name is not provided,
    # a backup storage location will be created with the name "default". Optional.
    name: default
    # bucket is the name of the bucket to store backups in. Required.
    bucket: bucketname
    # prefix is the directory under which all Velero data should be stored within the bucket. Optional.
    prefix: prefix
    # Additional provider-specific configuration. See link above
    # for details of required/optional fields for your provider.
    config:
      region: eu-central-1

    # Parameters for the `default` VolumeSnapshotLocation. See
    # https://velero.io/docs/v1.6/api-types/volumesnapshotlocation/
    volumeSnapshotLocation:
      # name is the name of the volume snapshot location where snapshots are being taken. Required.
      name: default
      # Additional provider-specific configuration. See link above
      # for details of required/optional fields for your provider.
      config:
        region: 
```
ywk253100 commented 1 year ago

I cannot find any errors in the log file. Please provide more context and the complete log.

Lakshmi-r21 commented 1 year ago
  1. When we try to take a backup, it ends up partially failed.

  2. When we run `velero backup logs velero-backup-20230127070038 -n kube-backup | grep -Ev info`, we get the logs below:

     ```
     time="2023-01-27T07:01:04Z" level=error msg="Error getting volume snapshotter for volume snapshot location"
     7T07:01:07Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=kube-aultt"
     ```

     I have pasted my configuration in the comment above.

  3. Output of the command `kubectl get VolumeSnapshotLocation -n kube-backup default -o yaml`:

     ```yaml
     apiVersion: velero.io/v1
     kind: VolumeSnapshotLocation
     metadata:
       annotations:
         helm.sh/hook: post-install,post-upgrade,post-rollback
         helm.sh/hook-delete-policy: before-hook-creation
       creationTimestamp: "2023-01-23T17:34:38Z"
       generation: 1
     ```

  4. Since the error is about the volume snapshot location, we went through a few links and blogs and found that the region must be specified for the volume snapshot location. As shown in the configuration above, we have already specified the region for the volume snapshot location too, but we still face the issue.

draghuram commented 1 year ago

Hi, just to be clear, are you saying that you provided the region but are still seeing the issue? I don't see a region in the VSL "default" posted above. Can you check? Also, please make sure you post the spec of a resource as "code" so that indentation is preserved.

Lakshmi-r21 commented 1 year ago
[Screenshot]

As the screenshot above shows, this is where we are facing the issue: the backup is reported as partially failed.

Below is my configuration. Here, for the volume snapshot location, I have set the region to eu-central-1:

```yaml
##
## Configuration settings that directly affect the Velero deployment YAML.
##

# Resource requests/limits to specify for the Velero deployment.
# https://velero.io/docs/v1.6/customize-installation/#customize-resource-requests-and-limits
resources:
  requests:
    cpu: 250m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi

# Init containers to add to the Velero deployment's pod spec. At least one plugin provider image is required.
# If the value is a string then it is evaluated as a template.
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.5.2
    volumeMounts:
      - mountPath: /target
        name: plugins

# Settings for Velero's prometheus metrics. Enabled by default.
metrics:
  enabled: true

  serviceMonitor:
    enabled: true

  prometheusRule:
    enabled: true
    # Rules to be deployed
    spec:
    - alert: VeleroBackupPartialFailures
      annotations:
        message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} partially failed backups.
      expr: |-
        velero_backup_partial_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
      for: 15m
      labels:
        severity: warning
    - alert: VeleroBackupFailures
      annotations:
        message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} failed backups.
      expr: |-
        velero_backup_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
      for: 15m
      labels:
        severity: warning

# This job upgrades the CRDs.
upgradeCRDs: true

# This job is meant primarily for cleaning up CRDs on CI systems.
# Using this on production systems, especially those that have multiple releases of Velero, will be destructive.
cleanUpCRDs: false

##
## Parameters for the `default` BackupStorageLocation and VolumeSnapshotLocation,
## and additional server settings.
##
configuration:
  # Cloud provider being used (e.g. aws, azure, gcp).
  provider: aws

  # Parameters for the `default` BackupStorageLocation. See
  # https://velero.io/docs/v1.6/api-types/backupstoragelocation/
  backupStorageLocation:
    # name is the name of the backup storage location where backups should be stored. If a name is not provided,
    # a backup storage location will be created with the name "default". Optional.
    name: default
    # bucket is the name of the bucket to store backups in. Required.
    bucket: bucketname
    # prefix is the directory under which all Velero data should be stored within the bucket. Optional.
    prefix: prefix
    # Additional provider-specific configuration. See link above
    # for details of required/optional fields for your provider.
    config:
      region: eu-central-1

    # Parameters for the `default` VolumeSnapshotLocation. See
    # https://velero.io/docs/v1.6/api-types/volumesnapshotlocation/
    volumeSnapshotLocation:
      # name is the name of the volume snapshot location where snapshots are being taken. Required.
      name: default
      # Additional provider-specific configuration. See link above
      # for details of required/optional fields for your provider.
      config:
        region: eu-central-1

# Information about the Kubernetes service account Velero uses.
serviceAccount:
  server:
    create: true
    name: ref+awsssm://k8sBackup/velero/irsa/serviceAccountName
    annotations:
      eks.amazonaws.com/role-arn: ref+awsssm://k8sBackup/velero/irsa/arn

# Info about the secret to be used by the Velero deployment, which
# should contain credentials for the cloud provider IAM account you've
# set up for Velero.
credentials:
  # Whether a secret should be used. Set to false if, for examples:
  # - using kube2iam or kiam to provide AWS IAM credentials instead of providing the key file. (AWS only)
  # - using workload identity instead of providing the key file. (GCP only)
  useSecret: false

# Backup schedules to create.
schedules:
  backup:
    schedule: "0 7 * * *"
    template:
      includedNamespaces:
        - "*"
      storageLocation: default
      volumeSnapshotLocations:
        - default
      ttl: "120h0m0s"
      metadata:
        labels:
          cluster: ref+awsssm://k8sBackup/velero/eksClusterName
```

When we view the logs with `velero backup logs velero-backup-20230127070009 -n kube-backup | grep -Ev info`, we get:

```
time="2023-01-27T07:00:46Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=kube-backup/velero-backup-20230127070009 error="rpc error: code = Unknown desc = missing region in aws configuration" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/volume_snapshotter.go:82" error.function="main.(*VolumeSnapshotter).Init" logSource="pkg/backup/item_backupper.go:470" name=pvc-54ed41e9-e9a1-42c4-8170-5cbbc843a50c namespace= persistentVolume=pvc-54ed41e9-e9a1-42c4-8170-5cbbc843a50c resource=persistentvolumes volumeSnapshotLocation=default
time="2023-01-27T07:00:49Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=kube-backup/velero-backup-20230127070009 error="rpc error: code = Unknown desc = missing region in aws configuration" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/volume_snapshotter.go:82" error.function="main.(*VolumeSnapshotter).Init" logSource="pkg/backup/item_backupper.go:470" name=pvc-1e858410-4b99-4dfb-af16-3ed049bd361a namespace= persistentVolume=pvc-1e858410-4b99-4dfb-af16-3ed049bd361a resource=persistentvolumes volumeSnapshotLocation=default
time="2023-01-27T07:00:49Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=kube-backup/velero-backup-20230127070009 error="rpc error: code = Unknown desc = missing region in aws configuration" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/volume_snapshotter.go:82" error.function="main.(*VolumeSnapshotter).Init" logSource="pkg/backup/item_backupper.go:470" name=pvc-9b1e6c73-139d-4d43-bcc5-16e369e5beea namespace= persistentVolume=pvc-9b1e6c73-139d-4d43-bcc5-16e369e5beea resource=persistentvolumes volumeSnapshotLocation=default
time="2023-01-27T07:00:49Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=kube-backup/velero-backup-20230127070009 error="rpc error: code = Unknown desc = missing region in aws configuration" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/volume_snapshotter.go:82" error.function="main.(*VolumeSnapshotter).Init" logSource="pkg/backup/item_backupper.go:470" name=pvc-11cca224-3ca7-430c-85ae-cb00deac228c namespace= persistentVolume=pvc-11cca224-3ca7-430c-85ae-cb00deac228c resource=persistentvolumes volumeSnapshotLocation=default
time="2023-01-27T07:00:49Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=kube-backup/velero-backup-20230127070009 error="rpc error: code = Unknown desc = missing region in aws configuration" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/volume_snapshotter.go:82" error.function="main.(*VolumeSnapshotter).Init" logSource="pkg/backup/item_backupper.go:470" name=pvc-4156c71e-338f-4531-91d3-2477add2d990 namespace= persistentVolume=pvc-4156c71e-338f-4531-91d3-2477add2d990 resource=persistentvolumes volumeSnapshotLocation=default
time="2023-01-27T07:00:50Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=kube-backup/velero-backup-20230127070009 error="rpc error: code = Unknown desc = missing region in aws configuration" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/volume_snapshotter.go:82" error.function="main.(*VolumeSnapshotter).Init" logSource="pkg/backup/item_backupper.go:470" name=pvc-e978e812-1901-43d0-b570-953b55daaaf7 namespace= persistentVolume=pvc-e978e812-1901-43d0-b570-953b55daaaf7 resource=persistentvolumes volumeSnapshotLocation=default
time="2023-01-27T07:00:51Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=kube-backup/velero-backup-20230127070009 error="rpc error: code = Unknown desc = missing region in aws configuration" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/volume_snapshotter.go:82" error.function="main.(*VolumeSnapshotter).Init" logSource="pkg/backup/item_backupper.go:470" name=pvc-8d1de1ca-5e71-465b-adb5-17447b6af1f6 namespace= persistentVolume=pvc-8d1de1ca-5e71-465b-adb5-17447b6af1f6 resource=persistentvolumes volumeSnapshotLocation=default
time="2023-01-27T07:00:51Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=kube-backup/velero-backup-20230127070009 error="rpc error: code = Unknown desc = missing region in aws configuration" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/volume_snapshotter.go:82" error.function="main.(*VolumeSnapshotter).Init" logSource="pkg/backup/item_backupper.go:470" name=pvc-aa69c5ea-dd22-4b92-a423-172e89f85db5 namespace= persistentVolume=pvc-aa69c5ea-dd22-4b92-a423-172e89f85db5 resource=persistentvolumes volumeSnapshotLocation=default
time="2023-01-27T07:00:51Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=kube-backup/velero-backup-20230127070009 error="rpc error: code = Unknown desc = missing region in aws configuration" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/volume_snapshotter.go:82" error.function="main.(*VolumeSnapshotter).Init" logSource="pkg/backup/item_backupper.go:470" name=pvc-7ab1fd3e-facf-4bdc-9817-b041bd0bde12 namespace= persistentVolume=pvc-7ab1fd3e-facf-4bdc-9817-b041bd0bde12 resource=persistentvolumes volumeSnapshotLocation=default
```

Please let me know if anything else is missing. We have set the region in the configuration, yet the error still says the region is missing for the volume snapshot location.

draghuram commented 1 year ago

Can you describe how you installed Velero and how you updated the region? My guess is that the change to add the region is not reflected in the cluster. You can check the spec of the "default" VSL and see if there is a "region" field. Note that the VSL output you posted above didn't have a "region" field. You can perhaps run "kubectl edit" and add "region" there.

Lakshmi-r21 commented 1 year ago

Below are the steps we use to install Velero. The region has been eu-central-1 from the beginning; we have not changed it.

```shell
kubectl create namespace kube-backup || echo "Namespace already exists!"
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update
helm secrets upgrade --install velero --namespace kube-backup -f values.yml
```

draghuram commented 1 year ago

Thanks. Can you also try the other steps I mentioned? Basically, run "kubectl get" on the VSL and verify whether "region" is there. If not, add it with "kubectl edit".

Lakshmi-r21 commented 1 year ago

Using `kubectl edit` on the volume snapshot location, I have now added the region:

```yaml
# Please edit the object below. Lines beginning with a '#' will be ignored,
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  annotations:
    helm.sh/hook: post-install,post-upgrade,post-rollback
    helm.sh/hook-delete-policy: before-hook-creation
  creationTimestamp: "2023-01-27T15:36:50Z"
  generation: 2
  labels:
    app.kubernetes.io/instance: velero
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: velero
    helm.sh/chart: velero-2.32.3
  name: default
spec:
  config:
    region: eu-central-1
  provider: aws
```

In the Velero release, the region is also set.

Could you please let me know why the region is not picked up from the configuration file? Are we missing something? Do we need to edit the VSL with kubectl every time we upgrade Velero to the latest version, since it is not taking the region from the config file?

ywk253100 commented 1 year ago

The volumeSnapshotLocation should be at the same level as backupStorageLocation, but in your config, volumeSnapshotLocation is indented incorrectly.
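As a sketch of the fix, here is the `configuration` section of the values file posted above with `volumeSnapshotLocation` moved up one level, so that it sits beside `backupStorageLocation` instead of inside it (the chart's explanatory comments are trimmed). This assumes the 2.x layout of the vmware-tanzu `velero` chart used in this thread (the VSL is labelled `helm.sh/chart: velero-2.32.3`); later major versions of the chart restructure these keys, so check the `values.yaml` of the chart version you actually deploy. With the original indentation, YAML parses the block as `configuration.backupStorageLocation.volumeSnapshotLocation`, a key the chart does not appear to read, which would explain why the `default` VSL was rendered without a region.

```yaml
configuration:
  # Cloud provider being used (e.g. aws, azure, gcp).
  provider: aws

  # Parameters for the `default` BackupStorageLocation.
  backupStorageLocation:
    name: default
    bucket: bucketname
    prefix: prefix
    config:
      region: eu-central-1

  # Parameters for the `default` VolumeSnapshotLocation.
  # Indented two spaces under `configuration`, i.e. a sibling of
  # backupStorageLocation, not a child of it.
  volumeSnapshotLocation:
    name: default
    config:
      region: eu-central-1
```

After correcting the indentation, re-running the same `helm secrets upgrade --install` command should render the region into the `default` VSL, so editing it with `kubectl edit` after every upgrade should not be necessary. The VSL is created as a Helm hook with `helm.sh/hook-delete-policy: before-hook-creation`, so a manual edit is likely to be replaced the next time the hook runs anyway.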

Lakshmi-r21 commented 1 year ago

Thanks for the information. It helped.