vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

PartiallyFailed: Exiting RemapCRDVersionAction, the cluster does not support v1beta1 CRD #5045

Closed: stefnats closed this issue 2 years ago

stefnats commented 2 years ago

Hey there,

I'm using the latest Helm chart and the latest client:

❯ velero version
Client:
        Version: v1.8.1
        Git commit: -
Server:
        Version: v1.8.1

I am also using DigitalOcean (Kubernetes 1.22.8) with this Helm chart values YAML:

##
## Configuration settings that directly affect the Velero deployment YAML.
##

# Details of the container image to use in the Velero deployment & daemonset (if
# enabling restic). Required.
image:
  repository: velero/velero
  tag: v1.8.1
  # Digest value example: sha256:d238835e151cec91c6a811fe3a89a66d3231d9f64d09e5f3c49552672d271f38.
  # If used, it will take precedence over the image.tag.
  # digest:
  pullPolicy: IfNotPresent
  # One or more secrets to be used when pulling images
  imagePullSecrets: []
  # - registrySecretName

# Annotations to add to the Velero deployment. Optional.
#
# If you are using reloader use the following annotation with your VELERO_SECRET_NAME
annotations: {}
# secret.reloader.stakater.com/reload: "<VELERO_SECRET_NAME>"

# Labels to add to the Velero deployment. Optional.
labels: {}

# Annotations to add to the Velero deployment's pod template. Optional.
#
# If using kube2iam or kiam, use the following annotation with your AWS_ACCOUNT_ID
# and VELERO_ROLE_NAME filled in:
podAnnotations: {}
#  iam.amazonaws.com/role: "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<VELERO_ROLE_NAME>"

# Additional pod labels for Velero deployment's template. Optional
# ref: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
podLabels: {}

# Resource requests/limits to specify for the Velero deployment.
# https://velero.io/docs/v1.6/customize-installation/#customize-resource-requests-and-limits
resources:
  requests:
    cpu: 500m
    memory: 128Mi
  limits:
    cpu: 1000m
    memory: 512Mi

# Configure the dnsPolicy of the Velero deployment
# See: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
dnsPolicy: ClusterFirst

# Init containers to add to the Velero deployment's pod spec. At least one plugin provider image is required.
# If the value is a string then it is evaluated as a template.
initContainers:
# - name: velero-plugin-for-csi
#   image: velero/velero-plugin-for-csi:v0.2.0
#   imagePullPolicy: IfNotPresent
#   volumeMounts:
#     - mountPath: /target
#       name: plugins
 - name: velero-plugin-for-aws
   image: velero/velero-plugin-for-aws:v1.4.1
   imagePullPolicy: IfNotPresent
   volumeMounts:
     - mountPath: /target
       name: plugins
 - name: digitalocean-velero-plugin
   image: digitalocean/velero-plugin:v1.1.0
   imagePullPolicy: IfNotPresent
   volumeMounts:
     - mountPath: /target
       name: plugins

# SecurityContext to use for the Velero deployment. Optional.
# Set fsGroup for `AWS IAM Roles for Service Accounts`
# See more information at: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
podSecurityContext: {}
# fsGroup: 1337

# Container Level Security Context for the 'velero' container of the Velero deployment. Optional.
# See: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-container
containerSecurityContext: {}
  # allowPrivilegeEscalation: false
  # capabilities:
  #   drop: ["ALL"]
  #   add: []
  # readOnlyRootFilesystem: true

# Pod priority class name to use for the Velero deployment. Optional.
priorityClassName: ""

# Tolerations to use for the Velero deployment. Optional.
tolerations: []

# Affinity to use for the Velero deployment. Optional.
affinity: {}

# Node selector to use for the Velero deployment. Optional.
nodeSelector: {}

# Extra volumes for the Velero deployment. Optional.
extraVolumes: []

# Extra volumeMounts for the Velero deployment. Optional.
extraVolumeMounts: []

# Extra K8s manifests to deploy
extraObjects: []
  # - apiVersion: secrets-store.csi.x-k8s.io/v1
  #   kind: SecretProviderClass
  #   metadata:
  #     name: velero-secrets-store
  #   spec:
  #     provider: aws
  #     parameters:
  #       objects: |
  #         - objectName: "velero"
  #           objectType: "secretsmanager"
  #           jmesPath:
  #               - path: "access_key"
  #                 objectAlias: "access_key"
  #               - path: "secret_key"
  #                 objectAlias: "secret_key"
  #     secretObjects:
  #       - data:
  #         - key: access_key
  #           objectName: client-id
  #         - key: client-secret
  #           objectName: client-secret
  #         secretName: velero-secrets-store
  #         type: Opaque

# Settings for Velero's prometheus metrics. Enabled by default.
metrics:
  enabled: true
  scrapeInterval: 30s
  scrapeTimeout: 10s

  # service metadata if metrics are enabled
  service:
    annotations: {}
    labels: {}

  # Pod annotations for Prometheus
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8085"
    prometheus.io/path: "/metrics"

  serviceMonitor:
    enabled: false
    additionalLabels: {}
    # ServiceMonitor namespace. Default to Velero namespace.
    # namespace:

kubectl:
  image:
    repository: docker.io/bitnami/kubectl
    # Digest value example: sha256:d238835e151cec91c6a811fe3a89a66d3231d9f64d09e5f3c49552672d271f38.
    # If used, it will take precedence over the kubectl.image.tag.
    # digest:
    # kubectl image tag. If used, it will take precedence over the cluster Kubernetes version.
    # tag: 1.16.15
  # Container Level Security Context for the 'kubectl' container of the crd jobs. Optional.
  # See: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-container
  containerSecurityContext: {}
  # Resource requests/limits to specify for the upgrade/cleanup job. Optional
  resources: {}
  # Annotations to set for the upgrade/cleanup job. Optional.
  annotations: {}
  # Labels to set for the upgrade/cleanup job. Optional.
  labels: {}

# This job upgrades the CRDs.
upgradeCRDs: true

# This job is meant primarily for cleaning up CRDs on CI systems.
# Using this on production systems, especially those that have multiple releases of Velero, will be destructive.
cleanUpCRDs: false

##
## End of deployment-related settings.
##

##
## Parameters for the `default` BackupStorageLocation and VolumeSnapshotLocation,
## and additional server settings.
##
configuration:
  # Cloud provider being used (e.g. aws, azure, gcp).
  provider: aws

  # Parameters for the `default` BackupStorageLocation. See
  # https://velero.io/docs/v1.6/api-types/backupstoragelocation/
  backupStorageLocation:
    # name is the name of the backup storage location where backups should be stored. If a name is not provided,
    # a backup storage location will be created with the name "default". Optional.
    name: default
    # provider is the name for the backup storage location provider. If omitted
    # `configuration.provider` will be used instead.
    provider: aws
    # bucket is the name of the bucket to store backups in. Required.
    bucket: xxx
    # caCert defines a base64 encoded CA bundle to use when verifying TLS connections to the provider. Optional.
    caCert:
    # prefix is the directory under which all Velero data should be stored within the bucket. Optional.
    prefix:
    # default indicates this location is the default backup storage location. Optional.
    default:
    # accessMode determines if velero can write to this backup storage location. Optional.
    # default to ReadWrite, ReadOnly is used during migrations and restores.
    accessMode: ReadWrite
    # Additional provider-specific configuration. See link above
    # for details of required/optional fields for your provider.
    config:
      region: fra1
    #  s3ForcePathStyle:
      s3Url: https://fra1.digitaloceanspaces.com
    #  kmsKeyId:
    #  resourceGroup:
    #  The ID of the subscription containing the storage account, if different from the cluster’s subscription. (Azure only)
    #  subscriptionId:
    #  storageAccount:
    #  publicUrl:
    #  Name of the GCP service account to use for this backup storage location. Specify the
    #  service account here if you want to use workload identity instead of providing the key file.(GCP only)
    #  serviceAccount:

  # Parameters for the `default` VolumeSnapshotLocation. See
  # https://velero.io/docs/v1.6/api-types/volumesnapshotlocation/
  volumeSnapshotLocation:
    # name is the name of the volume snapshot location where snapshots are being taken. Required.
    name:
    # provider is the name for the volume snapshot provider. If omitted
    # `configuration.provider` will be used instead.
    provider:
    # Additional provider-specific configuration. See link above
    # for details of required/optional fields for your provider.
    config: {}
      #region: fra1
        #  s3ForcePathStyle:
      #s3Url: https://fra1.digitaloceanspaces.com
  #    region:
  #    apiTimeout:
  #    resourceGroup:
  #    The ID of the subscription where volume snapshots should be stored, if different from the cluster’s subscription. If specified, also requires `configuration.volumeSnapshotLocation.config.resourceGroup`to be set. (Azure only)
  #    subscriptionId:
  #    incremental:
  #    snapshotLocation:
  #    project:

  # These are server-level settings passed as CLI flags to the `velero server` command. Velero
  # uses default values if they're not passed in, so they only need to be explicitly specified
  # here if using a non-default value. The `velero server` default values are shown in the
  # comments below.
  # --------------------
  # `velero server` default: 1m
  backupSyncPeriod:
  # `velero server` default: 1h
  resticTimeout:
  # `velero server` default: namespaces,persistentvolumes,persistentvolumeclaims,secrets,configmaps,serviceaccounts,limitranges,pods
  restoreResourcePriorities:
  # `velero server` default: false
  restoreOnlyMode:
  # `velero server` default: 20.0
  clientQPS:
  # `velero server` default: 30
  clientBurst:
  # `velero server` default: empty
  disableControllers:
  #

  # additional key/value pairs to be used as environment variables such as "AWS_CLUSTER_NAME: 'yourcluster.domain.tld'"
  extraEnvVars: {}

  # Comma separated list of velero feature flags. default: empty
  # features: EnableCSI
  features:

  # Set log-level for Velero pod. Default: info. Other options: debug, warning, error, fatal, panic.
  logLevel:

  # Set log-format for Velero pod. Default: text. Other option: json.
  logFormat:

  # Set to true to back up all pod volumes without having to annotate each pod when using restic. Default: false. Other option: true.
  defaultVolumesToRestic:

  # How often 'restic prune' is run for restic repositories by default. Default: 168h. Optional.
  defaultResticPruneFrequency:

##
## End of backup/snapshot location settings.
##

##
## Settings for additional Velero resources.
##

rbac:
  # Whether to create the Velero role and role binding to give all permissions to the namespace to Velero.
  create: true
  # Whether to create the cluster role binding to give administrator permissions to Velero
  clusterAdministrator: true
  # Name of the ClusterRole.
  clusterAdministratorName: cluster-admin

# Information about the Kubernetes service account Velero uses.
serviceAccount:
  server:
    create: true
    name:
    annotations:
    labels:

# Info about the secret to be used by the Velero deployment, which
# should contain credentials for the cloud provider IAM account you've
# set up for Velero.
credentials:
  # Whether a secret should be used. Set to false if, for example:
  # - using kube2iam or kiam to provide AWS IAM credentials instead of providing the key file. (AWS only)
  # - using workload identity instead of providing the key file. (GCP only)
  useSecret: true
  # Name of the secret to create if `useSecret` is true and `existingSecret` is empty
  name:
  # Name of a pre-existing secret (if any) in the Velero namespace
  # that should be used to get IAM account credentials. Optional.
  existingSecret:
  # Data to be stored in the Velero secret, if `useSecret` is true and `existingSecret` is empty.
  # As of the current Velero release, Velero only uses one secret key/value at a time.
  # The key must be named `cloud`, and the value corresponds to the entire content of your IAM credentials file.
  # Note that the format will be different for different providers, please check their documentation.
  # Here is a list of documentation for plugins maintained by the Velero team:
  # [AWS] https://github.com/vmware-tanzu/velero-plugin-for-aws/blob/main/README.md
  # [GCP] https://github.com/vmware-tanzu/velero-plugin-for-gcp/blob/main/README.md
  # [Azure] https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure/blob/main/README.md
  secretContents:
    cloud: |
      [default]
      aws_access_key_id=xxxx
      aws_secret_access_key="yyyyy"
  # additional key/value pairs to be used as environment variables such as "DIGITALOCEAN_TOKEN: <your-key>". Values will be stored in the secret.
  extraEnvVars:
    DIGITALOCEAN_TOKEN: "foobar"
  # Name of a pre-existing secret (if any) in the Velero namespace
  # that will be used to load environment variables into velero and restic.
  # Secret should be in format - https://kubernetes.io/docs/concepts/configuration/secret/#use-case-as-container-environment-variables
  extraSecretRef: ""

# Whether to create backupstoragelocation crd, if false => do not create a default backup location
backupsEnabled: true
# Whether to create volumesnapshotlocation crd, if false => disable snapshot feature
snapshotsEnabled: true

# Whether to deploy the restic daemonset.
deployRestic: false

restic:
  podVolumePath: /var/lib/kubelet/pods
  privileged: false
  # Pod priority class name to use for the Restic daemonset. Optional.
  priorityClassName: ""
  # Resource requests/limits to specify for the Restic daemonset deployment. Optional.
  # https://velero.io/docs/v1.6/customize-installation/#customize-resource-requests-and-limits
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 1000m
      memory: 1024Mi

  # Tolerations to use for the Restic daemonset. Optional.
  tolerations: []

  # Annotations to set for the Restic daemonset. Optional.
  annotations: {}

  # labels to set for the Restic daemonset. Optional.
  labels: {}

  # will map /scratch to emptyDir. Set to false and specify your own volume
  # via extraVolumes and extraVolumeMounts that maps to /scratch
  # if you don't want to use emptyDir.
  useScratchEmptyDir: true

  # Extra volumes for the Restic daemonset. Optional.
  extraVolumes: []

  # Extra volumeMounts for the Restic daemonset. Optional.
  extraVolumeMounts: []

  # Configure the dnsPolicy of the Restic daemonset
  # See: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
  dnsPolicy: ClusterFirst

  # SecurityContext to use for the Velero deployment. Optional.
  # Set fsGroup for `AWS IAM Roles for Service Accounts`
  # See more information at: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
  podSecurityContext:
    runAsUser: 0
    # fsGroup: 1337

  # Container Level Security Context for the 'restic' container of the restic DaemonSet. Optional.
  # See: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-container
  containerSecurityContext: {}

  # Node selector to use for the Restic daemonset. Optional.
  nodeSelector: {}

# Backup schedules to create.
# Eg:
# schedules:
#   mybackup:
#     disabled: false
#     labels:
#       myenv: foo
#     annotations:
#       myenv: foo
#     schedule: "0 0 * * *"
#     useOwnerReferencesInBackup: false
#     template:
#       ttl: "240h"
#       includedNamespaces:
#       - foo
schedules: {}

# Velero ConfigMaps.
# Eg:
# configMaps:
#   restic-restore-action-config:
#     labels:
#       velero.io/plugin-config: ""
#       velero.io/restic: RestoreItemAction
#     data:
#       image: velero/velero-restic-restore-helper:v1.8.1
configMaps: {}

##
## End of additional Velero resource settings.
##

Now, when I create a backup, for example with this command:

velero backup create rabbitmq-backup --include-namespaces=my-rabbitmq --ttl=168h

the backup description says:

Name:         rabbitmq-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.22.8
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=22

Phase:  PartiallyFailed (run `velero backup logs rabbitmq-backup` for more information)

Errors:    1
Warnings:  0

The backup logs end with this:

time="2022-06-23T09:44:09Z" level=info msg="Backing up item" backup=velero/rabbitmq-backup logSource="pkg/backup/item_backupper.go:122" name=ciliumendpoints.cilium.io namespace= resource=customresourcedefinitions.apiextensions.k8s.io
time="2022-06-23T09:44:09Z" level=info msg="Executing custom action" backup=velero/rabbitmq-backup logSource="pkg/backup/item_backupper.go:311" name=ciliumendpoints.cilium.io namespace= resource=customresourcedefinitions.apiextensions.k8s.io
time="2022-06-23T09:44:09Z" level=info msg="Executing RemapCRDVersionAction" backup=velero/rabbitmq-backup cmd=/velero logSource="pkg/backup/remap_crd_version_action.go:61" pluginName=velero
time="2022-06-23T09:44:09Z" level=info msg="Exiting RemapCRDVersionAction, the cluster does not support v1beta1 CRD" backup=velero/rabbitmq-backup cmd=/velero logSource="pkg/backup/remap_crd_version_action.go:89" pluginName=velero
time="2022-06-23T09:44:09Z" level=info msg="Backed up a total of 82 items" backup=velero/rabbitmq-backup logSource="pkg/backup/backup.go:405" progress=

What does this mean? Is it a bug? Just FYI, I am using the RabbitMQ Kubernetes Operator to deploy that cluster.

stefnats commented 2 years ago

This also happens with other namespaces and Helm charts, for example the Bitnami MariaDB and MongoDB images. The error messages are the same.

stefnats commented 2 years ago

I saw that PR #4686 is included in 1.9.0, so I installed server version 1.9.0, but the backup still fails.

The client version is still 1.8.1 because there is no newer version on Homebrew for macOS.

stefnats commented 2 years ago

I have now updated the client to 1.9.0, uninstalled everything with Helm, deleted the Velero CRDs, and reinstalled:

❯ velero version
Client:
        Version: v1.9.0
        Git commit: -
Server:
        Version: v1.9.0

But I still get the same error.

reasonerjt commented 2 years ago

@stefnats That log message means the RemapCRDVersionAction plugin was skipped, which is expected behavior on Kubernetes 1.22+ clusters.
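
To confirm this on a given cluster, one option is to list the CRD API versions it serves. This is a minimal sketch, assuming `kubectl` is pointed at the affected cluster. The helper name `crd_api_versions` is made up for illustration. Kubernetes 1.22 removed `apiextensions.k8s.io/v1beta1`, so on such a cluster only the `v1` entry should be listed.

```shell
# Hypothetical helper (not part of Velero or kubectl): print the
# apiextensions.k8s.io API versions that the current cluster serves.
crd_api_versions() {
  kubectl api-versions | grep '^apiextensions.k8s.io/'
}

# Usage:
#   crd_api_versions
```

If `apiextensions.k8s.io/v1beta1` is missing from the output, the "does not support v1beta1 CRD" line is informational (it is logged at `level=info`), and the backup error has to be looked for elsewhere in the log.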

anhqqt commented 2 years ago

I have just hit an error like @stefnats did. The last lines of my Velero logs are the same as yours. My real issue actually comes from the VolumeSnapshotClass, because it does not have the label velero.io/csi-volumesnapshot-class: "true"

@stefnats can you please run this command to check whether it returns any errors? velero backup logs rabbitmq-backup | grep error
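
For reference, a minimal sketch of adding that label with `kubectl label`. The helper name `label_vsclass` and the class name in the usage line are placeholders, not taken from this issue. Substitute the name of your own VolumeSnapshotClass.

```shell
# Hypothetical helper: add the label that the error message asks for
# to the VolumeSnapshotClass named in the first argument.
label_vsclass() {
  kubectl label volumesnapshotclass "$1" velero.io/csi-volumesnapshot-class=true
}

# Usage (placeholder class name):
#   label_vsclass my-snapshot-class
```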

This is my result:

time="2022-07-08T17:31:59Z" level=info msg="1 errors encountered backup up item" backup=velero/k8s-jenkins-backup logSource="pkg/backup/backup.go:413" name=jenkins-0
time="2022-07-08T17:31:59Z" level=error msg="Error backing up item" backup=velero/k8s-jenkins-backup error="error executing custom action (groupResource=persistentvolumeclaims, namespace=jenkins, name=jenkins-home): rpc error: code = Unknown desc = failed to get volumesnapshotclass for storageclass ebs-sc: failed to get volumesnapshotclass for provisioner ebs.csi.aws.com, ensure that the desired volumesnapshot class has the velero.io/csi-volumesnapshot-class label" logSource="pkg/backup/backup.go:417" name=jenkins-0
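
Note that a plain `grep error` also matches info-level lines such as the "1 errors encountered" line above. A slightly stricter filter, sketched here with a made-up helper name (`filter_error_lines`), keeps only lines logged at error level:

```shell
# Hypothetical helper: keep only log lines that Velero emitted at error level.
# Reads log text on stdin and prints the matching lines.
filter_error_lines() {
  grep 'level=error'
}

# Usage (backup name taken from this issue):
#   velero backup logs rabbitmq-backup | filter_error_lines
```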

reasonerjt commented 2 years ago

Thanks @anhqqt. So adding the label to the VolumeSnapshotClass solved the problem, right?

@stefnats please confirm whether you are seeing the same error. The question regarding RemapCRDVersionAction has been answered, so I may close this issue in a few days.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

Closing the stale issue.