vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Velero Server command resulting in Bsl posting unavailable status #8051

Closed RUPESHKUMARPALLAI closed 1 month ago

RUPESHKUMARPALLAI commented 1 month ago

What steps did you take and what happened:

I am using Velero for backup and restore, and both work as expected without any errors. For my use case, however, I need to change the restore resource priority order. When I run the velero server command to do this, I get the following error:

ERRO[0012] Error getting a backup store backup-storage-location=velero/default controller=backup-storage-location error="unable to locate ObjectStore plugin named velero.io/gcp" logSource="pkg/controller/backup_storage_location_controller.go:137"

This error only occurs when I use the velero server command.

Commands I used:

 velero server \
--restore-resource-priorities=customresourcedefinitions, events, ResourceQuota, Secret, ServiceAccount, Lease, clusterrole, clusterrolebinding, role, rolebinding, ConfigMap, MtlConfig, Deployment, Rollout, ReplicaSet, ControllerRevision, Service, ScaledObject, HorizontalPodAutoscaler, Job, CronJob, Gateway, HTTPRoute, GCPBackendPolicy, GCPGatewayPolicy, HealthCheckPolicy, MultiClusterService, MultiClusterIngress, FQDNNetworkPolicy, NetworkLogging, RedirectService, ServiceNetworkEndpointGroup, Gateway.networking.istio.io, EnvoyFilter, DestinationRule, ServiceEntry, Sidecar, VirtualService, Ingress, NetworkPolicy, InternetPermission, ServiceInMeshConfig, Telemetry

and

velero server --backup-sync-period 2m

What did you expect to happen: the velero server command changes the restore order without any issue.

The following information will help us better understand what's going on:

Anything else you would like to add: Can we change the restore order using ConfigMaps?

Environment:

Additional debugging we did:

root@my-release-velero-7776f66db9-kwf85:/# getent hosts metadata.google.internal
169.254.169.254 metadata.google.internal

Host resolution is working fine.

root@my-release-velero-7776f66db9-kwf85:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

However, there is no /plugins directory.

I am using an ephemeral container to exec into the pod, because the Velero image does not include /bin or sh:

kubectl debug -it velero-799d4db584-zqbgs --target=velero --image=jfrog.fkinternal.com/docker-external/ubuntu

Our deployment manifest:

apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "4"
      meta.helm.sh/release-name: my-release
      meta.helm.sh/release-namespace: velero
    creationTimestamp: "2024-06-21T06:05:49Z"
    generation: 5
    labels:
      app.kubernetes.io/instance: my-release
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: velero
      component: velero
      helm.sh/chart: velero-6.7.0
    name: my-release-velero
    namespace: velero
    resourceVersion: "43569836"
    uid: aa687185-97fe-4517-bc2c-2aba70aaa371
  spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app.kubernetes.io/instance: my-release
        app.kubernetes.io/name: velero
    strategy:
      type: Recreate
    template:
      metadata:
        annotations:
          prometheus.io/path: /metrics
          prometheus.io/port: "8085"
          prometheus.io/scrape: "true"
        creationTimestamp: null
        labels:
          app.kubernetes.io/instance: my-release
          app.kubernetes.io/managed-by: Helm
          app.kubernetes.io/name: velero
          name: velero
      spec:
        containers:
        - args:
          - server
          - --uploader-type=kopia
          command:
          - /velero
          env:
          - name: VELERO_SCRATCH_DIR
            value: /scratch
          - name: VELERO_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          - name: LD_LIBRARY_PATH
            value: /plugins
          image: jfrog.fkinternal.com/docker-external/velero/velero:v1.13.2
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /metrics
              port: http-monitoring
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 5
          name: velero
          ports:
          - containerPort: 8085
            name: http-monitoring
            protocol: TCP
          readinessProbe:
            failureThreshold: 5
            httpGet:
              path: /metrics
              port: http-monitoring
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 5
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /plugins
            name: plugins
          - mountPath: /scratch
            name: scratch
        dnsPolicy: Default
        initContainers:
        - image: jfrog.fkinternal.com/docker-external/velero/velero-plugin-for-gcp:v1.6.0
          imagePullPolicy: Always
          name: velero-plugin-for-gcp
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /target
            name: plugins
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: velero
        serviceAccountName: velero
        terminationGracePeriodSeconds: 3600
        volumes:
        - emptyDir: {}
          name: plugins
        - emptyDir: {}
          name: scratch

Please note that the issue only happens when using the velero server command. Regular backups and restores work as expected, and the BSL reports an Available status at all other times.

blackpiglet commented 1 month ago

First, note the format of the --restore-resource-priorities value: it is a single comma-separated list with no spaces, in the form <HighPriorityResource1>,<HighPriorityResource2>,-,<LowPriorityResource1>,<LowPriorityResource2>, where a standalone - separates the high-priority resources from the low-priority ones. https://github.com/vmware-tanzu/velero/blob/53b57f8bdfce0347da367d1bc519acd778a426d3/pkg/restore/priority.go#L47-L87
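To illustrate that format, here is a minimal bash sketch that assembles the value from two arrays and rejects any whitespace. The resource names are illustrative placeholders, not a recommended order; the join relies only on bash's "$*" expansion with IFS set to a comma.

```shell
#!/usr/bin/env bash
# Minimal sketch: build a --restore-resource-priorities value that is
# comma-separated, has no spaces, and uses a standalone "-" between the
# high-priority and low-priority lists. Resource names are placeholders.
set -euo pipefail

# Join all arguments with commas ("$*" uses the first character of IFS).
join_by_comma() {
  local IFS=,
  echo "$*"
}

high=(customresourcedefinitions namespaces secrets serviceaccounts configmaps)
low=(ingresses networkpolicies)

PRIORITIES="$(join_by_comma "${high[@]}"),-,$(join_by_comma "${low[@]}")"

# Reject whitespace: unquoted, a space splits the flag into several shell
# arguments; quoted, each entry keeps a leading space, because the linked
# parser splits on commas only.
case "$PRIORITIES" in
  *[[:space:]]*) echo "priority list must not contain whitespace" >&2; exit 1 ;;
esac

echo "--restore-resource-priorities=${PRIORITIES}"
```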

Second, when Velero is run directly through the velero server CLI, the plugins and credentials have to be set up manually for it to work.
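As a rough sketch of what "set up manually" involves: when velero server is started by hand, nothing copies the GCP plugin into its plugin directory or supplies cloud credentials, which is consistent with the "unable to locate ObjectStore plugin named velero.io/gcp" error. The script below assumes the plugin binary has already been copied into a local directory and that a GCP service-account key file exists; both paths are hypothetical placeholders. It uses velero server's --plugin-dir flag and the standard GOOGLE_APPLICATION_CREDENTIALS variable read by Google Application Default Credentials, and only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: environment and flags for starting `velero server` by hand, outside
# the Helm-managed Deployment. Both paths are hypothetical placeholders.
set -euo pipefail

# Directory holding the GCP plugin binary (e.g. copied out of the
# velero-plugin-for-gcp image); /plugins is also velero server's default.
PLUGIN_DIR="${PLUGIN_DIR:-/plugins}"
# Service-account key file for Google Application Default Credentials.
CREDS_FILE="${CREDS_FILE:-/credentials/cloud}"

export VELERO_NAMESPACE=velero
export GOOGLE_APPLICATION_CREDENTIALS="$CREDS_FILE"

# Same args as the Deployment, plus an illustrative priority list in the
# comma-separated, no-space format.
args=(
  server
  "--plugin-dir=${PLUGIN_DIR}"
  --uploader-type=kopia
  "--restore-resource-priorities=customresourcedefinitions,namespaces,-,ingresses"
)

# Print the command for review; run `velero "${args[@]}"` once the plugin
# directory and credentials file are in place.
printf '%q ' velero "${args[@]}"
echo
```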

RUPESHKUMARPALLAI commented 1 month ago

There was a configuration issue with the priority argument.
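Because the Deployment in the manifest already runs /velero server --uploader-type=kopia, one option, sketched here under the assumption that editing the live Deployment is acceptable, is to append the corrected flag to that container's args instead of starting a second server. The namespace, Deployment name, and container index come from the manifest; the priority list is a placeholder. A later helm upgrade may overwrite a manual patch, so this is best treated as a quick test.

```shell
#!/usr/bin/env bash
# Sketch: append --restore-resource-priorities to the existing velero
# container's args with a JSON patch, instead of starting a second server.
# Names come from the manifest above; the priority list is illustrative.
set -euo pipefail

NAMESPACE=velero
DEPLOYMENT=my-release-velero
PRIORITIES="customresourcedefinitions,namespaces,-,ingresses"

# "/args/-" appends to the end of the args array of container 0 (velero).
PATCH='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--restore-resource-priorities='"${PRIORITIES}"'"}]'

# Printed rather than executed; drop the leading echo to apply it.
echo kubectl -n "$NAMESPACE" patch deployment "$DEPLOYMENT" --type=json -p "$PATCH"
```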