vmware-tanzu / helm-charts

Contains Helm charts for Kubernetes related open source tools
https://vmware-tanzu.github.io/helm-charts/
Apache License 2.0

order of initialization "backupstoragelocation missing" #410

Open flostru opened 2 years ago

flostru commented 2 years ago

What steps did you take and what happened:

Because the vsphere plugin is installed by an init container, the first install of the chart fails: no backupstoragelocation has been created at that point.

Reproduce:

Run helm install with the aws and vsphere plugins.

values.yaml:

configuration:
  backupStorageLocation:
    bucket: my-bucket
    config:
      region: minio
      s3ForcePathStyle: true
      s3Url: https://my-minio-server
    default: true
    name: minio
  extraEnvVars:
    TZ: CET
  provider: aws
  volumeSnapshotLocation:
    bucket: my-bucket
    config:
      region: minio
      s3Url: https://my-minio-server
    name: minio
credentials:
  existingSecret: my-existing-secret
  useSecret: true
initContainers:
- image: velero/velero-plugin-for-aws:v1.5.0
  imagePullPolicy: IfNotPresent
  name: velero-plugin-for-aws
  volumeMounts:
  - mountPath: /target
    name: plugins
- image: vsphereveleroplugin/velero-plugin-for-vsphere:v1.4.0
  imagePullPolicy: IfNotPresent
  name: velero-plugin-for-vsphere
  volumeMounts:
  - mountPath: /target
    name: plugins

What did you expect to happen:

Successful initialization

The output of the following commands will help us better understand what's going on:

Output of initcontainer vsphereveleroplugin/velero-plugin-for-vsphere:v1.4.0

velero is running in the namespace, velero
--
Mon, Oct 10 2022 2:12:47 pm | The prerequisite checks for backup-driver started
Mon, Oct 10 2022 2:12:47 pm | Using image vsphereveleroplugin/backup-driver:v1.4.0
Mon, Oct 10 2022 2:12:47 pm | BackupDriver: Determined the cluster flavor as: vSphere Kubernetes Cluster
Mon, Oct 10 2022 2:12:47 pm | Detected Cluster type vSphere Kubernetes Cluster during BackupDriver install
Mon, Oct 10 2022 2:12:47 pm | No arguments found, no feature flags detected.The prerequisite checks for backup-driver completed
Mon, Oct 10 2022 2:12:47 pm | CustomResourceDefinition/backuprepositories.backupdriver.cnsdp.vmware.com: attempting to create resource
Mon, Oct 10 2022 2:12:47 pm | CustomResourceDefinition/backuprepositories.backupdriver.cnsdp.vmware.com: already exists, proceeding
Mon, Oct 10 2022 2:12:47 pm | CustomResourceDefinition/backuprepositories.backupdriver.cnsdp.vmware.com: created
Mon, Oct 10 2022 2:12:47 pm | CustomResourceDefinition/backuprepositoryclaims.backupdriver.cnsdp.vmware.com: attempting to create resource
Mon, Oct 10 2022 2:12:47 pm | CustomResourceDefinition/backuprepositoryclaims.backupdriver.cnsdp.vmware.com: already exists, proceeding
Mon, Oct 10 2022 2:12:47 pm | CustomResourceDefinition/backuprepositoryclaims.backupdriver.cnsdp.vmware.com: created
Mon, Oct 10 2022 2:12:47 pm | CustomResourceDefinition/clonefromsnapshots.backupdriver.cnsdp.vmware.com: attempting to create resource
Mon, Oct 10 2022 2:12:47 pm | CustomResourceDefinition/clonefromsnapshots.backupdriver.cnsdp.vmware.com: already exists, proceeding
Mon, Oct 10 2022 2:12:47 pm | CustomResourceDefinition/clonefromsnapshots.backupdriver.cnsdp.vmware.com: created
Mon, Oct 10 2022 2:12:47 pm | CustomResourceDefinition/deletesnapshots.backupdriver.cnsdp.vmware.com: attempting to create resource
Mon, Oct 10 2022 2:12:48 pm | CustomResourceDefinition/deletesnapshots.backupdriver.cnsdp.vmware.com: already exists, proceeding
Mon, Oct 10 2022 2:12:48 pm | CustomResourceDefinition/deletesnapshots.backupdriver.cnsdp.vmware.com: created
Mon, Oct 10 2022 2:12:48 pm | CustomResourceDefinition/snapshots.backupdriver.cnsdp.vmware.com: attempting to create resource
Mon, Oct 10 2022 2:12:48 pm | CustomResourceDefinition/snapshots.backupdriver.cnsdp.vmware.com: already exists, proceeding
Mon, Oct 10 2022 2:12:48 pm | CustomResourceDefinition/snapshots.backupdriver.cnsdp.vmware.com: created
Mon, Oct 10 2022 2:12:48 pm | CustomResourceDefinition/downloads.datamover.cnsdp.vmware.com: attempting to create resource
Mon, Oct 10 2022 2:12:48 pm | CustomResourceDefinition/downloads.datamover.cnsdp.vmware.com: already exists, proceeding
Mon, Oct 10 2022 2:12:48 pm | CustomResourceDefinition/downloads.datamover.cnsdp.vmware.com: created
Mon, Oct 10 2022 2:12:48 pm | CustomResourceDefinition/uploads.datamover.cnsdp.vmware.com: attempting to create resource
Mon, Oct 10 2022 2:12:48 pm | CustomResourceDefinition/uploads.datamover.cnsdp.vmware.com: already exists, proceeding
Mon, Oct 10 2022 2:12:48 pm | CustomResourceDefinition/uploads.datamover.cnsdp.vmware.com: created
Mon, Oct 10 2022 2:12:48 pm | Waiting for resources to be ready in cluster...
Mon, Oct 10 2022 2:13:48 pm | Deployment/backup-driver: attempting to create resource
Mon, Oct 10 2022 2:13:48 pm | Deployment/backup-driver: created
Mon, Oct 10 2022 2:13:48 pm | Waiting for backup-driver deployment to be ready.
Mon, Oct 10 2022 2:14:48 pm | An error occurred:
Mon, Oct 10 2022 2:14:48 pm |  
Mon, Oct 10 2022 2:14:48 pm | Error installing backup-driver. Use `kubectl logs deploy/backup-driver -n velero` to check the logs: timed out waiting for the condition

Output of kubectl logs deploy/backup-driver -n velero

---snipp---
2022-10-10T12:19:39.368Z info -[00013] [Originator@6876 sub=Default] ReaperManager Initialized
time="2022-10-10T12:19:39Z" level=info msg="Initialized VDDK" logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/ivd/ivd_protected_entity_type_manager.go:407"
time="2022-10-10T12:19:39Z" level=info msg="Load Config of IVD Protected Entity Manager completed successfully" logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/ivd/ivd_protected_entity_type_manager.go:408"
time="2022-10-10T12:19:39Z" level=info msg="SnapshotManager is initialized with the configuration: map[LocalMode:false SnapshotManagerLocation:DataServer]" logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr/snapshot_manager.go:142"
time="2022-10-10T12:19:39Z" level=info msg="RetrieveVSLFromVeleroBSLs: Failed to get Velero default backup storage location, attempting to find available BSL" error="backupstoragelocations.velero.io \"default\" not found" logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/utils/utils.go:308"
time="2022-10-10T12:19:39Z" level=error msg="RetrieveVSLFromVeleroBSLs: Failed to list Velero default backup storage location" error="<nil>" logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/utils/utils.go:312"
time="2022-10-10T12:19:39Z" level=info msg="SnapshotManager: Velero Backup Storage Location is retrieved, region=<nil>, bucket=<nil>" logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr/snapshot_manager.go:160"
time="2022-10-10T12:19:39Z" level=info msg="No such key region in params map" logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/utils/utils.go:477"
time="2022-10-10T12:19:39Z" level=error msg="Failed to get s3PETM from params map: region=<nil>, bucket=<nil>" error="Missing region param, cannot initialize S3 PETM" error.file="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/utils/utils.go:375" error.function=github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/utils.GetS3PETMFromParamsMap logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr/snapshot_manager.go:164"
An error occurred: Missing region param, cannot initialize S3 PETM

Anything else you would like to add:

If I install the Helm chart once with only the aws plugin and add the vsphere init container afterwards, it works fine: the default backupstoragelocation gets created in a later step (after the init containers), so the vsphere plugin can then be added.

So it looks like the Helm chart actually needs a two-step installation:

  1. Install without the vsphere plugin.
  2. Add the vsphere init container.

This is awkward for automated bootstrapping.
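
The two-step installation described above could be sketched as a short shell script. This is only a sketch under assumptions that are not part of the report: the release name `velero`, the chart reference `vmware-tanzu/velero`, and the two values file names are hypothetical. `values-aws-only.yaml` stands for the values shown earlier without the vsphere init container, and `values-with-vsphere.yaml` for the full values. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch of the two-step install. Release name, chart reference and
# values file names are assumptions for illustration only.
NAMESPACE="${NAMESPACE:-velero}"
RELEASE="${RELEASE:-velero}"
CHART="${CHART:-vmware-tanzu/velero}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

two_step_install() {
  # Step 1: install with only the aws plugin, so the default
  # backupstoragelocation exists before the vsphere plugin runs.
  run helm install "$RELEASE" "$CHART" -n "$NAMESPACE" --create-namespace \
    --wait -f values-aws-only.yaml || return 1
  # Step 2: add the vsphere init container through an upgrade.
  run helm upgrade "$RELEASE" "$CHART" -n "$NAMESPACE" \
    -f values-with-vsphere.yaml
}

# Usage: DRY_RUN=1 two_step_install
```

Keeping both steps in one function makes the ordering explicit, but it does not remove the underlying dependency: the second step still relies on the first having created the backupstoragelocation.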

Environment:

jenting commented 2 years ago

Is vsphereveleroplugin/backup-driver:v1.4.0 the same as vsphereveleroplugin/velero-plugin-for-vsphere:v1.4.0?

flostru commented 2 years ago

Is vsphereveleroplugin/backup-driver:v1.4.0 the same as vsphereveleroplugin/velero-plugin-for-vsphere:v1.4.0?

While the init container runs, it creates a deployment called backup-driver with the image vsphereveleroplugin/backup-driver:v1.4.0. That deployment fails with the error log above because it needs a Velero Backup Storage Location, which does not exist yet since the initial Velero deployment has not finished.

A classic bootstrap problem.

And no, vsphereveleroplugin/backup-driver:v1.4.0 is different from vsphereveleroplugin/velero-plugin-for-vsphere:v1.4.0.

According to the documentation for "velero-plugin-for-vsphere" https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/docs/vanilla.md vsphereveleroplugin/velero-plugin-for-vsphere:v1.3.0 is the correct image. I use the later v1.4.0, but v1.3.0 shows the same behaviour.

jenting commented 2 years ago

@flostru The Helm chart can't guarantee the ordering, and the velero-plugin-for-vsphere requires the BSL to be available. So I think this comes from the velero-plugin-for-vsphere design.

Is there any way the velero-plugin-for-vsphere could skip checking if the BSL is available?

JuanGarcia01 commented 1 year ago

@flostru @jenting

Did you reach a resolution to this issue?

I am seeing the same issue and the same errors.

The init container starts with the image velero-plugin-for-vsphere:v1.4.2.

It then creates a new backup-driver deployment with the image backup-driver:v1.4.2, and also a new daemonset.apps/datamgr-for-vsphere-plugin with the image data-manager-for-plugin:v1.4.2.

Both the backup-driver deployment and the daemonset.apps/datamgr-for-vsphere-plugin fail with the error messages noted above.

flostru commented 1 year ago

@JuanGarcia01 We implemented a workaround: we removed the init container and trigger a Kubernetes Job that installs the plugin once the Helm installation is done.

apiVersion: batch/v1
kind: Job
metadata:
  name: install-velero-plugin-for-vsphere
  namespace: velero
spec:
  backoffLimit: 4
  template:
    spec:
      automountServiceAccountToken: true
      containers:
        - args:
            - '-n'
            - velero
            - plugin
            - add
            - vsphereveleroplugin/velero-plugin-for-vsphere:v1.4.2
          command:
            - /velero
          image: velero/velero:v1.10.1
          imagePullPolicy: IfNotPresent
          name: install-velero-plugin-for-vsphere
      restartPolicy: Never
      serviceAccountName: velero

This is easy for us since we use Terraform.

Here is the part of our Terraform module that does this:

resource "kubernetes_job" "velero_plugin_for_vsphere" {
  metadata {
    name      = "install-velero-plugin-for-vsphere"
    namespace = var.namespace
  }
  spec {
    template {
      metadata {}
      spec {
        container {
          name    = "install-velero-plugin-for-vsphere"
          image   = "velero/velero:${var.velero_cli_version}"
          command = ["/velero"]
          args = [
            "-n",
            var.namespace,
            "plugin",
            "add",
            "vsphereveleroplugin/velero-plugin-for-vsphere:${var.velero_plugin_for_vsphere_version}",
          ]
        }
        restart_policy       = "Never"
        service_account_name = "velero"
      }
    }
    backoff_limit = 4
  }
}
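
Outside Terraform, the same ordering (install the chart first, add the plugin afterwards) could be approximated by waiting for a backupstoragelocation before creating the Job. The following is a minimal sketch, not the setup used above: it assumes the Job manifest shown earlier is saved as `install-plugin-job.yaml` (a hypothetical file name), and the `KUBECTL` variable, the retry count, and the interval are illustrative choices.

```shell
#!/bin/sh
# Sketch: wait for a backupstoragelocation, then create the plugin Job.
# KUBECTL can be overridden, e.g. to point at a wrapper script.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-velero}"

# Poll until at least one backupstoragelocation exists in the namespace.
# $1 = maximum attempts (default 30), $2 = seconds between attempts (default 10)
wait_for_bsl() {
  attempts="${1:-30}"
  interval="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # "-o name" prints one resource name per line, or nothing if none exist.
    bsl=$($KUBECTL -n "$NAMESPACE" get backupstoragelocations.velero.io \
      -o name 2>/dev/null) || bsl=""
    if [ -n "$bsl" ]; then
      echo "found $bsl"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "no backupstoragelocation after $attempts attempts" >&2
  return 1
}

# Usage (the manifest file name is hypothetical):
#   wait_for_bsl && $KUBECTL apply -f install-plugin-job.yaml
```

Polling for the resource rather than sleeping for a fixed time keeps the script independent of how long the chart takes to create the backupstoragelocation.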