hashicorp / vault-secrets-operator

The Vault Secrets Operator (VSO) allows Pods to consume Vault secrets natively from Kubernetes Secrets.
https://hashicorp.com

Timed out waiting for cache to be synced for Kind *v1beta1.HCPVaultSecretsApp #399

Closed juris closed 11 months ago

juris commented 1 year ago

Describe the bug: After upgrading from 0.2.0 to 0.3.1, the application crashes constantly.

To Reproduce Steps to reproduce the behavior:

  1. Upgrade from version 0.2.0 to 0.3.1
  2. vault-secrets-operator logs:
    2023-10-04T10:01:55Z    ERROR   Could not wait for Cache to sync    {"controller": "hcpauth", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "HCPAuth", "error": "failed to wait for hcpauth caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.HCPAuth"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:203
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:208
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:234
    sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/manager/runnable_group.go:223
    2023-10-04T10:01:55Z    INFO    Stopping and waiting for non leader election runnables
    2023-10-04T10:01:55Z    INFO    Stopping and waiting for leader election runnables
    2023-10-04T10:01:55Z    ERROR   Could not wait for Cache to sync    {"controller": "hcpvaultsecretsapp", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "HCPVaultSecretsApp", "error": "failed to wait for hcpvaultsecretsapp caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.HCPVaultSecretsApp"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:203
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:208
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:234
    sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/manager/runnable_group.go:223
    2023-10-04T10:01:55Z    ERROR   error received after stop sequence was engaged  {"error": "failed to wait for hcpvaultsecretsapp caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.HCPVaultSecretsApp"}
    sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/manager/internal.go:490
    2023-10-04T10:01:55Z    INFO    Shutdown signal received, waiting for all workers to finish {"controller": "vaultconnection", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "VaultConnection"}
    2023-10-04T10:01:55Z    INFO    All workers finished    {"controller": "vaultconnection", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "VaultConnection"}
    2023-10-04T10:01:55Z    INFO    Shutdown signal received, waiting for all workers to finish {"controller": "vaultstaticsecret", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "VaultStaticSecret"}
    2023-10-04T10:01:55Z    INFO    All workers finished    {"controller": "vaultstaticsecret", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "VaultStaticSecret"}
    2023-10-04T10:01:55Z    INFO    Shutdown signal received, waiting for all workers to finish {"controller": "vaultdynamicsecret", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "VaultDynamicSecret"}
    2023-10-04T10:01:55Z    INFO    All workers finished    {"controller": "vaultdynamicsecret", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "VaultDynamicSecret"}
    2023-10-04T10:01:55Z    INFO    Shutdown signal received, waiting for all workers to finish {"controller": "vaultpkisecret", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "VaultPKISecret"}
    2023-10-04T10:01:55Z    INFO    All workers finished    {"controller": "vaultpkisecret", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "VaultPKISecret"}
    2023-10-04T10:01:55Z    INFO    Shutdown signal received, waiting for all workers to finish {"controller": "vaultauth", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "VaultAuth"}
    2023-10-04T10:01:55Z    INFO    All workers finished    {"controller": "vaultauth", "controllerGroup": "secrets.hashicorp.com", "controllerKind": "VaultAuth"}
    2023-10-04T10:01:55Z    INFO    Stopping and waiting for caches
    2023-10-04T10:01:55Z    INFO    Stopping and waiting for webhooks
    2023-10-04T10:01:55Z    INFO    Stopping and waiting for HTTP servers
    2023-10-04T10:01:55Z    INFO    shutting down server    {"kind": "health probe", "addr": "[::]:8081"}
    2023-10-04T10:01:55Z    INFO    controller-runtime.metrics  Shutting down metrics server with timeout of 1 minute
    2023-10-04T10:01:55Z    INFO    Wait completed, proceeding to shutdown the manager
    2023-10-04T10:01:55Z    ERROR   setup   problem running manager {"error": "failed to wait for hcpauth caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.HCPAuth"}
    main.main
    /home/runner/work/vault-secrets-operator/vault-secrets-operator/main.go:313
    runtime.main
    /opt/hostedtoolcache/go/1.20.7/x64/src/runtime/proc.go:250

Application deployment:

vault:
  authentication:
    path: kubernetes
    role: vault-config-operator

vault-secrets-operator:
  controller:
    replicas: 1

    kubeRbacProxy:
      image:
        repository: gcr.io/kubebuilder/kube-rbac-proxy
        tag: v0.11.0

      resources:
        limits:
          cpu: 500m
          memory: 128Mi
        requests:
          cpu: 5m
          memory: 64Mi

    manager:
      image:
        repository: hashicorp/vault-secrets-operator
        tag: 0.3.1

      clientCache:
        persistenceModel: "none"
        cacheSize: "1000"

      maxConcurrentReconciles: "100"

      resources:
        limits:
          cpu: 500m
          memory: 128Mi
        requests:
          cpu: 10m
          memory: 64Mi

    controllerConfigMapYaml:
      health:
        healthProbeBindAddress: :8081
      leaderElection:
        leaderElect: true
        resourceName: REDACTED
      metrics:
        bindAddress: 127.0.0.1:8080
      webhook:
        port: 9443

  metricsService:
    ports:
    - name: https
      port: 8443
      protocol: TCP
      targetPort: https
    type: ClusterIP

  defaultVaultConnection:
    enabled: true
    address: "http://REDACTED.svc:8200"
    skipTLSVerify: true

  defaultAuthMethod:
    enabled: true
    method: kubernetes
    mount: kubernetes

    kubernetes:
      role: "vault-secrets-operator"
      serviceAccount: vault-secrets-operator-controller-manager

  telemetry:
    serviceMonitor:
      enabled: true

Expected behavior: The application should continue running after the upgrade.

Environment

benashz commented 1 year ago

Hi @juris.

So sorry to hear that you are having issues with the Operator after upgrading from v0.2.0 to v0.3.1. Would you mind providing the upgrade steps that you took? Are you deploying the Operator from Helm?

Here are the current upgrade steps for Helm: https://developer.hashicorp.com/vault/docs/platform/k8s/vso/installation#upgrading-using-helm

Thanks,

Ben

M-A-X-I-M commented 12 months ago

I see the same error in k9s after upgrading from v0.2.0 to v0.3.1.

mukulgit123 commented 12 months ago

Ideally, CRDs should be updated automatically. We use Flux to install the HashiCorp Vault Secrets Operator in our environments, but the two new CRDs added after v0.2.0 are not installed in our cluster when we update the chart version. However, when we apply these CRDs manually, the Vault Secrets Operator controller pod starts working fine.
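
For anyone trying to confirm which Kinds are affected, the operator log names them in each "timed out waiting for cache to be synced" message. The following is a minimal Python sketch that extracts those Kinds from log text. The function name is hypothetical and the regular expression is based only on the log lines quoted in this issue. A cache timeout does not by itself prove that a CRD is missing, although that was the cause reported in this thread.

```python
import re

# Matches the message quoted in this issue, e.g.
# "timed out waiting for cache to be synced for Kind *v1beta1.HCPAuth"
_PATTERN = re.compile(
    r"timed out waiting for cache to be synced for Kind \*(\w+)\.(\w+)"
)


def kinds_with_unsynced_caches(log_text):
    """Return the sorted, de-duplicated 'version.Kind' strings found in log_text."""
    found = {f"{version}.{kind}" for version, kind in _PATTERN.findall(log_text)}
    return sorted(found)
```

Feeding it the log from the original report would list `v1beta1.HCPAuth` and `v1beta1.HCPVaultSecretsApp`, the two Kinds to check for with `kubectl get crd`.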

benashz commented 11 months ago

@mukulgit123 @M-A-X-I-M Unfortunately, Helm neither updates existing CRDs nor installs new ones when a chart is upgraded. See https://developer.hashicorp.com/vault/docs/platform/k8s/vso/installation#updating-crds for the manual process that should ensure a smooth upgrade with the Helm chart.

@mukulgit123 I believe that Kustomize will update/create new CRDs upon VSO upgrade. That could be an option that would work in your GitOps workflow.
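
The manual CRD update could be scripted roughly as follows. This is only a sketch: it assumes the chart is reachable as `hashicorp/vault-secrets-operator` in an already configured Helm repository and that `helm` and `kubectl` are on the PATH. The function names are hypothetical, and the linked documentation remains the authoritative procedure.

```python
import subprocess


def crd_update_commands(chart_version, chart="hashicorp/vault-secrets-operator"):
    """Build the argv lists for printing a chart's CRDs and applying them."""
    show = ["helm", "show", "crds", "--version", chart_version, chart]
    apply = ["kubectl", "apply", "-f", "-"]
    return show, apply


def update_crds(chart_version):
    """Pipe the CRDs shipped with the given chart version into `kubectl apply`."""
    show, apply = crd_update_commands(chart_version)
    # Capture the CRD manifests that Helm prints for this chart version.
    crds = subprocess.run(show, check=True, capture_output=True).stdout
    # Feed them to kubectl on stdin, the equivalent of `helm show crds ... | kubectl apply -f -`.
    subprocess.run(apply, input=crds, check=True)


# Example (needs cluster access, so it is not run here):
# update_crds("0.3.1")
```

Keeping command construction separate from execution makes it possible to inspect or log the commands before anything touches the cluster.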

benashz commented 11 months ago

Closing this issue since the procedure for upgrading VSO with Helm is documented here: https://developer.hashicorp.com/vault/docs/platform/k8s/vso/installation#upgrading-using-helm