kubernetes-sigs/vsphere-csi-driver

vSphere storage Container Storage Interface (CSI) plugin
https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/index.html
Apache License 2.0

csi-controller v2.0.0-rc.1 does not start when no vSAN file service is enabled #193

Closed: larhauga closed this issue 4 years ago

larhauga commented 4 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: The vSphere CSI controller deployment does not start when the controller fails to find a vSAN-backed file service. In an environment where vSAN is not present, the getDsToFileServiceEnabledMap function fails, and the controller never starts successfully.

{"level":"error","time":"2020-04-22T09:06:11.285949514Z","caller":"common/vsphereutil.go:488","msg":"failed to get Datastore managed objects from datastore objects. dsObjList: [], properties: [info summary], err: object references is empty"..."stacktrace":"sigs.k8s.io/vsphere-csi-driver/pkg/csi/service/common.getDsToFileServiceEnable
dMap\n\t/build/pkg/csi/service/common/vsphereutil.go:488...
{"level":"error","time":"2020-04-22T09:06:11.286051279Z","caller":"common/vsphereutil.go:420","msg":"failed to query if file service is enabled on vsan datastores or not. error: object refere
nces is empty","TraceId":"e02723e7-6e50-41a0-b37d-28dae787f39b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/pkg/csi/service/common.IsFileServiceEnabled\n
{"level":"error","time":"2020-04-22T09:06:11.286086428Z","caller":"vanilla/controller.go:124","msg":"file service enablement check failed for datastore specified in TargetvSANFileShareDatast$
reURLs. err=object references is empty","TraceId":"e02723e7-6e50-41a0-b37d-28dae787f39b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg$
csi/service/vanilla/controller.go:124
{"level":"error","time":"2020-04-22T09:06:11.286116701Z","caller":"service/service.go:122","msg":"failed to init controller. Error: file service enablement check failed for datastore specifi$
d in TargetvSANFileShareDatastoreURLs

This happens even when TargetvSANFileShareDatastoreURLs is not configured.
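To illustrate the expected behavior, here is a minimal, self-contained sketch (hypothetical helper names, not the actual driver code; the real probe is common.IsFileServiceEnabled from the stack traces above, whose signature differs) of the guard one would expect in controller Init: a failed vSAN file-service probe should only be fatal when TargetvSANFileShareDatastoreURLs was actually configured.

    package main

    import (
    	"errors"
    	"fmt"
    )

    // probeFn stands in for the driver's vSAN file-service probe; the real
    // signature differs, this only illustrates the control flow.
    type probeFn func(targetURLs []string) (map[string]bool, error)

    // initController sketches the guard: a failed probe is fatal only when
    // TargetvSANFileShareDatastoreURLs was actually set by the operator.
    func initController(targetURLs []string, probe probeFn) error {
    	dsToFSEnabled, err := probe(targetURLs)
    	if err != nil {
    		if len(targetURLs) == 0 {
    			// No vSAN file shares were requested, so an environment
    			// without vSAN should not block controller startup.
    			fmt.Printf("warning: skipping file service check: %v\n", err)
    			return nil
    		}
    		return fmt.Errorf("file service enablement check failed: %w", err)
    	}
    	fmt.Printf("file service enablement: %v\n", dsToFSEnabled)
    	return nil
    }

    func main() {
    	// Simulate the environment from this report: no vSAN datastores at all.
    	noVSAN := func([]string) (map[string]bool, error) {
    		return nil, errors.New("object references is empty")
    	}
    	if err := initController(nil, noVSAN); err != nil {
    		fmt.Println("failed to init controller:", err)
    	}
    }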

What you expected to happen: Expected the controller to start successfully and to be able to provision volumes through the storage class.

How to reproduce it (as minimally and precisely as possible): In an environment without vSAN or file services (vSphere 6.7u3), deploy the controller with the following configuration; it does not start.

  csi-vsphere.conf: |
    [Global]
    cluster-id = "clusterid"

    insecure-flag = "true"
    datacenters = "dc1,dc2"

    secret-namespace = "vsphere" # overridden by env vars from secrets
    secret-name = "cpi-global-secret" # overridden by env vars from secrets

    [VirtualCenter "<replaced>"]

    [Labels]
    region = k8s-region
    zone = k8s-zone

The relevant container spec from the vsphere-csi-controller Deployment:
        - name: vsphere-csi-controller
          image: gcr.io/cloud-provider-vsphere/csi/release/driver:v2.0.0-rc.1
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "rm -rf /var/lib/kubelet/plugins_registry/csi.vsphere.vmware.com"]
          imagePullPolicy: "Always"
          env:
            - name: CSI_ENDPOINT
              value: unix:///var/lib/kubelet/plugins_registry/csi.sock
            - name: X_CSI_MODE
              value: "controller"
            - name: VSPHERE_CSI_CONFIG
              value: "/etc/cloud/csi-vsphere.conf"
            - name: LOGGER_LEVEL
              value: "DEVELOPMENT" # "PRODUCTION" # Options: DEVELOPMENT, PRODUCTION
            - name: VSPHERE_USER
              valueFrom:
                secretKeyRef:
                  name: cpi-global-secret
                  key: username
            - name: VSPHERE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cpi-global-secret
                  key: password
          volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
            - mountPath: /var/lib/kubelet/plugins_registry/
              name: socket-dir
          ports:
            - name: healthz
              containerPort: 9808
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            timeoutSeconds: 3
            periodSeconds: 5
            failureThreshold: 3

Anything else we need to know?: https://github.com/kubernetes-sigs/vsphere-csi-driver/commit/760b9ab86cff4e85c9f5b775e9f67b3bf6a90921 looks to have introduced this bug: previously, errors from this check were ignored.
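Schematically, that change amounts to promoting a logged warning into a fatal error during Init. A runnable toy version of the before/after behavior (my reading of the commit, not the actual diff):

    package main

    import (
    	"errors"
    	"log"
    )

    // errNoVSAN mimics the "object references is empty" error from the logs.
    var errNoVSAN = errors.New("object references is empty")

    // probe stands in for the file-service enablement check.
    func probe() error { return errNoVSAN }

    func main() {
    	// Behavior before the commit (schematic): warn and keep starting.
    	if err := probe(); err != nil {
    		log.Printf("warning: file service check failed, continuing: %v", err)
    	}

    	// Behavior after the commit (schematic): the same failure is now fatal,
    	// which matches the CrashLoopBackOff reported below.
    	if err := probe(); err != nil {
    		log.Fatalf("failed to init controller: %v", err)
    	}
    }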

Environment:

symbian4sj commented 4 years ago

Hello! The csi-controller v2.0.0-rc.1 container doesn't start, failing with the errors below.

    kubectl describe pod vsphere-csi-node-wbnn4 -n=kube-system
    Events:
      Type     Reason     Age                    From                         Message
      ----     ------     ----                   ----                         -------
      Normal   Scheduled  29m                    default-scheduler            Successfully assigned kube-system/vsphere-csi-controller-76b8d7d97b-gz4j9 to seliicbl01526-k8sw
      Normal   Pulling    29m                    kubelet, seliicbl01526-k8sw  Pulling image "quay.io/k8scsi/csi-attacher:v2.0.0"
      Normal   Pulled     29m                    kubelet, seliicbl01526-k8sw  Successfully pulled image "quay.io/k8scsi/csi-attacher:v2.0.0"
      Normal   Created    29m                    kubelet, seliicbl01526-k8sw  Created container csi-attacher
      Normal   Started    29m                    kubelet, seliicbl01526-k8sw  Started container csi-attacher
      Normal   Pulling    29m                    kubelet, seliicbl01526-k8sw  Pulling image "quay.io/k8scsi/csi-resizer:v0.3.0"
      Normal   Created    29m                    kubelet, seliicbl01526-k8sw  Created container csi-resizer
      Normal   Pulled     29m                    kubelet, seliicbl01526-k8sw  Successfully pulled image "quay.io/k8scsi/csi-resizer:v0.3.0"
      Normal   Started    29m                    kubelet, seliicbl01526-k8sw  Started container csi-resizer
      Normal   Pulling    29m                    kubelet, seliicbl01526-k8sw  Pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v2.0.0-rc.1"
      Normal   Pulled     29m                    kubelet, seliicbl01526-k8sw  Successfully pulled image "gcr.io/cloud-provider-vsphere/csi/release/driver:v2.0.0-rc.1"
      Normal   Created    29m                    kubelet, seliicbl01526-k8sw  Created container vsphere-csi-controller
      Normal   Started    29m                    kubelet, seliicbl01526-k8sw  Started container vsphere-csi-controller
      Normal   Pulling    29m                    kubelet, seliicbl01526-k8sw  Pulling image "quay.io/k8scsi/livenessprobe:v1.1.0"
      Normal   Pulling    29m                    kubelet, seliicbl01526-k8sw  Pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v2.0.0-rc.1"
      Normal   Created    29m                    kubelet, seliicbl01526-k8sw  Created container liveness-probe
      Normal   Started    29m                    kubelet, seliicbl01526-k8sw  Started container liveness-probe
      Normal   Pulled     29m                    kubelet, seliicbl01526-k8sw  Successfully pulled image "quay.io/k8scsi/livenessprobe:v1.1.0"
      Normal   Pulled     29m                    kubelet, seliicbl01526-k8sw  Successfully pulled image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v2.0.0-rc.1"
      Normal   Created    29m                    kubelet, seliicbl01526-k8sw  Created container vsphere-syncer
      Normal   Started    29m                    kubelet, seliicbl01526-k8sw  Started container vsphere-syncer
      Normal   Pulling    29m                    kubelet, seliicbl01526-k8sw  Pulling image "quay.io/k8scsi/csi-provisioner:v1.4.0"
      Normal   Pulled     29m                    kubelet, seliicbl01526-k8sw  Successfully pulled image "quay.io/k8scsi/csi-provisioner:v1.4.0"
      Normal   Created    29m                    kubelet, seliicbl01526-k8sw  Created container csi-provisioner
      Normal   Started    29m                    kubelet, seliicbl01526-k8sw  Started container csi-provisioner
      Warning  Unhealthy  24m (x19 over 29m)     kubelet, seliicbl01526-k8sw  Liveness probe failed: Get http://10.1.248.1:9808/healthz: dial tcp 10.1.248.1:9808: connect: connection refused
      Warning  BackOff    14m (x80 over 28m)     kubelet, seliicbl01526-k8sw  Back-off restarting failed container
      Warning  BackOff    9m16s (x102 over 28m)  kubelet, seliicbl01526-k8sw  Back-off restarting failed container
      Warning  BackOff    4m17s (x106 over 27m)  kubelet, seliicbl01526-k8sw  Back-off restarting failed container

    kubectl get pods -A -o wide
    NAMESPACE     NAME                                      READY   STATUS             RESTARTS   AGE   IP           NODE
    kube-system   vsphere-csi-controller-76b8d7d97b-gz4j9   2/6     CrashLoopBackOff   45         32m   10.1.248.1   seliicbl01526-k8sw

    curl http://10.1.248.1:9808
    curl: (7) Failed to connect to 10.1.248.1 port 9808: Connection refused

    ping 10.1.248.1
    PING 10.1.248.1 (10.1.248.1) 56(84) bytes of data.
    64 bytes from 10.1.248.1: icmp_seq=1 ttl=64 time=0.032 ms
    64 bytes from 10.1.248.1: icmp_seq=2 ttl=64 time=0.021 ms
    ^C
    --- 10.1.248.1 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1006ms
    rtt min/avg/max/mdev = 0.021/0.026/0.032/0.007 ms