What happened:
We're trying to set up the vSphere CPI/CSI following this doc.
The vsphere-csi-controller StatefulSet pod is stuck in CrashLoopBackOff. The liveness check fails because the vsphere-csi-controller container cannot start: it crashes with a SIGSEGV.
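The panic from the crashed container can be retrieved from the logs of its previous instance (standard kubectl; the container name is taken from the pod spec in the describe output below):
kubectl logs -n kube-system vsphere-csi-controller-0 -c vsphere-csi-controller --previous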
kubectl describe pod -n kube-system vsphere-csi-controller-0
Name: vsphere-csi-controller-0
Namespace: kube-system
Priority: 0
Node: node1/160.98.236.80
Start Time: Fri, 06 Dec 2019 07:31:50 +0000
Labels: app=vsphere-csi-controller
controller-revision-hash=vsphere-csi-controller-78bb4df5f7
role=vsphere-csi
statefulset.kubernetes.io/pod-name=vsphere-csi-controller-0
Annotations: <none>
Status: Running
IP: 10.245.0.5
IPs:
IP: 10.245.0.5
Controlled By: StatefulSet/vsphere-csi-controller
Containers:
csi-attacher:
Container ID: docker://286d6b04f3cf300256681855bcd1f98903cb01d8c2da627b5952f1c642c34dae
Image: quay.io/k8scsi/csi-attacher:v1.1.1
Image ID: docker-pullable://quay.io/k8scsi/csi-attacher@sha256:e4db94969e1d463807162a1115192ed70d632a61fbeb3bdc97b40fe9ce78c831
Port: <none>
Host Port: <none>
Args:
--v=4
--timeout=300s
--csi-address=$(ADDRESS)
State: Running
Started: Fri, 06 Dec 2019 07:31:51 +0000
Ready: True
Restart Count: 0
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
vsphere-csi-controller:
Container ID: docker://2b79e2f2a63ede8245da723f90c4f9b6e4cabdcb204d3843be01c1c3f1ec8bbf
Image: gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1
Image ID: docker-pullable://gcr.io/cloud-provider-vsphere/csi/release/driver@sha256:fae6806f5423a0099cdf60cf53cff474b228ee4846a242d025e4833a66f91b3f
Port: 9808/TCP
Host Port: 0/TCP
Args:
--v=4
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 06 Dec 2019 07:36:47 +0000
Finished: Fri, 06 Dec 2019 07:37:10 +0000
Ready: False
Restart Count: 7
Liveness: http-get http://:healthz/healthz delay=10s timeout=3s period=5s #success=1 #failure=3
Environment:
CSI_ENDPOINT: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
X_CSI_MODE: controller
VSPHERE_CSI_CONFIG: /etc/cloud/csi-vsphere.conf
Mounts:
/etc/cloud from vsphere-config-volume (ro)
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
liveness-probe:
Container ID: docker://dd27a57a7e3d3d9350685c6505498237dc6fc101e8a7dcf6af35b0cd99ce7d92
Image: quay.io/k8scsi/livenessprobe:v1.1.0
Image ID: docker-pullable://quay.io/k8scsi/livenessprobe@sha256:dde617756e0f602adc566ab71fd885f1dad451ad3fb063ac991c95a2ff47aea5
Port: <none>
Host Port: <none>
Args:
--csi-address=$(ADDRESS)
State: Running
Started: Fri, 06 Dec 2019 07:31:53 +0000
Ready: True
Restart Count: 0
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
vsphere-syncer:
Container ID: docker://cea6e98a429f7deb145ef885ddf3238a23d6eeb595e164107c3ddf75f3b9341a
Image: gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1
Image ID: docker-pullable://gcr.io/cloud-provider-vsphere/csi/release/syncer@sha256:fc80ec77a2ab4b58ddfa259a938f6d741933566011d56e5ffcc8680cc83538fe
Port: <none>
Host Port: <none>
Args:
--v=2
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 06 Dec 2019 07:37:12 +0000
Finished: Fri, 06 Dec 2019 07:37:42 +0000
Ready: False
Restart Count: 5
Environment:
FULL_SYNC_INTERVAL_MINUTES: 30
VSPHERE_CSI_CONFIG: /etc/cloud/csi-vsphere.conf
Mounts:
/etc/cloud from vsphere-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
csi-provisioner:
Container ID: docker://c3f788e2030f386d71252b2c63b50e1992f46eef2a4e0675cf856997d12dde2e
Image: quay.io/k8scsi/csi-provisioner:v1.2.2
Image ID: docker-pullable://quay.io/k8scsi/csi-provisioner@sha256:e3239de37c06d2bcd0e9e9648fe9a8b418d5caf9e89f243c649ff2394d3cbfef
Port: <none>
Host Port: <none>
Args:
--v=4
--timeout=300s
--csi-address=$(ADDRESS)
--feature-gates=Topology=true
--strict-topology
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Message: Lost connection to CSI driver, exiting
Exit Code: 255
Started: Fri, 06 Dec 2019 07:36:47 +0000
Finished: Fri, 06 Dec 2019 07:37:08 +0000
Ready: False
Restart Count: 5
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
vsphere-config-volume:
Type: Secret (a volume populated by a Secret)
SecretName: vsphere-config-secret
Optional: false
socket-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com
HostPathType: DirectoryOrCreate
vsphere-csi-controller-token-h9bqc:
Type: Secret (a volume populated by a Secret)
SecretName: vsphere-csi-controller-token-h9bqc
Optional: false
QoS Class: BestEffort
Node-Selectors: node-role.kubernetes.io/master=
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m17s default-scheduler Successfully assigned kube-system/vsphere-csi-controller-0 to node1
Normal Pulled 7m15s kubelet, node1 Container image "quay.io/k8scsi/csi-attacher:v1.1.1" already present on machine
Normal Created 7m15s kubelet, node1 Created container csi-attacher
Normal Started 7m15s kubelet, node1 Started container csi-attacher
Normal Pulled 7m13s kubelet, node1 Container image "quay.io/k8scsi/livenessprobe:v1.1.0" already present on machine
Normal Pulling 7m13s kubelet, node1 Pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1"
Normal Created 7m13s kubelet, node1 Created container liveness-probe
Normal Started 7m13s kubelet, node1 Started container liveness-probe
Normal Started 7m12s kubelet, node1 Started container vsphere-syncer
Normal Pulled 7m12s kubelet, node1 Container image "quay.io/k8scsi/csi-provisioner:v1.2.2" already present on machine
Normal Pulled 7m12s kubelet, node1 Successfully pulled image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1"
Normal Created 7m12s kubelet, node1 Created container vsphere-syncer
Normal Created 7m11s kubelet, node1 Created container csi-provisioner
Normal Started 7m11s kubelet, node1 Started container csi-provisioner
Normal Pulling 6m51s (x2 over 7m15s) kubelet, node1 Pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1"
Normal Killing 6m51s kubelet, node1 Container vsphere-csi-controller failed liveness probe, will be restarted
Normal Started 6m50s (x2 over 7m13s) kubelet, node1 Started container vsphere-csi-controller
Normal Created 6m50s (x2 over 7m14s) kubelet, node1 Created container vsphere-csi-controller
Normal Pulled 6m50s (x2 over 7m14s) kubelet, node1 Successfully pulled image "gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1"
Warning Unhealthy 2m6s (x22 over 7m1s) kubelet, node1 Liveness probe failed: Get http://10.245.0.5:9808/healthz: dial tcp 10.245.0.5:9808: connect: connection refused
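The probe failures above look like a symptom rather than the cause: the driver keeps crashing, so nothing answers on port 9808. The probe result can be reproduced by hand from the node (pod IP taken from the events above):
curl -v http://10.245.0.5:9808/healthz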
What you expected to happen:
The StatefulSet pod should start successfully.
How to reproduce it (as minimally and precisely as possible):
Follow the documentation linked above.
Anything else we need to know?:
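The driver reads its configuration from the vsphere-config-secret mounted at /etc/cloud/csi-vsphere.conf (see the pod spec above). The mounted contents can be inspected with (the backslash escapes the dot in the secret key name):
kubectl -n kube-system get secret vsphere-config-secret -o jsonpath='{.data.csi-vsphere\.conf}' | base64 -d
For comparison, a minimal csi-vsphere.conf following the documented layout (placeholder values only, not our actual config):
[Global]
cluster-id = "my-cluster"

[VirtualCenter "<vcenter-ip>"]
insecure-flag = "true"
user = "<vcenter-user>"
password = "<vcenter-password>"
port = "443"
datacenters = "<datacenter-name>"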
Environment:
vsphere-cloud-controller-manager version: 1.0.1
Kubernetes version: v1.16.3
OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS
Kernel (e.g. uname -a): 4.15.0-72-generic
/kind bug