kubernetes / cloud-provider-vsphere

Kubernetes Cloud Provider for vSphere https://cloud-provider-vsphere.sigs.k8s.io
Apache License 2.0

csi-controller SIGSEGV #281

Closed olivierbeytrison closed 4 years ago

olivierbeytrison commented 4 years ago

/kind bug

What happened: We're trying to set up the vSphere CPI/CSI following this doc

The vsphere-csi-controller pod is stuck in CrashLoopBackOff: the liveness check fails because the vsphere-csi-controller container crashes with a SIGSEGV shortly after starting.
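
For context, the controller reads its vCenter settings from /etc/cloud/csi-vsphere.conf, which is mounted from the vsphere-config-secret Secret shown in the describe output below. A minimal sketch of that file and of the Secret creation, with placeholder cluster-id, credentials and datacenter (the exact keys come from the linked doc):

[Global]
cluster-id = "my-cluster-id"

[VirtualCenter "160.98.220.50"]
insecure-flag = "true"
user = "administrator@vsphere.local"
password = "changeme"
port = "443"
datacenters = "my-datacenter"

kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf --namespace=kube-system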

kubectl logs -n kube-system vsphere-csi-controller-0 vsphere-csi-controller 
I1206 07:39:54.069404       1 config.go:261] GetCnsconfig called with cfgPath: /etc/cloud/csi-vsphere.conf
I1206 07:39:54.069647       1 config.go:206] Initializing vc server 160.98.220.50
I1206 07:39:54.069698       1 controller.go:67] Initializing CNS controller
I1206 07:39:54.069734       1 virtualcentermanager.go:63] Initializing defaultVirtualCenterManager...
I1206 07:39:54.069758       1 virtualcentermanager.go:65] Successfully initialized defaultVirtualCenterManager
I1206 07:39:54.069791       1 virtualcentermanager.go:107] Successfully registered VC "160.98.220.50"
I1206 07:39:54.069819       1 manager.go:60] Initializing volume.volumeManager...
I1206 07:39:54.069842       1 manager.go:64] volume.volumeManager initialized
time="2019-12-06T07:40:15Z" level=info msg="received signal; shutting down" signal=terminated
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x128 pc=0x867dc7]

goroutine 24 [running]:
google.golang.org/grpc.(*Server).GracefulStop(0x0)
        /go/pkg/mod/google.golang.org/grpc@v1.23.0/server.go:1393 +0x37
github.com/rexray/gocsi.(*StoragePlugin).GracefulStop.func1()
        /go/pkg/mod/github.com/rexray/gocsi@v1.0.0/gocsi.go:333 +0x35
sync.(*Once).Do(0xc0002e080c, 0xc0003eeef8)
        /usr/local/go/src/sync/once.go:44 +0xb3
github.com/rexray/gocsi.(*StoragePlugin).GracefulStop(0xc0002e0780, 0x21183a0, 0xc0000ae010)
        /go/pkg/mod/github.com/rexray/gocsi@v1.0.0/gocsi.go:332 +0x56
github.com/rexray/gocsi.Run.func3()
        /go/pkg/mod/github.com/rexray/gocsi@v1.0.0/gocsi.go:121 +0x4e
github.com/rexray/gocsi.trapSignals.func1(0xc000437380, 0xc0004737a0, 0xc000473770)
        /go/pkg/mod/github.com/rexray/gocsi@v1.0.0/gocsi.go:502 +0x143
created by github.com/rexray/gocsi.trapSignals
        /go/pkg/mod/github.com/rexray/gocsi@v1.0.0/gocsi.go:487 +0x107

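Reading the trace: GracefulStop is called with a nil receiver (the 0x0 in google.golang.org/grpc.(*Server).GracefulStop(0x0)), so the SIGTERM (presumably the kubelet restarting the container after the failed liveness probe) arrives before the gRPC endpoint has come up. The panic itself is a symptom; the interesting question is why the driver stalls right after "volume.volumeManager initialized", which typically points at the vCenter connection or the mounted config. Two commands that help narrow this down, using the pod and Secret names from the output below:

kubectl -n kube-system logs vsphere-csi-controller-0 -c vsphere-csi-controller --previous
kubectl -n kube-system get secret vsphere-config-secret -o jsonpath='{.data.csi-vsphere\.conf}' | base64 -d
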
Describe output for vsphere-csi-controller-0:

kubectl describe pod -n kube-system vsphere-csi-controller-0 
Name:         vsphere-csi-controller-0
Namespace:    kube-system
Priority:     0
Node:         node1/160.98.236.80
Start Time:   Fri, 06 Dec 2019 07:31:50 +0000
Labels:       app=vsphere-csi-controller
              controller-revision-hash=vsphere-csi-controller-78bb4df5f7
              role=vsphere-csi
              statefulset.kubernetes.io/pod-name=vsphere-csi-controller-0
Annotations:  <none>
Status:       Running
IP:           10.245.0.5
IPs:
  IP:           10.245.0.5
Controlled By:  StatefulSet/vsphere-csi-controller
Containers:
  csi-attacher:
    Container ID:  docker://286d6b04f3cf300256681855bcd1f98903cb01d8c2da627b5952f1c642c34dae
    Image:         quay.io/k8scsi/csi-attacher:v1.1.1
    Image ID:      docker-pullable://quay.io/k8scsi/csi-attacher@sha256:e4db94969e1d463807162a1115192ed70d632a61fbeb3bdc97b40fe9ce78c831
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=4
      --timeout=300s
      --csi-address=$(ADDRESS)
    State:          Running
      Started:      Fri, 06 Dec 2019 07:31:51 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      ADDRESS:  /csi/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
  vsphere-csi-controller:
    Container ID:  docker://2b79e2f2a63ede8245da723f90c4f9b6e4cabdcb204d3843be01c1c3f1ec8bbf
    Image:         gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1
    Image ID:      docker-pullable://gcr.io/cloud-provider-vsphere/csi/release/driver@sha256:fae6806f5423a0099cdf60cf53cff474b228ee4846a242d025e4833a66f91b3f
    Port:          9808/TCP
    Host Port:     0/TCP
    Args:
      --v=4
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 06 Dec 2019 07:36:47 +0000
      Finished:     Fri, 06 Dec 2019 07:37:10 +0000
    Ready:          False
    Restart Count:  7
    Liveness:       http-get http://:healthz/healthz delay=10s timeout=3s period=5s #success=1 #failure=3
    Environment:
      CSI_ENDPOINT:        unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      X_CSI_MODE:          controller
      VSPHERE_CSI_CONFIG:  /etc/cloud/csi-vsphere.conf
    Mounts:
      /etc/cloud from vsphere-config-volume (ro)
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
  liveness-probe:
    Container ID:  docker://dd27a57a7e3d3d9350685c6505498237dc6fc101e8a7dcf6af35b0cd99ce7d92
    Image:         quay.io/k8scsi/livenessprobe:v1.1.0
    Image ID:      docker-pullable://quay.io/k8scsi/livenessprobe@sha256:dde617756e0f602adc566ab71fd885f1dad451ad3fb063ac991c95a2ff47aea5
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=$(ADDRESS)
    State:          Running
      Started:      Fri, 06 Dec 2019 07:31:53 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
  vsphere-syncer:
    Container ID:  docker://cea6e98a429f7deb145ef885ddf3238a23d6eeb595e164107c3ddf75f3b9341a
    Image:         gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1
    Image ID:      docker-pullable://gcr.io/cloud-provider-vsphere/csi/release/syncer@sha256:fc80ec77a2ab4b58ddfa259a938f6d741933566011d56e5ffcc8680cc83538fe
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=2
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 06 Dec 2019 07:37:12 +0000
      Finished:     Fri, 06 Dec 2019 07:37:42 +0000
    Ready:          False
    Restart Count:  5
    Environment:
      FULL_SYNC_INTERVAL_MINUTES:  30
      VSPHERE_CSI_CONFIG:          /etc/cloud/csi-vsphere.conf
    Mounts:
      /etc/cloud from vsphere-config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
  csi-provisioner:
    Container ID:  docker://c3f788e2030f386d71252b2c63b50e1992f46eef2a4e0675cf856997d12dde2e
    Image:         quay.io/k8scsi/csi-provisioner:v1.2.2
    Image ID:      docker-pullable://quay.io/k8scsi/csi-provisioner@sha256:e3239de37c06d2bcd0e9e9648fe9a8b418d5caf9e89f243c649ff2394d3cbfef
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=4
      --timeout=300s
      --csi-address=$(ADDRESS)
      --feature-gates=Topology=true
      --strict-topology
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Message:      Lost connection to CSI driver, exiting
      Exit Code:    255
      Started:      Fri, 06 Dec 2019 07:36:47 +0000
      Finished:     Fri, 06 Dec 2019 07:37:08 +0000
    Ready:          False
    Restart Count:  5
    Environment:
      ADDRESS:  /csi/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  vsphere-config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  vsphere-config-secret
    Optional:    false
  socket-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com
    HostPathType:  DirectoryOrCreate
  vsphere-csi-controller-token-h9bqc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  vsphere-csi-controller-token-h9bqc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  7m17s                  default-scheduler  Successfully assigned kube-system/vsphere-csi-controller-0 to node1
  Normal   Pulled     7m15s                  kubelet, node1     Container image "quay.io/k8scsi/csi-attacher:v1.1.1" already present on machine
  Normal   Created    7m15s                  kubelet, node1     Created container csi-attacher
  Normal   Started    7m15s                  kubelet, node1     Started container csi-attacher
  Normal   Pulled     7m13s                  kubelet, node1     Container image "quay.io/k8scsi/livenessprobe:v1.1.0" already present on machine
  Normal   Pulling    7m13s                  kubelet, node1     Pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1"
  Normal   Created    7m13s                  kubelet, node1     Created container liveness-probe
  Normal   Started    7m13s                  kubelet, node1     Started container liveness-probe
  Normal   Started    7m12s                  kubelet, node1     Started container vsphere-syncer
  Normal   Pulled     7m12s                  kubelet, node1     Container image "quay.io/k8scsi/csi-provisioner:v1.2.2" already present on machine
  Normal   Pulled     7m12s                  kubelet, node1     Successfully pulled image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1"
  Normal   Created    7m12s                  kubelet, node1     Created container vsphere-syncer
  Normal   Created    7m11s                  kubelet, node1     Created container csi-provisioner
  Normal   Started    7m11s                  kubelet, node1     Started container csi-provisioner
  Normal   Pulling    6m51s (x2 over 7m15s)  kubelet, node1     Pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1"
  Normal   Killing    6m51s                  kubelet, node1     Container vsphere-csi-controller failed liveness probe, will be restarted
  Normal   Started    6m50s (x2 over 7m13s)  kubelet, node1     Started container vsphere-csi-controller
  Normal   Created    6m50s (x2 over 7m14s)  kubelet, node1     Created container vsphere-csi-controller
  Normal   Pulled     6m50s (x2 over 7m14s)  kubelet, node1     Successfully pulled image "gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1"
  Warning  Unhealthy  2m6s (x22 over 7m1s)   kubelet, node1     Liveness probe failed: Get http://10.245.0.5:9808/healthz: dial tcp 10.245.0.5:9808: connect: connection refused

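For reference, the probe that fails in the last event can be exercised by hand from node1, using the same pod IP and port that appear in the event message:

curl -v http://10.245.0.5:9808/healthz
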
What you expected to happen: The vsphere-csi-controller pod should start successfully.

How to reproduce it (as minimally and precisely as possible): Follow the documentation linked above.

Anything else we need to know?:

Environment:

olivierbeytrison commented 4 years ago

This might not be the right place to open this issue. For cross-reference: https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/106

olivierbeytrison commented 4 years ago

Solved after re-installing everything.