vmware-archive / kubernetes-archived

This repository is archived. Please file in-tree vSphere Cloud Provider issues at https://github.com/kubernetes/kubernetes/issues . CSI Driver for vSphere is available at https://github.com/kubernetes/cloud-provider-vsphere

Kubelet unable to mount PV into the pod - FailedMount #496

Closed: s0komma closed this issue 6 years ago

s0komma commented 6 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug
/kind feature

What happened: After creating the PV and claiming it via a PVC, we can see that the volume is created and attached to the worker node where the pod is scheduled. The disk is visible on the node, but the kubelet is unable to mount it into the pod.

What you expected to happen: The pod to come up with the PV attached and mounted.

How to reproduce it (as minimally and precisely as possible):

  1. Create a Deployment that uses a PVC (a sketch of such manifests follows this list).
  2. The pod will not come up.
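
For context, here is a minimal sketch of the kind of manifests involved, assuming dynamic provisioning through the `vsphere` StorageClass shown later in this thread; the claim and volume names match the nexus-test objects in the events below, while the image, mount path, and anything else not shown in the issue are placeholders:

```sh
# Hypothetical reproduction; assumes the StorageClass "vsphere" already exists.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nexus-test-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: vsphere
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nexus-test
spec:
  replicas: 1
  selector:
    matchLabels: {app: nexus-test}
  template:
    metadata:
      labels: {app: nexus-test}
    spec:
      containers:
      - name: nexus-test
        image: sonatype/nexus3        # placeholder image
        volumeMounts:
        - name: nexus-test-pd
          mountPath: /nexus-data      # placeholder path
      volumes:
      - name: nexus-test-pd
        persistentVolumeClaim:
          claimName: nexus-test-claim
EOF
```

With this, the PVC binds and the PV attaches, but the pod stays stuck in ContainerCreating with the FailedMount events shown below.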

Anything else we need to know?:

Environment:

s0komma commented 6 years ago

POD events


Events:
  Type     Reason                  Age                From                           Message
  ----     ------                  ----               ----                           -------
  Warning  FailedScheduling        19m (x2 over 19m)  default-scheduler              pod has unbound PersistentVolumeClaims (repeated 7 times)
  Normal   Scheduled               18m                default-scheduler              Successfully assigned nexus-test-5f656596b6-k7jzw to us06000c2-worker-001
  Normal   SuccessfulMountVolume   18m                kubelet, us06000c2-worker-001  MountVolume.SetUp succeeded for volume "default-token-sz9nh"
  Normal   SuccessfulAttachVolume  18m                attachdetach-controller        AttachVolume.Attach succeeded for volume "pvc-1f9090b0-8079-11e8-9e03-0050568aa007"
  Warning  FailedMount             8m                 kubelet, us06000c2-worker-001  MountVolume.WaitForAttach failed for volume "pvc-1f9090b0-8079-11e8-9e03-0050568aa007" : Could not find attached VMDK "[vsanDatastore] 87b4355b-8294-714a-f4c1-8cdcd4008560/kubernetes-dynamic-pvc-1f9090b0-8079-11e8-9e03-0050568aa007.vmdk". Timeout waiting for mount paths to be created.
  Warning  FailedMount             1m (x8 over 16m)   kubelet, us06000c2-worker-001  Unable to mount volumes for pod "nexus-test-5f656596b6-k7jzw_default(1f999f48-8079-11e8-9e03-0050568aa007)": timeout expired waiting for volumes to attach or mount for pod "default"/"nexus-test-5f656596b6-k7jzw". list of unmounted volumes=[nexus-test-pd]. list of unattached volumes=[nexus-test-pd default-token-sz9nh]
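
One thing these events show is that AttachVolume.Attach from the attachdetach-controller succeeds while MountVolume.WaitForAttach on the kubelet cannot find the VMDK, i.e. vCenter and the guest OS disagree about the disk. A way to cross-check from the vSphere side is sketched below, assuming `govc` is installed with GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD exported; the inventory path is an example:

```sh
# List the virtual disks attached to the worker VM and look for a backing file
# named kubernetes-dynamic-pvc-1f9090b0-8079-11e8-9e03-0050568aa007.vmdk
govc device.info -vm /Datacenter/vm/us06000c2-worker-001 'disk-*'
```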
s0komma commented 6 years ago

Worker node lsblk output


NAME                            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sdd                               8:48   0    2G  0 disk
sr0                              11:0    1 1024M  0 rom
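
A 2G disk (sdd) is visible, so the attach itself worked at the guest level. The vSphere volume plugin's WaitForAttach matches the attached VMDK by its disk UUID under /dev/disk/by-id, so a useful check on the worker node is whether the disk exposes a serial/UUID at all; if disk.EnableUUID is not enabled on the VM, the SERIAL column stays empty and no wwn-* symlink exists, which typically produces exactly this "Could not find attached VMDK" error. A quick sketch:

```sh
# On the worker node: check whether attached disks expose a UUID/serial to the guest.
lsblk -o NAME,SIZE,SERIAL
ls -l /dev/disk/by-id/ | grep -i wwn   # empty output suggests disk.EnableUUID is off
```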
s0komma commented 6 years ago

pvc details


NAME               STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nexus-test-claim   Bound     pvc-1f9090b0-8079-11e8-9e03-0050568aa007   2Gi        RWO            vsphere        20m
root@us06000c2-master-01:/home/app# kubectl describe  pvc nexus-test-claim
Name:          nexus-test-claim
Namespace:     default
StorageClass:  vsphere
Status:        Bound
Volume:        pvc-1f9090b0-8079-11e8-9e03-0050568aa007
Labels:        <none>
Annotations:   kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"nexus-test-claim","namespace":"default"},"spec":{"accessModes":[...
               pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
               volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/vsphere-volume
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      2Gi
Access Modes:  RWO
Events:
  Type    Reason                 Age   From                         Message
  ----    ------                 ----  ----                         -------
  Normal  ProvisioningSucceeded  20m   persistentvolume-controller  Successfully provisioned volume pvc-1f9090b0-8079-11e8-9e03-0050568aa007 using kubernetes.io/vsphere-volume
s0komma commented 6 years ago

The worker node kubelet logs keep showing the messages below in a loop:


Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.673866    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.673894    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.673899    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.707966    1876 attacher.go:163] Checking VMDK "[vsanDatastore] 87b4355b-8294-714a-f4c1-8cdcd4008560/kubernetes-dynamic-pvc-1f9090b0-8079-11e8-9e03-0050568aa007.vmdk" is attached
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.774127    1876 reconciler.go:237] Starting operationExecutor.MountVolume for volume "pvc-1f9090b0-8079-11e8-9e03-0050568aa007" (UniqueName: "kubernetes.io/vsphere-volume/[vsanDatastore] 87b4355b-8294-714a-f4c1-8cdcd4008560/kubernetes-dynamic-pvc-1f9090b0-8079-11e8-9e03-0050568aa007.vmdk") pod "nexus-test-5f656596b6-k7jzw" (UID: "1f999f48-8079-11e8-9e03-0050568aa007")
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.774195    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.774213    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.774218    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.874520    1876 reconciler.go:237] Starting operationExecutor.MountVolume for volume "pvc-1f9090b0-8079-11e8-9e03-0050568aa007" (UniqueName: "kubernetes.io/vsphere-volume/[vsanDatastore] 87b4355b-8294-714a-f4c1-8cdcd4008560/kubernetes-dynamic-pvc-1f9090b0-8079-11e8-9e03-0050568aa007.vmdk") pod "nexus-test-5f656596b6-k7jzw" (UID: "1f999f48-8079-11e8-9e03-0050568aa007")
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.874597    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.874626    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.874631    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.974877    1876 reconciler.go:237] Starting operationExecutor.MountVolume for volume "pvc-1f9090b0-8079-11e8-9e03-0050568aa007" (UniqueName: "kubernetes.io/vsphere-volume/[vsanDatastore] 87b4355b-8294-714a-f4c1-8cdcd4008560/kubernetes-dynamic-pvc-1f9090b0-8079-11e8-9e03-0050568aa007.vmdk") pod "nexus-test-5f656596b6-k7jzw" (UID: "1f999f48-8079-11e8-9e03-0050568aa007")
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.974961    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.974981    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:00 us06000c2-worker-001 kubelet[1876]: I0705 17:56:00.974996    1876 volume_host.go:219] using default mounter/exec for kubernetes.io/vsphere-volume
Jul 05 17:56:01 us06000c2-worker-001 kubelet[1876]: I0705 17:56:01.075214    1876 reconciler.go:237] Starting operationExecutor.MountVolume for volume "pvc-1f9090b0-8079-11e8-9e03-0050568aa007" (UniqueName: "kubernetes.io/vsphere-volume/[vsanDatastore] 87b4355b-8294-714a-f4c1-8cdcd4008560/kubernetes-dynamic-pvc-1f9090b0-8079-11e8-9e03-0050568aa007.vmdk") pod "nexus-test-5f656596b6-k7jzw" (UID: "1f999f48-8079-11e8-9e03-0050568aa007")
s0komma commented 6 years ago

worker-kubelet-log.txt

s0komma commented 6 years ago

controller-log.zip

s0komma commented 6 years ago

kubelet config on worker node


### Upstream source https://github.com/kubernetes/release/blob/master/debian/xenial/kubeadm/channel/stable/etc/systemd/system/kubelet.service.d/
### All upstream values should be present in this file

# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=11"
# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=0.0.0.0 --node-ip=XX.xx.XX.xxx"
# The port for the info server to serve on
# KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=us06000c2-worker-001"

KUBELET_ARGS="--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
--feature-gates=VolumeScheduling=true \
--cloud-provider=vsphere \
--kubeconfig=/etc/kubernetes/kubelet.conf \
--authentication-token-webhook \
--authorization-mode=Webhook \
--client-ca-file=/etc/kubernetes/ssl/ca.crt \
--pod-manifest-path=/etc/kubernetes/manifests \
--cadvisor-port=0 \
--pod-infra-container-image=gcr/google_containers/pause-amd64:3.0 \
--node-status-update-frequency=10s \
--cgroup-driver=cgroupfs \
--max-pods=110 \
--docker-disable-shared-pid=True \
--anonymous-auth=false \
--read-only-port=0 \
--fail-swap-on=True \
 --cluster-dns=XXX.XX.xx.XX --cluster-domain=cluster.local --resolv-conf=/etc/resolv.conf --kube-reserved cpu=100m,memory=256M "
KUBELET_NETWORK_PLUGIN="--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow-privileged=true"
KUBELET_CLOUDPROVIDER=""

PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
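
Because the relevant flags are spread across KUBELET_ARGS while KUBELET_CLOUDPROVIDER is left empty, it can be worth confirming what the running kubelet actually picked up; a quick sketch using standard systemd/ps tooling:

```sh
# Show the cloud-provider related flags of the running kubelet process
ps -o args= -C kubelet | tr ' ' '\n' | grep -E 'cloud-provider|cloud-config'
# Show which drop-in files and environment files systemd actually applies
systemctl cat kubelet
```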
s0komma commented 6 years ago

Detailed information of the pod and PV


NAME                           READY     STATUS              RESTARTS   AGE       IP              NODE
redis01-769db9769b-xlq7g       0/1       ContainerCreating   0          38m       <none>          us06000c2-worker-001
root@us06000c2-master-01:/home/app# kubectl describe po redis01-769db9769b-xlq7g
Name:           redis01-769db9769b-xlq7g
Namespace:      default
Node:           us06000c2-worker-001/xx.XX.xx.XX
Start Time:     Fri, 06 Jul 2018 03:56:22 +0000
Labels:         app=redis01
                pod-template-hash=3258653256
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  ReplicaSet/redis01-769db9769b
Containers:
  redis01:
    Container ID:
    Image:          docker.io/redis:latest
    Image ID:
    Port:           6379/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       tcp-socket :6379 delay=15s timeout=5s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /data from redis-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sz9nh (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  redis-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  redis-vol01
    ReadOnly:   false
  default-token-sz9nh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-sz9nh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                  Age                From                           Message
  ----     ------                  ----               ----                           -------
  Warning  FailedScheduling        38m (x2 over 38m)  default-scheduler              pod has unbound PersistentVolumeClaims (repeated 7 times)
  Normal   Scheduled               38m                default-scheduler              Successfully assigned redis01-769db9769b-xlq7g to us06000c2-worker-001
  Normal   SuccessfulMountVolume   38m                kubelet, us06000c2-worker-001  MountVolume.SetUp succeeded for volume "default-token-sz9nh"
  Normal   SuccessfulAttachVolume  38m                attachdetach-controller        AttachVolume.Attach succeeded for volume "pvc-8b850f45-80d0-11e8-9e03-0050568aa007"
  Warning  FailedMount             8m (x3 over 28m)   kubelet, us06000c2-worker-001  MountVolume.WaitForAttach failed for volume "pvc-8b850f45-80d0-11e8-9e03-0050568aa007" : Could not find attached VMDK "[vsanDatastore] 87b4355b-8294-714a-f4c1-8cdcd4008560/kubernetes-dynamic-pvc-8b850f45-80d0-11e8-9e03-0050568aa007.vmdk". Timeout waiting for mount paths to be created.
  Warning  FailedMount             2m (x16 over 36m)  kubelet, us06000c2-worker-001  Unable to mount volumes for pod "redis01-769db9769b-xlq7g_default(8b8b67bc-80d0-11e8-9e03-0050568aa007)": timeout expired waiting for volumes to attach or mount for pod "default"/"redis01-769db9769b-xlq7g". list of unmounted volumes=[redis-data]. list of unattached volumes=[redis-data default-token-sz9nh]
  Normal   SuccessfulMountVolume   2m                 kubelet, us06000c2-worker-001  MountVolume.SetUp succeeded for volume "default-token-sz9nh"
  Warning  FailedMount             5s                 kubelet, us06000c2-worker-001  Unable to mount volumes for pod "redis01-769db9769b-xlq7g_default(8b8b67bc-80d0-11e8-9e03-0050568aa007)": timeout expired waiting for volumes to attach or mount for pod "default"/"redis01-769db9769b-xlq7g". list of unmounted volumes=[redis-data]. list of unattached volumes=[redis-data default-token-sz9nh]
root@us06000c2-master-01:/home/app# curl 127.0.0.1:8001/api/v1/persistentvolumes/pvc-8b850f45-80d0-11e8-9e03-0050568aa007
{
  "kind": "PersistentVolume",
  "apiVersion": "v1",
  "metadata": {
    "name": "pvc-8b850f45-80d0-11e8-9e03-0050568aa007",
    "selfLink": "/api/v1/persistentvolumes/pvc-8b850f45-80d0-11e8-9e03-0050568aa007",
    "uid": "8bc84ac3-80d0-11e8-9e03-0050568aa007",
    "resourceVersion": "496182",
    "creationTimestamp": "2018-07-06T03:56:22Z",
    "annotations": {
      "kubernetes.io/createdby": "vsphere-volume-dynamic-provisioner",
      "pv.kubernetes.io/bound-by-controller": "yes",
      "pv.kubernetes.io/provisioned-by": "kubernetes.io/vsphere-volume"
    },
    "finalizers": [
      "kubernetes.io/pv-protection"
    ]
  },
  "spec": {
    "capacity": {
      "storage": "2929688Ki"
    },
    "vsphereVolume": {
      "volumePath": "[vsanDatastore] 87b4355b-8294-714a-f4c1-8cdcd4008560/kubernetes-dynamic-pvc-8b850f45-80d0-11e8-9e03-0050568aa007.vmdk",
      "fsType": "ext3"
    },
    "accessModes": [
      "ReadWriteOnce"
    ],
    "claimRef": {
      "kind": "PersistentVolumeClaim",
      "namespace": "default",
      "name": "redis-vol01",
      "uid": "8b850f45-80d0-11e8-9e03-0050568aa007",
      "apiVersion": "v1",
      "resourceVersion": "496162"
    },
    "persistentVolumeReclaimPolicy": "Delete",
    "storageClassName": "vsphere"
  },
  "status": {
    "phase": "Bound"
  }
}
root@us06000c2-master-01:/home/app# curl 127.0.0.1:8001/api/v1/namespaces/default/events?fieldSelector=involvedObject.name%3Dredis-vol01%2CinvolvedObject.namespace%3Ddefault%2CinvolvedObject.kind%3DPersistentVolumeClaim%2CinvolvedObject.uid%3D8b850f45-80d0-11e8-9e03-0050568aa007
{
  "kind": "EventList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/namespaces/default/events",
    "resourceVersion": "501969"
  },
  "items": [
    {
      "metadata": {
        "name": "redis-vol01.153eab66fc0599d1",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/events/redis-vol01.153eab66fc0599d1",
        "uid": "8bc92236-80d0-11e8-9e03-0050568aa007",
        "resourceVersion": "496179",
        "creationTimestamp": "2018-07-06T03:56:22Z"
      },
      "involvedObject": {
        "kind": "PersistentVolumeClaim",
        "namespace": "default",
        "name": "redis-vol01",
        "uid": "8b850f45-80d0-11e8-9e03-0050568aa007",
        "apiVersion": "v1",
        "resourceVersion": "496162"
      },
      "reason": "ProvisioningSucceeded",
      "message": "Successfully provisioned volume pvc-8b850f45-80d0-11e8-9e03-0050568aa007 using kubernetes.io/vsphere-volume",
      "source": {
        "component": "persistentvolume-controller"
      },
      "firstTimestamp": "2018-07-06T03:56:22Z",
      "lastTimestamp": "2018-07-06T03:56:22Z",
      "count": 1,
      "type": "Normal",
      "eventTime": null,
      "reportingComponent": "",
      "reportingInstance": ""
    }
  ]
}root@us06000c2-master-01:/home/app#
s0komma commented 6 years ago

OK, found the issue: the disk UUID was not set properly on the worker VM.
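
For anyone hitting the same error: the "disk UUID" in question is normally the disk.EnableUUID VM option, which the vSphere Cloud Provider relies on so that VMDKs get a stable UUID under /dev/disk/by-id inside the guest. A sketch of setting it with govc follows; the inventory path is an example and the setting typically takes effect after the VM is powered off and back on:

```sh
# Enable stable disk UUIDs on the worker VM (example inventory path).
govc vm.change -vm /Datacenter/vm/us06000c2-worker-001 -e disk.enableUUID=1

# After a power cycle, verify from inside the guest:
lsblk -o NAME,SIZE,SERIAL    # SERIAL should now be populated for the attached VMDK
```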