vmware-archive / kubernetes-archived

This repository is archived. Please file in-tree vSphere Cloud Provider issues at https://github.com/kubernetes/kubernetes/issues . CSI Driver for vSphere is available at https://github.com/kubernetes/cloud-provider-vsphere

Unable to start kubelet after adding vsphere.conf file #501

Closed: GajaHebbar closed this issue 6 years ago

GajaHebbar commented 6 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

I am trying to configure/use a VMware datastore as a volume (create a static VMDK and/or create a volume dynamically) as per https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/policy-based-mgmt.html

and when I follow https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html

to configure the user and vsphere.conf for k8s v1.10.4 (the instructions for version 1.9 and above), I am not able to start the kubelet service, and no further operations can be performed, such as kubectl create, get pods, or get nodes.

What you expected to happen: after configuring vsphere.conf, kubelet should start and operations like kubectl create should work.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

Linux barnda129.inblrlab.avaya.com 3.10.0-862.3.2.el7.x86_64

Created vsphere.conf in /etc/kubernetes (see the sketch after these steps).

disk.EnableUUID is set to true for both the master and worker nodes.

Added

--cloud-provider=vsphere --cloud-config=/etc/kubernetes/vsphere.conf

to /etc/kubernetes/manifests/kube-controller-manager.yaml and /etc/kubernetes/manifests/kube-apiserver.yaml.

Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere --cloud-config=/etc/kubernetes/vsphere.conf" (at location /etc/systemd/system/kubelet.service.d/10-kubeadm.conf )

in master node

in worker node

Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere"

at location /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Attached vsphere.conf: vsphere.docx
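For reference, a minimal sketch of the vsphere.conf layout used by the in-tree provider on Kubernetes 1.9+; every value below is a placeholder, not taken from the attached file:

    # Sketch only: all values are placeholders.
    [Global]
    user = "administrator@vsphere.local"
    password = "changeme"
    port = "443"
    insecure-flag = "1"
    datacenters = "MyDatacenter"

    [VirtualCenter "1.2.3.4"]

    [Workspace]
    server = "1.2.3.4"
    datacenter = "MyDatacenter"
    default-datastore = "MyDatastore"
    folder = "kubernetes"

    [Disk]
    scsicontrollertype = pvscsi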

Error Trace

    Jul 31 12:36:11 barnda129 kubelet: E0731 12:36:11.688844 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://10.133.132.129:6443/api/v1/nodes?fieldSelector=metadata.name%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused
    Jul 31 12:36:12 barnda129 kubelet: E0731 12:36:12.686611 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://10.133.132.129:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused
    Jul 31 12:36:12 barnda129 kubelet: E0731 12:36:12.688701 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.133.132.129:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused
    Jul 31 12:36:12 barnda129 kubelet: E0731 12:36:12.689943 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://10.133.132.129:6443/api/v1/nodes?fieldSelector=metadata.name%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused

Please let me know what is missing here.

embano1 commented 6 years ago

Can you please try to also deploy vsphere.conf on the workers and add the --cloud-config= parameter? We ran into the same issue: even though it's documented that the conf is not needed on the workers, omitting it seems to break the kubelet.

GajaHebbar commented 6 years ago

I have done that before opening the issue here. That also didn't work.

I have mentioned it in the issue

Please refer to the worker node setting:

Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere"

embano1 commented 6 years ago

I have done that before opening the issue here. That also didn't work.

Sorry for not being clear. What I meant is to also pass vsphere.conf as the --cloud-config parameter to each kubelet. It looks like you don't do that currently:

in worker node Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere"

However, looking at the logs, it seems like communication between the API server and the kubelet is blocked, or the API server is not reachable. Is everything working as expected on the control plane?
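For illustration, the corrected worker drop-in line would simply mirror the master's, assuming vsphere.conf is also deployed at /etc/kubernetes/vsphere.conf on the workers:

    Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere --cloud-config=/etc/kubernetes/vsphere.conf"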

GajaHebbar commented 6 years ago

Ok, that was not done. Will try that.

GajaHebbar commented 6 years ago
- --cloud-provider=vsphere
- --cloud-config=/etc/kubernetes/vsphere.conf

Added the above to /etc/kubernetes/manifests/kube-apiserver.yaml, /etc/kubernetes/manifests/kube-controller-manager.yaml, and /etc/systemd/system/kubelet.service.d/10-kubeadm.conf on the master node,

and on the worker node added

- --cloud-provider=vsphere
- --cloud-config=/etc/kubernetes/vsphere.conf

in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, then ran systemctl daemon-reload followed by systemctl restart kubelet.service on both the worker and the master,

which results in the error:

    Aug 22 14:46:51 barnda129 kubelet: E0822 14:46:51.272118 9409 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.133.132.129:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused

If I remove - --cloud-provider=vsphere, I get:

    Aug 22 15:29:25 barnda135 kubelet: I0822 15:29:25.112493 23148 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pv0001" (UniqueName: "kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test") pod "pvpod" (UID: "f9ed41a6-a5f1-11e8-94ea-005056b3208e")
    Aug 22 15:29:25 barnda135 kubelet: E0822 15:29:25.117281 23148 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test\"" failed. No retries permitted until 2018-08-22 15:29:57.117211852 +0530 IST m=+237.602265226 (durationBeforeRetry 32s). Error: "Volume not attached according to node status for volume \"pv0001\" (UniqueName: \"kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test\") pod \"pvpod\" (UID: \"f9ed41a6-a5f1-11e8-94ea-005056b3208e\") "

divyenpatel commented 6 years ago

@GajaHebbar It looks like the API server is not starting correctly after you add the flags

- --cloud-provider=vsphere
- --cloud-config=/etc/kubernetes/vsphere.conf

Please check the manifest file for the API server and make sure /etc/kubernetes/ is mounted into the API server pod.

For a Kubernetes cluster deployed using kubeadm, /etc/kubernetes is generally not accessible to system pods.

You may need to move the vsphere.conf file to /etc/kubernetes/pki/ or another accessible directory. Please refer to the manifest files posted at https://gist.github.com/divyenpatel/f5f23addca31b0a7da1647831539969f
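As a rough sketch (not the exact gist contents), the relevant fragments of a kubeadm static pod manifest with the flags and a hostPath mount for the config directory could look like the following; the volume name "k8s-config" is illustrative:

    # Sketch only: relevant fragments of e.g. /etc/kubernetes/manifests/kube-controller-manager.yaml.
    # The volume name "k8s-config" is illustrative and not taken from the gist.
    spec:
      containers:
      - command:
        - kube-controller-manager
        - --cloud-provider=vsphere
        - --cloud-config=/etc/kubernetes/vsphere.conf
        volumeMounts:
        - mountPath: /etc/kubernetes
          name: k8s-config
          readOnly: true
      volumes:
      - hostPath:
          path: /etc/kubernetes
        name: k8s-config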

neeraj23 commented 6 years ago

Hi @divyenpatel, I am working with @GajaHebbar on this. We tried the configuration as mentioned here: https://gist.github.com/divyenpatel/f5f23addca31b0a7da1647831539969f , but after creating the pod we encounter the error "Invalid configuration for device '0'."

The logs are as follows:

    Aug 24 18:55:29 barnda135 kubelet: I0824 18:55:29.670835 5815 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pv0001" (UniqueName: "kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test") pod "pvpod" (UID: "e2b75b77-a7a0-11e8-9476-005056b3208e")
    Aug 24 18:55:29 barnda135 kubelet: E0824 18:55:29.675404 5815 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test\"" failed. No retries permitted until 2018-08-24 18:57:31.67535422 +0530 IST m=+90113.195561718 (durationBeforeRetry 2m2s). Error: "Volume not attached according to node status for volume \"pv0001\" (UniqueName: \"kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test\") pod \"pvpod\" (UID: \"e2b75b77-a7a0-11e8-9476-005056b3208e\") "
    Aug 24 18:55:30 barnda135 kubelet: E0824 18:55:30.545966 5815 kubelet.go:1640] Unable to mount volumes for pod "pvpod_default(e2b75b77-a7a0-11e8-9476-005056b3208e)": timeout expired waiting for volumes to attach or mount for pod "default"/"pvpod". list of unmounted volumes=[test-volume]. list of unattached volumes=[test-volume default-token-rcb68]; skipping pod
    Aug 24 18:55:30 barnda135 kubelet: E0824 18:55:30.546053 5815 pod_workers.go:186] Error syncing pod e2b75b77-a7a0-11e8-9476-005056b3208e ("pvpod_default(e2b75b77-a7a0-11e8-9476-005056b3208e)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"pvpod". list of unmounted volumes=[test-volume]. list of unattached volumes=[test-volume default-token-rcb68]

divyenpatel commented 6 years ago

@neeraj23 @GajaHebbar Have you set the disk.enableUUID=1 flag on all your node VMs?

The disk UUID on the node VMs must be enabled: the disk.EnableUUID value must be set to True. This step is necessary so that the VMDK always presents a consistent UUID to the VM, thus allowing the disk to be mounted properly. For each of the virtual machine nodes that will be participating in the cluster, follow the steps below using govc.

Find the node VM paths:

    govc ls /datacenter/vm/<vm-folder-name>

Set disk.EnableUUID to true for all VMs:

    govc vm.change -e="disk.enableUUID=1" -vm='VM Path'

Note: If the Kubernetes node VMs are created from a template VM, then disk.EnableUUID=1 can be set on the template VM. VMs cloned from this template will automatically inherit this property.
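For example, a quick shell sketch to apply this to every VM in one folder; the datacenter and folder names are placeholders, and GOVC_URL/credentials are assumed to be set in the environment:

    # Placeholder inventory path; adjust the datacenter and VM folder names,
    # and make sure GOVC_URL / credentials are exported for govc.
    for vm in $(govc ls /MyDatacenter/vm/k8s-nodes); do
      govc vm.change -e="disk.enableUUID=1" -vm="$vm"
    done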

divyenpatel commented 6 years ago

@neeraj23 @GajaHebbar Do you see the PVC bound to the PV? Are you using the PVC in the pod spec? Can you provide kubectl describe output for the PV, PVC, and Pod? We need to see the Events section of the kubectl describe output for the failures.
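Something along these lines, using the names seen in the logs above where possible; "pvc0001" is a placeholder for the actual PVC name:

    # "pv0001" and "pvpod" are the names from the logs above; "pvc0001" is a placeholder.
    kubectl describe pv pv0001
    kubectl describe pvc pvc0001
    kubectl describe pod pvpod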

neeraj23 commented 6 years ago

Hi @divyenpatel, the VMs already have disk.enableUUID=1 set. I have created the PV, PVC, and pod using these three files: vpshere-volume-pvcpod.yaml.txt, vsphere-volume-pv.yaml.txt, vsphere-volume-pvc.yaml.txt

The PVC and PV are shown to be in the Bound state, but I am not able to start a pod using them. The describe output for the PV, PVC, and pod is attached: describe pod.txt, describe pv.txt, describe pvc.txt

neeraj23 commented 6 years ago

I tried to create a pod using a vSphere volume on another setup, using this YAML file:

test-pod.yaml.txt

But I get the error "Invalid configuration for device '0'." The output of kubectl describe pod is as follows:

describe pod.txt

divyenpatel commented 6 years ago

I see you have the following volumePath:

volumePath: "[/Bangalore/datastore/10.133.132.83_DS1] volume/test.vmdk"

In the above path, are Bangalore and datastore datastore folders? If not, you have an incorrect volumePath.

It should be as shown below.

If the datastore sharedVmfs-0 is under the datastore folder DatastoreFolder (here kubevols is the directory in the datastore in which the VMDK is present):

volumePath: "[DatastoreFolder/sharedVmfs-0] kubevols/test.vmdk"

If the datastore sharedVmfs-0 is under the root / folder:

volumePath: "[sharedVmfs-0] kubevols/test.vmdk"

We have recently updated the instructions for configuring the vSphere Cloud Provider. Can you please follow them and make sure vsphere.conf is correctly configured? https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html

GajaHebbar commented 6 years ago

@divyenpatel I looked into the system and found issues with the datastore, which was not accessible from the VMs running the Kubernetes cluster. After re-configuring that and using the new vsphere.conf file described at https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html, this issue is fixed.