Can you please try to also deploy vsphere.conf on the workers and add the --cloud-config= parameter? We ran into the same issue, and even though it's documented that the conf is not needed on the workers, it seems to break the kubelet.
I have done that before opening the issue here. That also didn't work.
I have mentioned it in the issue; please refer to the worker node configuration:
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere"
Sorry for not being clear. What I meant is to also pass vsphere.conf as the --cloud-config parameter to each kubelet. It looks like you don't do that currently; on the worker node you have:
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere"
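Something like the following on the worker would be the idea (a sketch only, assuming vsphere.conf is also deployed to /etc/kubernetes on the worker, mirroring the master's layout):
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere --cloud-config=/etc/kubernetes/vsphere.conf"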
However, looking at the logs, it seems like communication between the API server and the kubelet is blocked, or the API server is not reachable. Is everything working as expected on the control plane?
Ok, that was not done. Will try that.
- --cloud-provider=vsphere
- --cloud-config=/etc/kubernetes/vsphere.conf
added the above in /etc/kubernetes/manifests/kube-apiserver.yaml, /etc/kubernetes/manifests/kube-controller-manager.yaml and /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
of the master node,
and in the worker node added
- --cloud-provider=vsphere
- --cloud-config=/etc/kubernetes/vsphere.conf
in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, then ran systemctl daemon-reload followed by systemctl restart kubelet.service on both worker and master,
which results in this error:
Aug 22 14:46:51 barnda129 kubelet: E0822 14:46:51.272118 9409 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.133.132.129:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused
If I remove --cloud-provider=vsphere, I get:
Aug 22 15:29:25 barnda135 kubelet: I0822 15:29:25.112493 23148 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pv0001" (UniqueName: "kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test") pod "pvpod" (UID: "f9ed41a6-a5f1-11e8-94ea-005056b3208e")
Aug 22 15:29:25 barnda135 kubelet: E0822 15:29:25.117281 23148 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test\"" failed. No retries permitted until 2018-08-22 15:29:57.117211852 +0530 IST m=+237.602265226 (durationBeforeRetry 32s). Error: "Volume not attached according to node status for volume \"pv0001\" (UniqueName: \"kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test\") pod \"pvpod\" (UID: \"f9ed41a6-a5f1-11e8-94ea-005056b3208e\") "
@GajaHebbar It looks like the API server is not getting started correctly after you add the flags
- --cloud-provider=vsphere
- --cloud-config=/etc/kubernetes/vsphere.conf
Please check the manifest file for the API server, and make sure /etc/kubernetes/ is mounted into the API server pod.
For a Kubernetes cluster deployed using kubeadm, /etc/kubernetes is generally not accessible to system pods, so you may need to move the vsphere.conf file to /etc/kubernetes/pki/ or another accessible directory.
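As a sketch, the relevant fragments of /etc/kubernetes/manifests/kube-apiserver.yaml would look roughly like this, assuming vsphere.conf has been moved to /etc/kubernetes/pki/ (the volume name k8s-certs follows the usual kubeadm layout; adjust paths to your setup):
spec:
  containers:
  - command:
    - kube-apiserver
    - --cloud-provider=vsphere
    - --cloud-config=/etc/kubernetes/pki/vsphere.conf
    # ...existing flags...
    volumeMounts:
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs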
Please refer to the manifest files posted at https://gist.github.com/divyenpatel/f5f23addca31b0a7da1647831539969f
Hi @divyenpatel, I am working with @GajaHebbar on this. We tried the configuration as mentioned here: https://gist.github.com/divyenpatel/f5f23addca31b0a7da1647831539969f, but after creating the pod we are encountering this error: "Invalid configuration for device '0'."
The logs are as follows:
Aug 24 18:55:29 barnda135 kubelet: I0824 18:55:29.670835 5815 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pv0001" (UniqueName: "kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test") pod "pvpod" (UID: "e2b75b77-a7a0-11e8-9476-005056b3208e")
Aug 24 18:55:29 barnda135 kubelet: E0824 18:55:29.675404 5815 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test\"" failed. No retries permitted until 2018-08-24 18:57:31.67535422 +0530 IST m=+90113.195561718 (durationBeforeRetry 2m2s). Error: "Volume not attached according to node status for volume \"pv0001\" (UniqueName: \"kubernetes.io/vsphere-volume/[10.133.132.83_DS1] volume/test\") pod \"pvpod\" (UID: \"e2b75b77-a7a0-11e8-9476-005056b3208e\") "
Aug 24 18:55:30 barnda135 kubelet: E0824 18:55:30.545966 5815 kubelet.go:1640] Unable to mount volumes for pod "pvpod_default(e2b75b77-a7a0-11e8-9476-005056b3208e)": timeout expired waiting for volumes to attach or mount for pod "default"/"pvpod". list of unmounted volumes=[test-volume]. list of unattached volumes=[test-volume default-token-rcb68]; skipping pod
Aug 24 18:55:30 barnda135 kubelet: E0824 18:55:30.546053 5815 pod_workers.go:186] Error syncing pod e2b75b77-a7a0-11e8-9476-005056b3208e ("pvpod_default(e2b75b77-a7a0-11e8-9476-005056b3208e)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"pvpod". list of unmounted volumes=[test-volume]. list of unattached volumes=[test-volume default-token-rcb68]
@neeraj23 @GajaHebbar Have you set the disk.enableUUID=1 flag on all your node VMs?
The disk UUID on the node VMs must be enabled: the disk.EnableUUID value must be set to True. This step is necessary so that the VMDK always presents a consistent UUID to the VM, thus allowing the disk to be mounted properly. For each of the virtual machine nodes that will be participating in the cluster, follow the steps below using govc.
Find Node VM Paths
govc ls /datacenter/vm/<vm-folder-name>
Set disk.EnableUUID to true for all VMs.
govc vm.change -e="disk.enableUUID=1" -vm='VM Path'
Note: If the Kubernetes node VMs are created from a template VM, then disk.EnableUUID=1 can be set on the template VM. VMs cloned from this template will automatically inherit this property.
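For illustration, assuming the govc connection variables point at your vCenter and the node VMs live under a folder named kubernetes in a datacenter named datacenter (all names here are hypothetical), the commands would look like:
# point govc at the vCenter (placeholder values)
export GOVC_URL='https://vcenter.example.com' GOVC_USERNAME='administrator@vsphere.local' GOVC_PASSWORD='***' GOVC_INSECURE=1
# list the node VM paths
govc ls /datacenter/vm/kubernetes
# enable disk UUIDs on each node VM
govc vm.change -e="disk.enableUUID=1" -vm='/datacenter/vm/kubernetes/k8s-master'
govc vm.change -e="disk.enableUUID=1" -vm='/datacenter/vm/kubernetes/k8s-node-1'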
@neeraj23 @GajaHebbar Do you see the PVC bound to the PV? Are you using the PVC in the Pod spec?
Can you provide kubectl describe output for the PV, PVC and Pod? We need to see the events section from the kubectl describe output for failures.
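For reference, "using the PVC in the Pod spec" means mounting it through a persistentVolumeClaim volume, roughly like this (a sketch; names such as pvpod, pvc0001 and test-volume are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: pvpod
spec:
  containers:
  - name: test-container
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: test-volume
      mountPath: /mnt/volume1
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: pvc0001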
Hi @divyenpatel, the VMs already have disk.enableUUID=1 set. I have created the PV, PVC and Pod using these three files: vpshere-volume-pvcpod.yaml.txt, vsphere-volume-pv.yaml.txt, vsphere-volume-pvc.yaml.txt
The PVC and PV are shown to be in the Bound state, but I am not able to start a pod using the PV and PVC. The describe output for the PV, PVC and Pod are as follows: describe pod.txt, describe pv.txt, describe pvc.txt
I tried to create a pod using a vSphere volume in another setup using this yaml file, but I get the error "Invalid configuration for device '0'." The output of kubectl describe pod is as follows.
I see you have the following volumePath:
volumePath: "[/Bangalore/datastore/10.133.132.83_DS1] volume/test.vmdk"
In the above path, are Bangalore and datastore datastore folders? If not, you have an incorrect volumePath. It should be as shown below; here kubevols is the directory in the datastore in which the vmdk is present.
If the datastore sharedVmfs-0 is under the datastore folder DatastoreFolder:
volumePath: "[DatastoreFolder/sharedVmfs-0] kubevols/test.vmdk"
If the datastore sharedVmfs-0 is under the root / folder:
volumePath: "[sharedVmfs-0] kubevols/test.vmdk"
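For context, the volumePath above sits inside the vsphereVolume section of the PV spec; a minimal sketch (capacity, fsType and names are illustrative, not taken from this setup):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 2Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  vsphereVolume:
    volumePath: "[sharedVmfs-0] kubevols/test.vmdk"
    fsType: ext4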
We have updated the instructions for configuring the vSphere Cloud Provider recently. Can you please follow them and make sure vsphere.conf is correctly configured: https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html
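For convenience, the vsphere.conf layout described on that page looks roughly like this (a sketch only; every value below is a placeholder to be replaced with your own environment's details):
[Global]
user = "administrator@vsphere.local"
password = "<password>"
port = "443"
insecure-flag = "1"

[VirtualCenter "<vcenter-ip>"]
datacenters = "<datacenter-name>"

[Workspace]
server = "<vcenter-ip>"
datacenter = "<datacenter-name>"
default-datastore = "<datastore-name>"
resourcepool-path = ""
folder = "<vm-folder-name>"

[Disk]
scsicontrollertype = pvscsi

[Network]
public-network = "VM Network"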
@divyenpatel Looked into the system and there was an issue with the datastore, which was not accessible from the VM running the Kubernetes cluster. After re-configuring that and using the new vsphere.conf file per https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html, this issue is fixed.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
I am trying to configure a VMware datastore for use as a volume (create a static VMDK and/or create volumes dynamically as per https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/policy-based-mgmt.html).
When I follow https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html
to configure the user and vsphere.conf for k8s v1.10.4 (the instructions for version 1.9 and above), I am not able to start the kubelet service, and no further operations can be done, such as kubectl create, get pods, get nodes.
What you expected to happen: after setting up vsphere.conf, the kubelet should start and operations such as kubectl create should work.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
Kubernetes version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:00:59Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration: vSphere 6.5
OS (e.g. from /etc/os-release): NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7"
Kernel (e.g. uname -a): Linux barnda129.inblrlab.avaya.com 3.10.0-862.3.2.el7.x86_64
Created vsphere.conf in /etc/kubernetes
disk.EnableUUID is set to true for both master and worker nodes
Added
--cloud-provider=vsphere --cloud-config=/etc/kubernetes/vsphere.conf
in /etc/kubernetes/manifests/kube-controller-manager.yaml and /etc/kubernetes/manifests/kube-apiserver.yaml
In the master node (at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf):
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere --cloud-config=/etc/kubernetes/vsphere.conf"
In the worker node (at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf):
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=vsphere"
Attached vsphere.conf (vsphere.docx)
Error Trace
Jul 31 12:36:11 barnda129 kubelet: E0731 12:36:11.688844 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list v1.Node: Get https://10.133.132.129:6443/api/v1/nodes?fieldSelector=metadata.name%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused
Jul 31 12:36:12 barnda129 kubelet: E0731 12:36:12.686611 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list v1.Service: Get https://10.133.132.129:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused
Jul 31 12:36:12 barnda129 kubelet: E0731 12:36:12.688701 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list v1.Pod: Get https://10.133.132.129:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused
Jul 31 12:36:12 barnda129 kubelet: E0731 12:36:12.689943 24258 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list v1.Node: Get https://10.133.132.129:6443/api/v1/nodes?fieldSelector=metadata.name%3Dbarnda129.inblrlab.avaya.com&limit=500&resourceVersion=0: dial tcp 10.133.132.129:6443: getsockopt: connection refused
Please let me know what is missing here