kubernetes-sigs / cluster-api-provider-vsphere


Cannot create cluster using CAPV latest version V.0.3.XX #435

Closed andrefelixbr closed 5 years ago

andrefelixbr commented 5 years ago

/kind bug

DISCLAIMER: The issue occurs when using version v0.3.0-65-g14293965, the latest version as of this ticket's creation date. It runs fine on version v0.3.0.
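
For reference, this version string is what git describe reports on the current checkout (the same command discussed later in this thread):

# Run from the cluster-api-provider-vsphere checkout
git describe --tags
# -> v0.3.0-65-g14293965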

What steps did you take and what happened:

I'm following the CAPV getting started guide at https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/master/docs/getting_started.md. I'm blocked on "Using clusterctl".

I executed the command below in the out folder:

clusterctl create cluster --provider vsphere --bootstrap-type kind -c cluster.yaml -m machines.yaml -p provider-components.yaml --addon-components addons.yaml -v 10


I0712 16:52:38.123922   23945 round_trippers.go:438] GET https://127.0.0.1:44439/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines/capv-mgmt-example-controlplane-1 200 OK in 4 milliseconds
I0712 16:52:38.124003   23945 round_trippers.go:444] Response Headers:
I0712 16:52:38.124043   23945 round_trippers.go:447]     Content-Type: application/json
I0712 16:52:38.124068   23945 round_trippers.go:447]     Content-Length: 934
I0712 16:52:38.124088   23945 round_trippers.go:447]     Date: Fri, 12 Jul 2019 15:52:38 GMT
I0712 16:52:38.124149   23945 request.go:942] Response Body: {"apiVersion":"cluster.k8s.io/v1alpha1","kind":"Machine","metadata":{"creationTimestamp":"2019-07-12T15:36:28Z","generation":1,"labels":{"cluster.k8s.io/cluster-name":"capv-mgmt-example"},"name":"capv-mgmt-example-controlplane-1","namespace":"default","resourceVersion":"305","selfLink":"/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines/capv-mgmt-example-controlplane-1","uid":"8306d5e4-63e7-4b80-b563-042240193f6c"},"spec":{"metadata":{"creationTimestamp":null},"providerSpec":{"value":{"apiVersion":"vsphere.cluster.k8s.io/v1alpha1","datacenter":"DATA_CENTE0100","datastore":"locadatastore","diskGiB":50,"folder":"Workloads","kind":"VsphereMachineProviderSpec","memoryMiB":2048,"network":{"devices":[{"dhcp4":true,"dhcp6":false,"networkName":"Lg1ag1ccnlab20ash01|vlan1222|vlan1222"}]},"numCPUs":2,"resourcePool":"/DATA_CENTE0100/host/Cluster1/Resources/ESX Agents/Resource_pool_innovation_01","template":"ubuntu-1804-kube-13.6"}},"versions":{"controlPlane":"1.13.6","kubelet":"1.13.6"}}}
I0712 17:06:28.123538   23945 clusterclient.go:996] Waiting for Machine capv-mgmt-example-controlplane-1 to become ready...
I0712 17:06:28.123659   23945 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: clusterctl/v0.0.0 (linux/amd64) kubernetes/$Format" 'https://127.0.0.1:44439/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines/capv-mgmt-example-controlplane-1'
......
I0712 17:06:28.123174   23945 round_trippers.go:438] GET https://127.0.0.1:44439/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines/capv-mgmt-example-controlplane-1 200 OK in 3 milliseconds
I0712 17:06:28.123223   23945 round_trippers.go:444] Response Headers:
I0712 17:06:28.123257   23945 round_trippers.go:447]     Date: Fri, 12 Jul 2019 16:06:28 GMT
I0712 17:06:28.123273   23945 round_trippers.go:447]     Content-Type: application/json
I0712 17:06:28.123285   23945 round_trippers.go:447]     Content-Length: 934
I0712 17:06:28.123340   23945 request.go:942] Response Body: {"apiVersion":"cluster.k8s.io/v1alpha1","kind":"Machine","metadata":{"creationTimestamp":"2019-07-12T15:36:28Z","generation":1,"labels":{"cluster.k8s.io/cluster-name":"capv-mgmt-example"},"name":"capv-mgmt-example-controlplane-1","namespace":"default","resourceVersion":"305","selfLink":"/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines/capv-mgmt-example-controlplane-1","uid":"8306d5e4-63e7-4b80-b563-042240193f6c"},"spec":{"metadata":{"creationTimestamp":null},"providerSpec":{"value":{"apiVersion":"vsphere.cluster.k8s.io/v1alpha1","datacenter":"DATA_CENTE0100","datastore":"locadatastore","diskGiB":50,"folder":"Workloads","kind":"VsphereMachineProviderSpec","memoryMiB":2048,"network":{"devices":[{"dhcp4":true,"dhcp6":false,"networkName":"Lg1ag1ccnlab20ash01|vlan1222|vlan1222"}]},"numCPUs":2,"resourcePool":"/DATA_CENTE0100/host/Cluster1/Resources/ESX Agents/Resource_pool_innovation_01","template":"ubuntu-1804-kube-13.6"}},"versions":{"controlPlane":"1.13.6","kubelet":"1.13.6"}}}
I0712 17:06:28.123538   23945 clusterclient.go:996] Waiting for Machine capv-mgmt-example-controlplane-1 to become ready...
I0712 17:06:28.123659   23945 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: clusterctl/v0.0.0 (linux/amd64) kubernetes/$Format" 'https://127.0.0.1:44439/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines/capv-mgmt-example-controlplane-1'
I0712 17:06:28.127961   23945 createbootstrapcluster.go:36] Cleaning up bootstrap cluster.
I0712 17:06:28.127976   23945 kind.go:69] Running: kind [delete cluster --name=clusterapi]
I0712 17:06:29.069055   23945 kind.go:72] Ran: kind [delete cluster --name=clusterapi] Output: Deleting cluster "clusterapi" ...
F0712 17:06:29.069075   23945 create_cluster.go:61] unable to create control plane machine: timed out waiting for the condition

No cluster is created, and clusterapi is not working properly. After 30 minutes I get "unable to create control plane machine: timed out waiting for the condition". Checking the pod status, we can see that the pod vsphere-provider-controller-manager-0 cannot start.

Command: kubectl get pods --all-namespaces


NAMESPACE                 NAME                                               READY   STATUS             RESTARTS   AGE
cluster-api-system        cluster-api-controller-manager-0                   1/1     Running            0          3m4s
kube-system               coredns-5c98db65d4-d6grv                           1/1     Running            0          3m4s
kube-system               coredns-5c98db65d4-gxmww                           1/1     Running            0          3m4s
kube-system               etcd-clusterapi-control-plane                      1/1     Running            0          2m24s
kube-system               kindnet-mbcsk                                      1/1     Running            0          3m4s
kube-system               kube-apiserver-clusterapi-control-plane            1/1     Running            0          2m9s
kube-system               kube-controller-manager-clusterapi-control-plane   1/1     Running            0          2m22s
kube-system               kube-proxy-h6phk                                   1/1     Running            0          3m4s
kube-system               kube-scheduler-clusterapi-control-plane            1/1     Running            0          2m16s
vsphere-provider-system   vsphere-provider-controller-manager-0              0/1     CrashLoopBackOff   4          3m4s

Executing kubectl logs vsphere-provider-controller-manager-0 -n vsphere-provider-system produces no log output.
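
For completeness, logs from the last failed attempt can sometimes still be retrieved with --previous; a minimal sketch (the container name manager comes from the pod spec below), although in this case it is likely empty as well, since the container process never starts:

kubectl logs vsphere-provider-controller-manager-0 -n vsphere-provider-system -c manager --previous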

Finally, when we describe the pod with kubectl describe pods vsphere-provider-controller-manager-0 -n vsphere-provider-system, we can see this error, which we suspect is the root cause of the problem:

Message: failed to create containerd task: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"/manager\": stat /manager: no such file or directory": unknown


Name:           vsphere-provider-controller-manager-0
Namespace:      vsphere-provider-system
Priority:       0
Node:           clusterapi-control-plane/172.17.0.2
Start Time:     Tue, 16 Jul 2019 09:04:17 +0100
Labels:         control-plane=controller-manager
                controller-revision-hash=vsphere-provider-controller-manager-55984b45df
                controller-tools.k8s.io=1.0
                statefulset.kubernetes.io/pod-name=vsphere-provider-controller-manager-0
Annotations:    <none>
Status:         Running
IP:             10.244.0.5
Controlled By:  StatefulSet/vsphere-provider-controller-manager
Containers:
  manager:
    Container ID:  containerd://af7063ab6135a6faced3b821f7c71d149b82a274c20e550241a6a95621840b46
    Image:         gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:0.3.0
    Image ID:      gcr.io/cnx-cluster-api/vsphere-cluster-api-provider@sha256:79818f8f818a7e6ac0341aec9f218d971946ff2d140ba2477ffd6501213e73cb
    Port:          <none>
    Host Port:     <none>
    Command:
      /manager
    Args:
      --logtostderr
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       StartError
      Message:      failed to create containerd task: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"/manager\": stat /manager: no such file or directory": unknown
      Exit Code:    128
      Started:      Thu, 01 Jan 1970 01:00:00 +0100
      Finished:     Tue, 16 Jul 2019 09:07:54 +0100
    Ready:          False
    Restart Count:  5
    Limits:
      cpu:     400m
      memory:  500Mi
    Requests:
      cpu:     200m
      memory:  200Mi
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /etc/kubernetes from config (rw)
      /etc/ssl/certs from certs (rw)
      /tmp/cluster-api/machines from machines-stage (rw)
      /usr/bin/kubeadm from kubeadm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-r7jkv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes
    HostPathType:  
  certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:  
  machines-stage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kubeadm:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/bin/kubeadm
    HostPathType:  
  default-token-r7jkv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-r7jkv
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.alpha.kubernetes.io/notReady:NoExecute
                 node.alpha.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                    From                               Message
  ----     ------            ----                   ----                               -------
  Warning  FailedScheduling  4m39s (x2 over 4m50s)  default-scheduler                  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         4m38s                  default-scheduler                  Successfully assigned vsphere-provider-system/vsphere-provider-controller-manager-0 to clusterapi-control-plane
  Normal   Pulling           4m37s                  kubelet, clusterapi-control-plane  Pulling image "gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:0.3.0"
  Normal   Pulled            4m4s                   kubelet, clusterapi-control-plane  Successfully pulled image "gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:0.3.0"
  Normal   Created           2m31s (x5 over 4m3s)   kubelet, clusterapi-control-plane  Created container manager
  Warning  Failed            2m31s (x5 over 4m3s)   kubelet, clusterapi-control-plane  Error: failed to create containerd task: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"/manager\": stat /manager: no such file or directory": unknown
  Normal   Pulled            2m31s (x4 over 4m3s)   kubelet, clusterapi-control-plane  Container image "gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:0.3.0" already present on machine
  Warning  BackOff           2m30s (x9 over 4m2s)   kubelet, clusterapi-control-plane  Back-off restarting failed container
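
The StartError message above suggests that /manager is not present at that path in the released 0.3.0 image. A minimal sketch to inspect the image contents directly, assuming Docker is available on the workstation (the image reference is the one from the pod spec above):

docker pull gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:0.3.0
docker run --rm --entrypoint ls gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:0.3.0 -la /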

What did you expect to happen:

The cluster is created and running on vSphere.

Anything else you would like to add:

Environment:

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

NAME="Ubuntu" VERSION="18.04.2 LTS (Bionic Beaver)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 18.04.2 LTS" VERSION_ID="18.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=bionic UBUNTU_CODENAME=bionic

figo commented 5 years ago

@andrefelixbr could you check the provider-components.yaml, is it still pointing to /root/manager? Please run generate-yaml.sh to update the yamls, since you updated the capv version.
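
For example, a sketch assuming the script is the generate-yaml.sh shipped in the repo (its exact location and required environment variables are described in the getting started guide):

cd cluster-api-provider-vsphere
./generate-yaml.sh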

andrefelixbr commented 5 years ago

@figo there is nothing in provider-components.yaml pointing to "/root/manager". See my provider-components.yaml attached: provider-components.yaml.txt

Actually, I didn't update my capv version. I ran with the latest version first; since it didn't work, I cloned version v0.3.0 separately and tested it.

They are in different folders so I can test both easily.

figo commented 5 years ago

@andrefelixbr my previous comment is wrong; that applies to a much older version of the yaml.

Looking at your yaml, it points to image: gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:0.3.0, which is the released version. Where are you trying to apply version v0.3.0-65-g14293965?
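
A quick way to confirm which manager image the generated manifest references (the file name is the one attached above):

grep -n "image:" provider-components.yaml
# in this case it shows gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:0.3.0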

tkrausjr commented 5 years ago

@andrefelixbr Does your cluster.yaml have the correct parameter set in ProviderSpec for the vSphere username?

andrefelixbr commented 5 years ago

@figo, if you clone the current version of capv and execute git describe --tags you'll see the version v0.3.0-65-g14293965.

image: gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:0.3.0 is the image version, and v0.3.0-65-g14293965 is the repository tag.

andrefelixbr commented 5 years ago

@tkrausjr yes, it does.

figo commented 5 years ago

> @figo, if you clone the current version of capv and execute git describe --tags you'll see the version v0.3.0-65-g14293965.
>
> image: gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:0.3.0 is the image version, and v0.3.0-65-g14293965 is the repository tag.

Did you compile based on the repo and generate a new image? The new image should have a version like image: gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:14293965
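
A sketch of how such a tag relates to the checkout: the suffix of git describe is the abbreviated commit hash prefixed with g (the project's actual build tagging convention may differ):

git describe --tags           # v0.3.0-65-g14293965
git rev-parse --short=8 HEAD  # 14293965, the tag figo refers to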

akutz commented 5 years ago

Hi all,

The tags now use git describe. This is pretty straightforward and simply an artifact of something handled in the upcoming #412. Here's an example of what to do: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/403#issuecomment-508531894.

Please note that we don't use the Makefile anymore, and when you generate YAML, please specify the manager image to use with the -m flag.
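
For example, a hedged sketch (the script name generate-yaml.sh is the one referenced earlier in this thread, the -m flag is the one mentioned above, and the image tag is illustrative):

./generate-yaml.sh -m gcr.io/cnx-cluster-api/vsphere-cluster-api-provider:<your-image-tag>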

akutz commented 5 years ago

PR #412 has been merged. Can you please try again with the updated docs? Thanks!

andrefelixbr commented 5 years ago

Hi @akutz, it works smoothly with the new instructions in the getting started guide. Thanks!!

Closing this ticket now.

Thanks @figo and @tkrausjr for the support.