eyanez111 closed this issue 2 years ago.
Hello @eyanez111, there is no problem adding an SSH key inside the template; it works perfectly.
The error message doesn't come from the driver itself; it may be a communication problem between Rancher and the driver.
Can you give me the following information:
Can you enable debug logging as described in this doc: https://rancher.com/docs/rancher/v2.6/en/troubleshooting/logging/
and then share here the entire Rancher log captured during a cluster creation?
Hello @tuxtof
How do I add keys from a Nutanix user? I do not think there is a way to get keys; you just assign passwords, don't you? If you think this is a communication problem, I do not understand why it is expecting an SSH key. Where in a Rancher node template can you put a key? As far as I can see, there is no field for that.
It was installed on a Karbon cluster using Helm.
I will share the logs shortly.
Thanks, Francisco
The SSH key is used for communication between Rancher and the VM; the keys are generated automatically by the driver. You have nothing to do on this subject. The only SSH key you can add is inside the cloud-init, and that is for your own use, if you want to connect to the VM without using the Rancher SSH key.
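A minimal sketch of that cloud-init option, assuming you only want a personal login user on the VMs. The helper `make_cloud_init` and its arguments are hypothetical, and Rancher's own SSH key is still generated by the driver. It also shows the indentation cloud-init expects: `sudo` and `ssh-authorized-keys` line up under `name` inside the list item.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal #cloud-config that adds YOUR OWN login
# user and public key. Rancher's SSH key is generated by the driver, not here.
make_cloud_init() {
  local user="$1" pubkey="$2"
  # Keys of the same list item (name, sudo, ssh-authorized-keys) must share
  # the same indentation, otherwise the YAML does not parse as intended.
  cat <<EOF
#cloud-config
users:
  - name: ${user}
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ${pubkey}
EOF
}
```

For example, `make_cloud_init tony "$(cat ~/.ssh/id_rsa.pub)"` prints a document you could paste into the template's cloudInit field.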
You also need to verify your template file. I'm not sure everything in it is OK, but I don't know your environment.
Once the cluster creation was launched, did you see the VMs in PC (Prism Central)?
Regards
Adding the logs:
2021/12/23 21:10:21 [INFO] Generating and uploading node config worker3
2021/12/23 21:10:21 [INFO] Generating and uploading node config control-plane1
2021/12/23 21:10:21 [INFO] Generating and uploading node config worker1
2021/12/23 21:10:21 [INFO] Generating and uploading node config worker2
2021/12/23 21:10:36 [ERROR] error syncing 'c-m4t6l/m-48mc6': handler node-controller: Error creating machine: Error in driver during machine creation: error: {, requeuing
2021/12/23 21:10:36 [ERROR] error syncing 'c-m4t6l/m-bv8k7': handler node-controller: Error creating machine: Error in driver during machine creation: error: {, requeuing
2021/12/23 21:10:36 [ERROR] error syncing 'c-m4t6l/m-xqh6g': handler node-controller: Error creating machine: Error in driver during machine creation: error: {, requeuing
2021/12/23 21:10:36 [ERROR] error syncing 'c-m4t6l/m-q7hz8': handler node-controller: Error creating machine: Error in driver during machine creation: error: {, requeuing
2021/12/23 21:10:36 [ERROR] error syncing 'c-m4t6l/m-92jp2': handler node-controller: Error creating machine: Error in driver during machine creation: error: {, requeuing
2021/12/23 21:10:36 [ERROR] error syncing 'c-m4t6l/m-8svvp': handler node-controller: Error creating machine: Error in driver during machine creation: error: {, requeuing
Also, on the Rancher server I am getting: Error creating machine: Error in driver during machine creation: error: {:Timeout waiting for ssh key
Let me answer these questions here:
"It seems there is an indentation issue in your cloudInit; you can test with an empty cloudInit to verify." I did, and I still get the same error and the same logs. Do you want them?
"storageContainer needs to be a UUID if you request a second disk, but I see the size of the second disk is 0?" So do you recommend adding a second disk for the cluster infrastructure? I can add one if that would make a difference.
"Is the endpoint ntx-dev.URL.com your PC instance?" Yes, that is the Prism Central domain we use.
"The cluster name NTX-DEV is uppercase; is that expected?" Yes, that is how it is named in Nutanix.
"Did you create a nutanix_support admin user in PC?" Yes, that is a user I created in PC with admin rights for Nutanix Support to use when they tunnel in. It also exists in PE. Nutanix Support uses it all the time and has no problem tunneling in.
"insecure is set to false; did you set a correct certificate chain for your PC?" I think it is set as secure. I have tried both ways and get the same result, but how can I verify what is set in PC?
"Once the cluster creation was launched, did you see the VMs in PC?" No, I checked and nothing was created.
Thanks in advance. I think I am pretty close.
The logs don't seem to be in debug mode. Did you change the log level?
I followed the guide you sent me and ran:
$ export KUBECONFIG=./kube_config_cluster.yml
$ kubectl -n cattle-system get pods -l app=rancher --no-headers -o custom-columns=name:.metadata.name | while read rancherpod; do kubectl -n cattle-system exec $rancherpod -c rancher -- loglevel --set debug; done
OK
OK
OK
$ kubectl -n cattle-system logs -l app=rancher -c rancher
Am I missing anything in the command?
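One likely gap, as a hedged sketch: when `kubectl logs` is given a label selector (`-l`), it prints only the last 10 lines per pod unless `--tail` is set, so the start of a cluster creation can fall outside the output. The function name `rancher_machine_logs` is hypothetical, and it assumes `KUBECONFIG` is exported and points at the cluster running Rancher.

```shell
#!/usr/bin/env bash
# Sketch: collect the full Rancher log and keep only the node driver lines.
# With a label selector (-l), kubectl logs prints only the last 10 lines per
# pod unless --tail is set, so --tail=-1 requests the whole log.
# --prefix tags each line with the pod/container it came from.
rancher_machine_logs() {
  kubectl -n cattle-system logs -l app=rancher -c rancher \
    --tail=-1 --prefix \
    | grep node-controller-rancher-machine
}
```

Run `rancher_machine_logs > machine.log` right after launching a cluster creation; the pod prefix helps tell the three Rancher replicas apart.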
OK, I found a way to get the logs: logs.txt
Thanks, I think I am almost there. This has been helpful.
OK, it looks better now: you have debug entries in the log, but I don't see the creation step in these logs. Is it the correct time range, and is it the combined log of the three containers?
In the log, the creation should start with something like:
2021/12/24 06:04:45 [INFO] [node-controller-rancher-machine] Docker Machine Version: v0.15.0-rancher70, build e51aa220
2021/12/24 06:04:45 [INFO] [node-controller-rancher-machine] Found binary path at /var/lib/rancher/management-state/bin/docker-machine-driver-nutanix
2021/12/24 06:04:45 [INFO] [node-controller-rancher-machine] Launching plugin server for driver nutanix
2021/12/24 06:04:45 [INFO] [node-controller-rancher-machine] Plugin server listening at address 127.0.0.1:46631
In any case, can you switch the log level to trace so we can see the entire communication? I have been trying to reproduce your error since yesterday without success.
You can filter the log on the node-controller-rancher-machine pattern and give me only the matching lines.
I just validated the commands on a Rancher Helm install:
kubectl -n cattle-system get pods -l app=rancher --no-headers -o custom-columns=name:.metadata.name | while read rancherpod; do kubectl -n cattle-system exec $rancherpod -c rancher -- loglevel --set trace ; done
kubectl -n cattle-system logs -f -l app=rancher -c rancher 2>&1 | grep node-controller-rancher-machine
and I get all the expected log lines:
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Docker Machine Version: v0.15.0-rancher73, build 7766c706
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Found binary path at /var/lib/rancher/management-state/bin/docker-machine-driver-nutanix
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Docker Machine Version: v0.15.0-rancher73, build 7766c706
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Found binary path at /var/lib/rancher/management-state/bin/docker-machine-driver-nutanix
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Launching plugin server for driver nutanix
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Launching plugin server for driver nutanix
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Plugin server listening at address 127.0.0.1:35779
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Plugin server listening at address 127.0.0.1:37349
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .GetVersion
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Using API Version 1
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .GetVersion
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .SetConfigRaw
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Using API Version 1
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .GetMachineName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .SetConfigRaw
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (flag-lookup) Calling .GetMachineName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .GetMachineName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (flag-lookup) Calling .DriverName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (flag-lookup) Calling .GetMachineName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (flag-lookup) Calling .GetCreateFlags
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (flag-lookup) Calling .DriverName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Found binary path at /var/lib/rancher/management-state/bin/docker-machine-driver-nutanix
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (flag-lookup) Calling .GetCreateFlags
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Launching plugin server for driver nutanix
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Found binary path at /var/lib/rancher/management-state/bin/docker-machine-driver-nutanix
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Launching plugin server for driver nutanix
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Plugin server listening at address 127.0.0.1:35485
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Plugin server listening at address 127.0.0.1:34037
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .GetVersion
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .GetVersion
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Using API Version 1
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Using API Version 1
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .SetConfigRaw
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .SetConfigRaw
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .GetMachineName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] () Calling .GetMachineName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-w1) Calling .GetMachineName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-w1) Calling .DriverName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-m1) Calling .GetMachineName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-w1) Calling .GetCreateFlags
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-m1) Calling .DriverName
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-w1) Calling .GetCreateFlags
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-m1) Calling .GetCreateFlags
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-w1) Calling .SetConfigFromFlags
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-m1) Calling .GetCreateFlags
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Creating CA: /management-state/node/nodes/ze3-w1/certs/ca.pem
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-m1) Calling .SetConfigFromFlags
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Creating client certificate: /management-state/node/nodes/ze3-w1/certs/cert.pem
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Creating CA: /management-state/node/nodes/ze3-m1/certs/ca.pem
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Creating client certificate: /management-state/node/nodes/ze3-m1/certs/cert.pem
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Running pre-create checks...
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-w1) Calling .PreCreateCheck
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-w1) Calling .GetConfigRaw
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] Creating machine...
2021/12/24 07:43:10 [INFO] [node-controller-rancher-machine] (ze3-w1) Calling .Create
2021/12/24 07:43:10 [TRACE] [node-controller-rancher-machine] (ze3-w1) DBG | time="2021-12-24T07:43:10Z" level=info msg="Connecting on: pc.nutanix.com:9440"
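To see where each machine stopped in a trace log like this, a small filter can report the last driver method each named machine called. This is a sketch that assumes the `(name) Calling .Method` line format shown above; `last_driver_call` is a hypothetical name, unnamed `()` calls are skipped, and helper entries such as `flag-lookup` will also appear.

```shell
#!/usr/bin/env bash
# Sketch: read node-controller-rancher-machine lines on stdin and print, per
# machine name, the last driver method it called (where creation stalled).
# The name inside the parentheses must be non-empty, so "()" lines are skipped.
last_driver_call() {
  sed -n 's/.*\] (\([^)][^)]*\)) Calling \.\([A-Za-z][A-Za-z]*\).*/\1 \2/p' \
    | awk '{ last[$1] = $2 } END { for (m in last) print m, last[m] }' \
    | sort
}
```

Piping the lines above through `last_driver_call` would report `Create` for ze3-w1, matching the `Connecting on` trace line that follows it.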
I left it running longer in case you need more info: log3.txt
Thanks
Hi, I took a quick look between the oysters and the salmon 😎. The issue seems to come from the subnet name: its complexity breaks the search filter.
As a temporary workaround, can you test with a simple subnet name?
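Until a fix is available, a rough pre-check can flag subnet names that might trip the filter, such as "Software Development Apps (VLAN 125)" with its spaces and parentheses. The allowed character set here (letters, digits, `.`, `_`, `-`) is an assumption, not the driver's actual rule, and `is_simple_subnet_name` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: succeed only if the subnet name is non-empty and uses
# letters, digits, '.', '_' or '-'. The allowed set is an ASSUMPTION about
# what a name-based search filter handles safely, not the driver's real rule.
is_simple_subnet_name() {
  case "$1" in
    '' | *[!A-Za-z0-9._-]*) return 1 ;;  # empty, or contains another character
    *) return 0 ;;
  esac
}
```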
Sorry, I am not familiar with subnets. Do I need to ask the Networking team for a simple subnet name for the subnet we are in?
Thanks
Don't worry, I will reproduce it and try to bring a fix soon.
Happy Christmas 🎄
Hello @eyanez111
Santa Claus 🎅 has just passed by and put a new release (v3.0.1) under the Christmas tree 🎄. It should fix your issue.
I'll let you test it; let me know how it goes.
🎄🎄🎄 !!Merry Christmas !!! 🎄🎄🎄
Thank you so much!! So I just have to delete the driver and add:
Download URL: https://github.com/nutanix/docker-machine/releases/download/v3.0.1/docker-machine-driver-nutanix_v3.0.0_linux
Custom UI URL: https://nutanix.github.io/rancher-ui-driver/v3.0.1/component.js
Whitelist Domains: nutanix.github.io
Or am I missing anything?
Merry Christmas!
No need to delete it; just update the driver and change the download URL. Be careful: 3.0.1 appears twice in the URL.
The UI doesn't change and stays at 3.0.0.
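To avoid that mismatch, the download URL can be built from a single version string so the release tag and the binary name always agree. This is a sketch assuming the release assets keep the naming used in the URL above (`docker-machine-driver-nutanix_<version>_linux`); `nutanix_driver_url` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: build the driver download URL from ONE version variable so the
# release tag and the file name cannot drift apart (e.g. v3.0.1 vs v3.0.0).
# Assumes the asset naming shown in the thread stays the same.
nutanix_driver_url() {
  local version="$1"   # e.g. v3.0.1
  printf 'https://github.com/nutanix/docker-machine/releases/download/%s/docker-machine-driver-nutanix_%s_linux\n' \
    "$version" "$version"
}
```

`nutanix_driver_url v3.0.1` prints the URL with `v3.0.1` in both places.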
Cheers 🥂
Hello @tuxtof, thanks for the gift, and I hope you had nice holidays! I tried it on our dev cluster and it worked! Now I have tried it on our prod cluster and got a different error: Notifying bugsnag: [Error creating machine: Error in driver during machine creation: error: {
I used the same template, just pointed at a different cluster: I only changed the Management Endpoint and the Cluster.
Everything else is the same. I am attaching the logs: logs-nutanix.txt
Thanks for all the help, it worked on DEV!
Ah! I kept playing with it, and it looks like there is a problem with the Additional Disk Size and the Storage Container. What are the limitations if I leave those blank?
Thanks Francisco
Hello @eyanez111, happy New Year!
The problem comes from how you specify the storage container for the additional disk: you need to give the UUID of the storage container, not its name.
The additional disk is not mandatory, and there is no specific limitation if you leave it blank; it is just for people who want one.
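A quick local check can catch a container name (such as the `"storageContainer": "VM"` in the template) before creating a cluster. This sketch only verifies the usual 8-4-4-4-12 hex UUID shape; it does not confirm that the container exists in Prism Central, and `looks_like_uuid` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the value has the 8-4-4-4-12 hexadecimal UUID shape.
# A storage container NAME such as "VM" fails; existence is not checked.
looks_like_uuid() {
  printf '%s\n' "$1" \
    | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}
```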
Best Regards
I used:
It got registered on the Rancher server. I am trying to build an RKE1 cluster, and I created my Node Template:
{
  "annotations": {
    "ownerBindingsCreated": "true"
  },
  "baseType": "nodeTemplate",
  "cloudCredentialId": null,
  "created": "2021-12-22T00:07:34Z",
  "createdTS": 1640131654000,
  "creatorId": "user-xtj9l",
  "driver": "nutanix",
  "engineEnv": { },
  "engineInstallURL": "https://releases.rancher.com/install-docker/18.09.sh",
  "engineLabel": { },
  "engineOpt": { },
  "engineRegistryMirror": [ ],
  "id": "cattle-global-nt:nt-tphlk",
  "labels": {
    "cattle.io/creator": "norman"
  },
  "links": {
    "nodePools": "…/v3/nodePools?nodeTemplateId=cattle-global-nt%3Ant-tphlk",
    "nodes": "…/v3/nodes?nodeTemplateId=cattle-global-nt%3Ant-tphlk",
    "remove": "…/v3/nodeTemplates/cattle-global-nt:nt-tphlk",
    "self": "…/v3/nodeTemplates/cattle-global-nt:nt-tphlk",
    "update": "…/v3/nodeTemplates/cattle-global-nt:nt-tphlk"
  },
  "name": "RK1-test",
  "nutanixConfig": {
    "cloudInit": "#cloud-config\nusers:\n- name: tony\n sudo: ['ALL=(ALL) NOPASSWD:ALL']\n ssh-authorized-keys:\n - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDNhhR0Wf4GSz1K5cLdIYPcrKG27irKGgbkzyb3JS/x1irCysGPi9SIj5gChBGNGv99p9gZGPGFgL+CYdXdCORgyT........
    "cluster": "NTX-DEV",
    "diskSize": "0",
    "endpoint": "ntx-dev.URL.com",
    "insecure": false,
    "password": "XXXX",
    "port": "9440",
    "storageContainer": "VM",
    "username": "nutanix_support",
    "vmCategories": [ ],
    "vmCores": "1",
    "vmCpuPassthrough": false,
    "vmCpus": "2",
    "vmImage": "CentOS-7-x86_64-GenericCloud-1907",
    "vmImageSize": "300",
    "vmMem": "4096",
    "vmNetwork": [
      "Software Development Apps (VLAN 125)"
    ]
  },
  "principalId": "local://user-xtj9l",
  "state": "active",
  "transitioning": "no",
  "transitioningMessage": "",
  "type": "nodeTemplate",
  "useInternalIpAddress": true,
  "uuid": "4cf59fe2-bb41-4ded-99d7-fb11e527e0f2"
}
And I am getting this error: Error creating machine: Error in driver during machine creation: error: {:Timeout waiting for ssh key
Is this a problem with the driver? There is no way for me to add an SSH key when I create a template.