vultr / docker-machine-driver-vultr

Vultr Driver Plugin for Docker Machine

[BUG] - Node provisioning using Rancher RKE and Vultr docker-machine driver fails with certificate validation error #31

Closed: kgeipel-retail7 closed this issue 9 months ago

kgeipel-retail7 commented 10 months ago

Describe the bug
Node provisioning using Rancher RKE and the Vultr docker-machine driver fails with a certificate validation error.

To Reproduce

Used Environment:

Additional context
Vultr ticket number: #RCS-91QFV. Vultr support requested that an issue be opened in this project.

Attachments:
- vultrMasterNodeTemplate.json
- vultrRKETemplateRancher259.json
- vultrRKETemplateRancher271.json

happytreees commented 10 months ago

Hello @kgeipel-retail7, thank you for submitting this bug!

Can you provide me with the values you used for the template such as OS, plan, region, userdata, etc?

kgeipel-retail7 commented 10 months ago

Hey @happytreees, see the content of the attached file vultrMasterNodeTemplate.json:

"vultrConfig":{ "apiKey":"<OUR_API_KEY>", "appId":"0", "cloudInitUserData":"", "ddosProtection":false, "enableVpc":false, "enabledIpv6":false, "firewallGroupId":"", "floatingIpv4Id":"", "imageId":"", "ipxeChainUrl":"", "isoId":"", "osId":"1743", "region":"fra", "sendActivationEmail":false, "snapshotId":"", "startupScriptId":"", "tags":null, "vpcIds":null, "vpsBackups":false, "vpsPlan":"vhp-4c-8gb-amd" }

happytreees commented 10 months ago

I've done some testing, and there don't appear to be any issues with the driver itself. I was able to reproduce this error when the newly created RKE cluster's agent could not contact the primary Rancher cluster because of a firewall.

If you are putting any of these resources behind a firewall, please ensure that they can all reach each other.
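For example, a quick reachability check from the new node (the URL below is a placeholder for your Rancher server; Rancher exposes a /ping endpoint that should return pong):

```sh
# Hypothetical connectivity check from the newly provisioned node.
curl -k https://rancher.example.com/ping
# Expected output: pong
```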

Additionally, it would be helpful to pull the logs from the Rancher agent on the new instance. You can generally find the agent container with docker ps and then use docker logs to pull its output.
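For example (the name filter is illustrative; match whatever docker ps actually shows for the agent container):

```sh
# Hypothetical log retrieval on the new node: locate the agent container,
# then pull its recent output.
docker ps --filter name=rancher-agent
docker logs --tail 100 <container-id>
```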

Finally, I recommend using the Vultr Rancher UI driver, as it ensures that all of the default values are correct: https://github.com/vultr/rancher-ui-driver-vultr

kgeipel-retail7 commented 10 months ago

Hey @happytreees, thanks for the fast analysis. Yes, I suspect it's a firewall issue, but our Rancher server has no restrictions on outgoing traffic.

As Evan V. pointed out in ticket #RCS-91QFV, there seems to be a general access limitation on Vultr compute resources: "I suspect this is because we by default enforce firewall and only allow port 22."

But if Vultr offers compute resources AND a docker-machine driver that is advertised to work with Rancher, then the available OS images should be prepared to allow access for the Rancher resources as well; otherwise the driver is of little use.

Or are we expected to configure firewall group rules in the Vultr Management Console? By default nothing is configured there, and I assumed that meant no firewall was active at all, which is how other cloud providers handle it.

But section "Required Ports" in the docker-machine drivers Readme says that the firewall is disabled by default by the cloud-init-script of the driver.

kgeipel-retail7 commented 10 months ago

Hey @happytreees, I have now added a startup script that configures UFW at the OS level to the Vultr cloud console and linked it in my node templates.
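For illustration, a script along these lines (the exact port list depends on the cluster roles; this one follows Rancher's documented node requirements and the driver README's "Required Ports" section):

```sh
#!/bin/sh
# Illustrative startup script: open the ports an RKE node needs, then enable UFW.
ufw allow 22/tcp          # SSH (docker-machine provisioning)
ufw allow 80/tcp          # HTTP ingress
ufw allow 443/tcp         # HTTPS ingress / agent to Rancher server
ufw allow 2376/tcp        # Docker daemon TLS (node driver)
ufw allow 2379:2380/tcp   # etcd client/peer
ufw allow 6443/tcp        # Kubernetes API server
ufw allow 8472/udp        # Canal/Flannel VXLAN overlay
ufw allow 10250/tcp       # kubelet
ufw allow 30000:32767/tcp # NodePort services
ufw --force enable
```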

It seems that the driver's default cloud-init config is not being applied correctly.

So the initial issue no longer appears, but I now have another one:

Rancher UI:

```
ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain : exit status 1
```

Rancher server log:

```
2023/11/16 09:31:33 [INFO] Generating and uploading node config vultr-fra-dev-rt7-01-master2
2023/11/16 09:31:33 [DEBUG] [GenericEncryptedStore]: set secret called for mc-m-dlwnj
2023/11/16 09:31:33 [DEBUG] [GenericEncryptedStore]: updating secret mc-m-dlwnj
2023/11/16 09:31:33 [DEBUG] getNodeTemplate parsed [cattle-global-nt:nt-8vlpj] to ns: [cattle-global-nt] and n: [nt-8vlpj]
2023/11/16 09:31:33 [DEBUG] Cleaning up [/opt/jail/c-hb7hh/management-state/node/nodes/vultr-fra-dev-rt7-01-master2]
2023/11/16 09:31:33 [ERROR] error syncing 'c-hb7hh/m-dlwnj': handler node-controller: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain: exit status 1, requeuing
2023/11/16 09:31:33 [DEBUG] [nodepool] bad node found: m-dlwnj
```

Is there anything else that must be configured in the OS to be able to join the compute resources to an RKE cluster? An SSH key is created during provisioning; I can see it in the Vultr cloud console under the "Account" > "SSH Keys" menu, and I can see a corresponding key in ~/.ssh/authorized_keys on the provisioned node.
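One way to narrow this down (the key path and host below are placeholders, not values from this setup) is to try the generated key manually with verbose output:

```sh
# Hypothetical manual check from the Rancher host: attempt an SSH login with
# the key docker-machine generated for the node. The -v output shows which
# public key algorithms the client offers and why the server rejects them.
ssh -v -i /path/to/generated/id_rsa root@<node-ip>
```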

happytreees commented 10 months ago

Hello @kgeipel-retail7

It does appear that the default script is, for some reason, not being applied. I am unsure why that is; however, I will look into it.

For reference, we do have information regarding the ports here: https://github.com/vultr/docker-machine-driver-vultr#required-ports

That error looks like a basic SSH authentication issue. I haven't seen that error myself, but I will try to find out more on my side. Can you share the outcome if you use the default userdata script?

```
I2Nsb3VkLWNvbmZpZwoKcnVuY21kOgogLSB1ZncgZGlzYWJsZQ==
```
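For reference, that base64 decodes to the following cloud-config, which simply disables UFW:

```
#cloud-config

runcmd:
 - ufw disable
```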
kgeipel-retail7 commented 10 months ago

Hey @happytreees, the default userdata script value also works; I already tried it yesterday, just to be sure there was no issue with my port configuration.

Thanks for digging deeper into it; let me know if you find something or need further information from my side.

kgeipel-retail7 commented 10 months ago

Hey @happytreees, have you been able to take a look at that SSH issue yet?

kgeipel-retail7 commented 9 months ago

This issue will be closed; it is not caused by the driver. The root cause is the Ubuntu 22.04 LTS image: something appears to have changed in the SSH public key authentication method. We observed the same behavior with another cloud provider, while provisioning with Ubuntu 20.04 LTS works out of the box.
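A likely explanation, though not confirmed in this thread: Ubuntu 22.04 ships OpenSSH 8.9, which disables SHA-1 ssh-rsa signatures by default, while the SSH client library in older docker-machine builds only offers ssh-rsa for RSA keys. If that is the cause, a commonly cited workaround is to re-enable the algorithm on the node, for example via a startup script:

```sh
# Assumption: the failure is OpenSSH >= 8.8 rejecting SHA-1 ssh-rsa signatures.
# Re-enable them for public key authentication, then restart sshd.
# Option name and drop-in directory match Ubuntu 22.04's OpenSSH 8.9.
echo 'PubkeyAcceptedAlgorithms +ssh-rsa' > /etc/ssh/sshd_config.d/90-ssh-rsa.conf
systemctl restart ssh
```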