rancher / rke

Rancher Kubernetes Engine (RKE), an extremely simple, lightning fast Kubernetes distribution that runs entirely within containers.
Apache License 2.0

ssh certificates have stopped working in v1.3.11 #2941

Open stefanfritsch opened 2 years ago

stefanfritsch commented 2 years ago

I use SSH certificates to access my nodes. This worked fine for years, up to at least v1.3.7, but with v1.3.11 (I haven't used the versions in between) it is broken:

Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

The same node works if I add the public key to authorized_keys.

Steps to Reproduce:

  1. Set up ssh certificate login on a host.
  2. Create a cluster.yaml
  3. Try rke:
    1. With ssh: Works
    2. With v1.3.7: Works
    3. With v1.3.11: Doesn't work
    4. With v1.3.11 and the public key in authorized_keys: Works

Output

ssh

Login works

root@control-0 ~ # ssh stefan.fritsch@shin-11
Last login: Mon May 23 13:17:22 2022 from 159.69.91.228
stefan.fritsch@shin-11:~$ 

v1.3.7

Everything's fine

root@control-0 /decrypted/kubernetes # ./rke_linux-amd64-v1.3.7 etcd snapshot-save --name "etcd-manual-$(date +'%Y-%m-%d')" --config cluster.yml
INFO[0000] Running RKE version: v1.3.7 
INFO[0000] Starting saving snapshot on etcd hosts
INFO[0000] [dialer] Setup tunnel for host [shin-12.example.com]
INFO[0000] [dialer] Setup tunnel for host [shin-10.example.com]
INFO[0000] [dialer] Setup tunnel for host [shin-11.example.com]
INFO[0000] [state] Deploying state file to [/etc/kubernetes/etcd-manual-2022-05-23.rkestate] on host [shin-11.example.com]
INFO[0000] [state] Deploying state file to [/etc/kubernetes/etcd-manual-2022-05-23.rkestate] on host [shin-12.example.com]
INFO[0000] [state] Deploying state file to [/etc/kubernetes/etcd-manual-2022-05-23.rkestate] on host [shin-10.example.com]
INFO[0000] Image [rancher/rke-tools:v0.1.78] exists on host [shin-11.example.com]
INFO[0000] Image [rancher/rke-tools:v0.1.78] exists on host [shin-12.example.com]
INFO[0000] Image [rancher/rke-tools:v0.1.78] exists on host [shin-10.example.com]
INFO[0001] Starting container [cluster-state-deployer] on host [shin-10.example.com], try #1
INFO[0001] Starting container [cluster-state-deployer] on host [shin-12.example.com], try #1
INFO[0001] Starting container [cluster-state-deployer] on host [shin-11.example.com], try #1

v1.3.11

Nothing works

root@control-0 /decrypted/kubernetes # ./rke_linux-amd64-v1.3.11 etcd snapshot-save --name "etcd-manual-$(date +'%Y-%m-%d')" --config cluster.yml
INFO[0000] Running RKE version: v1.3.11
INFO[0000] Starting saving snapshot on etcd hosts
INFO[0000] [dialer] Setup tunnel for host [shin-11.example.com]
INFO[0000] [dialer] Setup tunnel for host [shin-10.example.com]
INFO[0000] [dialer] Setup tunnel for host [shin-12.example.com]
WARN[0000] Failed to set up SSH tunneling for host [shin-11.example.com]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [shin-11.example.com:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
WARN[0000] Failed to set up SSH tunneling for host [shin-12.example.com]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [shin-12.example.com:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
WARN[0000] Failed to set up SSH tunneling for host [shin-10.example.com]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [shin-10.example.com:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
WARN[0000] Removing host [shin-11.example.com] from node lists
WARN[0000] Removing host [shin-12.example.com] from node lists
WARN[0000] Removing host [shin-10.example.com] from node lists

v1.3.11 with the pubkey on one of the hosts

Note how the node with the key in authorized_keys now works

root@control-0 /decrypted/kubernetes # ./rke_linux-amd64-v1.3.11 etcd snapshot-save --name "etcd-manual-$(date +'%Y-%m-%d')" --config cluster.yml
INFO[0000] Running RKE version: v1.3.11                 
INFO[0000] Starting saving snapshot on etcd hosts       
INFO[0000] [dialer] Setup tunnel for host [shin-10.example.com] 
INFO[0000] [dialer] Setup tunnel for host [shin-12.example.com] 
INFO[0000] [dialer] Setup tunnel for host [shin-11.example.com] 
WARN[0000] Failed to set up SSH tunneling for host [shin-10.example.com]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [shin-10.example.com:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain 
WARN[0000] Failed to set up SSH tunneling for host [shin-12.example.com]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [shin-12.example.com:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain 
WARN[0000] Removing host [shin-10.example.com] from node lists 
WARN[0000] Removing host [shin-12.example.com] from node lists 

sshd

[...]
May 23 13:44:38 shin-10 sshd[3862349]: Accepted certificate ID "stefan.fritsch at 2022-05-23 11:04:34 user key valid for 10h" (serial 0) signed by RSA CA SHA256:<snip> via /etc/ssh/ssh_trusted_ca.pub
May 23 13:44:38 shin-10 sshd[3862349]: debug3: mm_answer_keyallowed: publickey authentication: RSA-CERT key is allowed
May 23 13:44:38 shin-10 sshd[3862349]: debug3: mm_request_send entering: type 23
May 23 13:44:38 shin-10 sshd[3862349]: debug3: mm_sshkey_verify entering [preauth]
May 23 13:44:38 shin-10 sshd[3862349]: debug3: mm_request_send entering: type 24 [preauth]
May 23 13:44:38 shin-10 sshd[3862349]: debug3: mm_sshkey_verify: waiting for MONITOR_ANS_KEYVERIFY [preauth]
May 23 13:44:38 shin-10 sshd[3862349]: debug3: mm_request_receive_expect entering: type 25 [preauth]
May 23 13:44:38 shin-10 sshd[3862349]: debug3: mm_request_receive entering [preauth]
May 23 13:44:38 shin-10 sshd[3862349]: debug3: mm_request_receive entering
May 23 13:44:38 shin-10 sshd[3862349]: debug3: monitor_read: checking request 24
May 23 13:44:38 shin-10 sshd[3862349]: debug3: mm_answer_keyverify: publickey 0x<snip> signature unverified: incorrect signature
May 23 13:44:38 shin-10 sshd[3862349]: debug1: auth_activate_options: setting new authentication options
May 23 13:44:38 shin-10 sshd[3862349]: debug3: mm_request_send entering: type 25
May 23 13:44:38 shin-10 sshd[3862349]: Failed publickey for stefan.fritsch from <ip> port 59546 ssh2: RSA-CERT SHA256:<snip> ID stefan.fritsch at 2022-05-23 11:04:34 user key valid for 10h (serial 0) CA RSA SHA256:<snip>
May 23 13:44:38 shin-10 sshd[3862349]: debug2: userauth_pubkey: authenticated 0 pkalg rsa-sha2-256-cert-v01@openssh.com [preauth]
May 23 13:44:38 shin-10 sshd[3862349]: debug3: user_specific_delay: user specific delay 0.000ms [preauth]
May 23 13:44:38 shin-10 sshd[3862349]: debug3: ensure_minimum_time_since: elapsed 0.951ms, delaying 5.775ms (requested 6.726ms) [preauth]
May 23 13:44:38 shin-10 sshd[3862349]: debug3: userauth_finish: failure partial=0 next methods="publickey" [preauth]
May 23 13:44:38 shin-10 sshd[3862349]: debug3: send packet: type 51 [preauth]
May 23 13:44:38 shin-10 sshd[3862349]: Connection closed by authenticating user stefan.fritsch <ip> port 59546 [preauth]
[...]
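
In the log above, the entry that names the algorithm the client actually signed with is the userauth_pubkey debug line (pkalg rsa-sha2-256-cert-v01@openssh.com). Below is a minimal Go sketch for pulling that field out of a larger log. It assumes sshd logs at debug2 or higher and emits the line in the format shown here. The names parseAttempts and attempt are hypothetical helpers for illustration and are not part of RKE or OpenSSH.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// pubkeyLine matches the sshd debug2 entry seen in the log above, e.g.
// "userauth_pubkey: authenticated 0 pkalg rsa-sha2-256-cert-v01@openssh.com".
var pubkeyLine = regexp.MustCompile(`userauth_pubkey: authenticated (\d) pkalg (\S+)`)

// attempt is a hypothetical record of one public key authentication attempt.
type attempt struct {
	Authenticated bool
	Algorithm     string
}

// parseAttempts scans log text line by line and collects every
// userauth_pubkey entry it finds; all other lines are ignored.
func parseAttempts(log string) []attempt {
	var out []attempt
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		m := pubkeyLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		out = append(out, attempt{Authenticated: m[1] == "1", Algorithm: m[2]})
	}
	return out
}

func main() {
	// Sample line copied from the sshd log in this report.
	sample := `May 23 13:44:38 shin-10 sshd[3862349]: debug2: userauth_pubkey: authenticated 0 pkalg rsa-sha2-256-cert-v01@openssh.com [preauth]`
	for _, a := range parseAttempts(sample) {
		fmt.Printf("authenticated=%t algorithm=%s\n", a.Authenticated, a.Algorithm)
	}
}
```

Comparing the reported algorithm against the certificate type makes it easier to see whether the client switched to an RSA SHA-2 variant before the signature check failed.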

System info

RKE version: v1.3.11

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

root@shin-11 /var/log # cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
root@shin-11 /var/log # uname -r
5.4.0-100-generic

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO): bare-metal

cluster.yml file:

nodes:
    - address: shin-10.example.com
      internal_address: 192.168.2.20
      user: stefan.fritsch
      role: [controlplane,worker,etcd]
    - address: shin-11.example.com
      internal_address: 192.168.2.21
      user: stefan.fritsch
      role: [controlplane,worker,etcd]
    - address: shin-12.example.com
      internal_address: 192.168.2.22
      user: stefan.fritsch
      role: [controlplane,worker,etcd]

# Enable use of SSH agent to use SSH private keys with passphrase
# This requires the environment SSH_AUTH_SOCK configured pointing
# to your SSH agent which has the private key added
ssh_agent_auth: true
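
Because ssh_agent_auth: true makes rke depend on a running agent, it can help to rule out the agent socket itself before looking at algorithms. Below is a minimal Go sketch of such a pre-flight check. It assumes the agent is exposed through the conventional SSH_AUTH_SOCK environment variable as a Unix socket. checkAgentSocket is a hypothetical helper and not something rke provides.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// checkAgentSocket reports an error if sockPath is empty or if nothing
// accepts connections on that Unix socket. It only tests reachability;
// it does not list or validate the keys held by the agent.
func checkAgentSocket(sockPath string) error {
	if sockPath == "" {
		return errors.New("SSH_AUTH_SOCK is not set")
	}
	conn, err := net.DialTimeout("unix", sockPath, 2*time.Second)
	if err != nil {
		return fmt.Errorf("cannot reach SSH agent at %s: %w", sockPath, err)
	}
	return conn.Close()
}

func main() {
	// Assumption: the agent socket path comes from SSH_AUTH_SOCK.
	if err := checkAgentSocket(os.Getenv("SSH_AUTH_SOCK")); err != nil {
		fmt.Println("agent check failed:", err)
		return
	}
	fmt.Println("SSH agent socket is reachable")
}
```

In this report the agent was evidently reachable, since plain ssh and rke v1.3.7 both authenticated with it, so a check like this only excludes the simplest failure mode.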

SURE-4777

stefanlasiewski commented 2 years ago

We ran into this as well, but only with Ubuntu 20.04 nodes, not Ubuntu 18.04 nodes.

I'm using RKE v1.3.10.

stefanfritsch commented 2 years ago

@stefanlasiewski It's interesting that the server side makes a difference. Since ssh from the command line works fine, it's clearly a client (rke) issue, but there must be some mismatch in the accepted algorithms. I know that OpenSSH for Windows (the client, not the server) needs PubkeyAcceptedAlgorithms +ssh-rsa-cert-v01@openssh.com in ~/.ssh/config even if the certificates are rsa-sha2-256.

Could it be related to golang/go/issues/37278?

Birddude1230 commented 2 years ago

We are experiencing a similar issue. I can confirm that the root cause is a change in the crypto/ssh library: certificate-based login (with ssh-rsa certs) works fine for versions of crypto/ssh before commit 3147a52a75dd, but is broken after it. As best I can tell, the issue is in client_auth.go, in the function pickSignatureAlgorithm. Previously, the library would fail to find a common certificate algo and would attempt whatever your certificate was as a last-ditch effort. Now, it identifies certificate algos the server should support based on supported key exchange algos, which then include rsa-sha2-512, rsa-sha2-256, and ssh-rsa. This sounds like it shouldn't be an issue, since that includes the certificate I want to use, but it decides on an rsa-sha2 algo when the cert is ssh-rsa. Why this breaks is not clear, since presumably an ssh-rsa cert can still sign using rsa-sha2.
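
The before and after behaviour described above can be modelled roughly as follows. This is a simplified sketch and not the code in x/crypto/ssh. The functions pickLegacy and pickNegotiated, the candidates table and its ordering, and the server's advertised list are all hypothetical. Only the algorithm names themselves (ssh-rsa, rsa-sha2-256, rsa-sha2-512 and their -cert-v01@openssh.com variants) are the real identifiers.

```go
package main

import "fmt"

// candidates maps a key format to the signature algorithms a client might
// offer for it, most preferred first. Hypothetical table for illustration.
var candidates = map[string][]string{
	"ssh-rsa": {"rsa-sha2-256", "rsa-sha2-512", "ssh-rsa"},
	"ssh-rsa-cert-v01@openssh.com": {
		"rsa-sha2-256-cert-v01@openssh.com",
		"rsa-sha2-512-cert-v01@openssh.com",
		"ssh-rsa-cert-v01@openssh.com",
	},
	"ssh-ed25519": {"ssh-ed25519"},
}

// pickLegacy models the older behaviour as described in this comment:
// the client ends up signing with the key's own format.
func pickLegacy(keyFormat string) string {
	return keyFormat
}

// pickNegotiated models the newer behaviour: take the first candidate the
// server advertises, and fall back to the key format if none matches.
func pickNegotiated(keyFormat string, serverAlgos []string) string {
	advertised := make(map[string]bool, len(serverAlgos))
	for _, a := range serverAlgos {
		advertised[a] = true
	}
	for _, c := range candidates[keyFormat] {
		if advertised[c] {
			return c
		}
	}
	return keyFormat
}

func main() {
	// Hypothetical server advertisement, for illustration only.
	server := []string{
		"rsa-sha2-512-cert-v01@openssh.com",
		"rsa-sha2-256-cert-v01@openssh.com",
		"ssh-ed25519",
	}
	key := "ssh-rsa-cert-v01@openssh.com"
	fmt.Println("legacy:    ", pickLegacy(key))
	fmt.Println("negotiated:", pickNegotiated(key, server))
}
```

In this model an ssh-rsa certificate is presented under an RSA SHA-2 certificate algorithm once the server advertises one, which matches the pkalg rsa-sha2-256-cert-v01@openssh.com entry in the sshd log from the original report. Whoever produces the signature (here, the SSH agent) then has to sign with the matching hash, which is one place where the two sides could disagree.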

So what makes this an RKE issue and not an ssh issue? I suspect, but do not know for certain, that this is a usage issue, mostly because that's the default assumption to make. However, x/crypto is (somewhat unbelievably) still in version 0, so it is deliberately advertising that it is not yet stable. I simply don't have the time to establish confidently where the issue truly lies, especially considering the apparent lack of documentation of x/crypto/ssh.

stefanlasiewski commented 2 years ago

I had luck switching from an RSA key to an ed25519 key (after talking to Rancher support). The upstream Go issue suggests that Go support for RSA keys is broken: https://github.com/golang/go/issues/49952

Also, I notice this issue is discussing certs while my problem is with keys. However, I suspect the underlying cause is the same, and any non-RSA key should work.

github-actions[bot] commented 2 years ago

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

stefanlasiewski commented 2 years ago

@stefanfritsch @Birddude1230 With rke v1.3.14, SSH now works for me. Is it working for you also?

stefanfritsch commented 2 years ago

@stefanlasiewski Can't confirm for v1.3.15. With an ed25519 private key (ca-key is always rsa) I get:

WARN[0000] Failed to set up SSH tunneling for host [shin-11]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Failed to dial ssh using address [shin-11:22]: ssh: handshake failed: agent: unsupported algorithm "ssh-ed25519" 

with rsa:

Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain 

In both cases the login with ssh shin-11 works just fine.

stefanlasiewski commented 2 years ago

@stefanfritsch You know what, I was wrong. It's not working for me either.

stefanlasiewski commented 1 year ago

This issue is still happening. Posting a message to keep this issue open.

snasovich commented 1 year ago

@stefanlasiewski @stefanfritsch, could you check whether adding the following settings to /etc/ssh/sshd_config resolves the issue for you:

AllowStreamLocalForwarding yes
DisableForwarding no

Context: https://github.com/rancher/rke/issues/2907#issuecomment-1196803472

stefanlasiewski commented 1 year ago

AllowStreamLocalForwarding yes
DisableForwarding no

This had no effect for me. Note that on Ubuntu 20.04, AllowStreamLocalForwarding yes is already the default according to the manpage. I believe that DisableForwarding no is also the default, but the manpage isn't clear.