rancher / rke

Rancher Kubernetes Engine (RKE), an extremely simple, lightning fast Kubernetes distribution that runs entirely within containers.
Apache License 2.0

rke_linux-amd64 up: Failed to set up SSH tunneling for host #1417

Closed gknepper closed 5 years ago

gknepper commented 5 years ago

RKE version: rke version v0.2.4

Docker version: (docker version, docker info preferred)

'docker version'
Client:
 Version: 18.06.2
 API version: 1.38
 Go version: go1.10.7
 Git commit: 6d37f41
 Built: Wed Jun 12 23:08:07 2019
 OS/Arch: linux/amd64
 Experimental: false

Server:
 Engine:
  Version: 18.06.2-ce
  API version: 1.38 (minimum version 1.12)
  Go version: go1.10.7
  Git commit: 6d37f41
  Built: Wed Jun 12 23:09:09 2019
  OS/Arch: linux/amd64
  Experimental: false

'docker info'
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 18.06.2-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: a592beb5bc4c4092b1b1bac971afed27687340c5 (expected: 69663f0bd4b60df09991c08812a60108003fa340)
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.19.52-1.ph3-esx
Operating System: VMware Photon OS/Linux
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.952GiB
Name: rke04
ID: J34G:FK4K:C3NF:IVZY:F52P:BIMT:TPEB:L7OO:GEHZ:P4DW:VCD4:6TNO
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

'cat /etc/os-release'
NAME="VMware Photon OS"
VERSION="3.0"
ID=photon
VERSION_ID=3.0
PRETTY_NAME="VMware Photon OS/Linux"
ANSI_COLOR="1;34"
HOME_URL="https://vmware.github.io/photon/"
BUG_REPORT_URL="https://github.com/vmware/photon/issues"

'uname -r'
4.19.52-1.ph3-esx

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) ESX VM

cluster.yml file:

nodes:

services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h

Steps to Reproduce: Start 3 VMs running photon-hw13_uefi-3.0-26156e2.ova and try to install RKE

Results:

INFO[0000] Initiating Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [192.168.1.201]
INFO[0000] [dialer] Setup tunnel for host [192.168.1.202]
INFO[0000] [dialer] Setup tunnel for host [192.168.1.203]
WARN[0000] Failed to set up SSH tunneling for host [192.168.1.203]: Can't retrieve Docker Info: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info: Unable to access the service on /var/run/docker.sock. The service might be still starting up. Error: ssh: rejected: connect failed (open failed)
WARN[0000] Failed to set up SSH tunneling for host [192.168.1.201]: Can't retrieve Docker Info: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info: Unable to access the service on /var/run/docker.sock. The service might be still starting up. Error: ssh: rejected: connect failed (open failed)
WARN[0000] Failed to set up SSH tunneling for host [192.168.1.202]: Can't retrieve Docker Info: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info: Unable to access the service on /var/run/docker.sock. The service might be still starting up. Error: ssh: rejected: connect failed (open failed)
WARN[0000] Removing host [192.168.1.203] from node lists
WARN[0000] Removing host [192.168.1.201] from node lists
WARN[0000] Removing host [192.168.1.202] from node lists
WARN[0000] [state] can't fetch legacy cluster state from Kubernetes
INFO[0000] [certificates] Generating CA kubernetes certificates
INFO[0000] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates
INFO[0000] [certificates] Generating admin certificates and kubeconfig
INFO[0001] [certificates] Generating Kube Controller certificates
INFO[0001] [certificates] Generating Kube Scheduler certificates
INFO[0001] [certificates] Generating Node certificate
INFO[0001] [certificates] Generating Kubernetes API server certificates
INFO[0002] [certificates] Generating Kube Proxy certificates
INFO[0002] [certificates] Generating Kubernetes API server proxy client certificates
INFO[0002] Successfully Deployed state file at [./rancher-clusterWORKING.rkestate]
INFO[0002] Building Kubernetes cluster
FATA[0002] Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config

superseb commented 5 years ago

Can you run through https://rancher.com/docs/rke/latest/en/troubleshooting/ssh-connectivity-errors/ and see if that solves your issue?

gknepper commented 5 years ago

Unfortunately no success.

INFO[0000] Initiating Kubernetes cluster                
INFO[0000] [dialer] Setup tunnel for host [192.168.1.203] 
INFO[0000] [dialer] Setup tunnel for host [192.168.1.201] 
INFO[0000] [dialer] Setup tunnel for host [192.168.1.202] 
WARN[0000] Failed to set up SSH tunneling for host [192.168.1.201]: Can't retrieve Docker Info: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info: Unable to access the service on /var/run/docker.sock. The service might be still starting up. Error: ssh: rejected: connect failed (open failed) 
WARN[0000] Failed to set up SSH tunneling for host [192.168.1.202]: Can't retrieve Docker Info: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info: Unable to access the service on /var/run/docker.sock. The service might be still starting up. Error: ssh: rejected: connect failed (open failed) 
WARN[0000] Failed to set up SSH tunneling for host [192.168.1.203]: Can't retrieve Docker Info: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info: Unable to access the service on /var/run/docker.sock. The service might be still starting up. Error: ssh: rejected: connect failed (open failed) 
WARN[0000] Removing host [192.168.1.201] from node lists 
WARN[0000] Removing host [192.168.1.202] from node lists 
WARN[0000] Removing host [192.168.1.203] from node lists 
WARN[0000] [state] can't fetch legacy cluster state from Kubernetes 
INFO[0000] [certificates] Generating CA kubernetes certificates 
INFO[0000] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates 
INFO[0000] [certificates] Generating Kube Controller certificates 
INFO[0001] [certificates] Generating Kube Scheduler certificates 
INFO[0001] [certificates] Generating Kube Proxy certificates 
INFO[0001] [certificates] Generating Node certificate   
INFO[0001] [certificates] Generating admin certificates and kubeconfig 
INFO[0001] [certificates] Generating Kubernetes API server proxy client certificates 
INFO[0002] [certificates] Generating Kubernetes API server certificates 
INFO[0002] Successfully Deployed state file at [./rancher-clusterWORKING.rkestate] 
INFO[0002] Building Kubernetes cluster                  
FATA[0002] Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config 

And I'm able to log in to all hosts using only the SSH keys:

user@nginx:~$ ssh root@192.168.1.201
Last login: Wed Jun 26 02:54:48 2019 from 192.168.1.200
 02:56:16 up 18 min,  1 user,  load average: 0.00, 0.00, 0.00
tdnf update info not available yet!
root@rke01 [ ~ ]# exit
logout
Connection to 192.168.1.201 closed.
user@nginx:~$ ssh root@192.168.1.202
Last login: Wed Jun 26 02:41:41 2019 from 192.168.1.200
 02:56:20 up 18 min,  0 users,  load average: 0.04, 0.03, 0.00
tdnf update info not available yet!
root@rke02 [ ~ ]# exit
logout
Connection to 192.168.1.202 closed.
user@nginx:~$ ssh root@192.168.1.203
Last login: Wed Jun 26 02:42:05 2019 from 192.168.1.200
 02:56:24 up 18 min,  0 users,  load average: 0.00, 0.03, 0.01
tdnf update info not available yet!
root@rke03 [ ~ ]# exit
logout
Connection to 192.168.1.203 closed.

superseb commented 5 years ago

It needs to be able to use /var/run/docker.sock on the hosts. What does ls -la /var/run/docker.sock show? Or docker ps?

gknepper commented 5 years ago

I'm running as root on the machines, as you can also see in the previous log. I ran the checks again anyway:

user@nginx:~$ ./runAll.sh

hostname: rke01
username: root
ls -la /var/run/docker.sock
srw-rw---- 1 root docker 0 Jun 26 02:38 /var/run/docker.sock
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

hostname:rke02
username:root
ls -la /var/run/docker.sock
srw-rw---- 1 root docker 0 Jun 26 02:38 /var/run/docker.sock
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

hostname:rke03
username:root
ls -la /var/run/docker.sock
srw-rw---- 1 root docker 0 Jun 26 02:38 /var/run/docker.sock
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

superseb commented 5 years ago

Can you share the content of /etc/ssh/sshd_config as well? The image on GCE works without issues (after enabling root login in the config)

gknepper commented 5 years ago

user@nginx:~$ cat /etc/ssh/sshd_config

#       $OpenBSD: sshd_config,v 1.103 2018/04/09 20:41:22 tj Exp $

# This is the sshd server system-wide configuration file.  See
# sshd_config(5) for more information.

# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin

# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default value where
# possible, but leave them commented.  Uncommented options override the
# default value.

#Port 22
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::

#HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_ecdsa_key
#HostKey /etc/ssh/ssh_host_ed25519_key

# Ciphers and keying
#RekeyLimit default none

# Logging
#SyslogFacility AUTH
#LogLevel INFO

# Authentication:

#LoginGraceTime 2m
#PermitRootLogin prohibit-password
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10

#PubkeyAuthentication yes

# Expect .ssh/authorized_keys2 to be disregarded by default in future.
#AuthorizedKeysFile     .ssh/authorized_keys .ssh/authorized_keys2

#AuthorizedPrincipalsFile none

#AuthorizedKeysCommand none
#AuthorizedKeysCommandUser nobody

# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
#HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# HostbasedAuthentication
#IgnoreUserKnownHosts no
# Don't read the user's ~/.rhosts and ~/.shosts files
#IgnoreRhosts yes

# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no

# Change to yes to enable challenge-response passwords (beware issues with
# some PAM modules and threads)
ChallengeResponseAuthentication no

# Kerberos options
#KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#KerberosGetAFSToken no

# GSSAPI options
#GSSAPIAuthentication no
#GSSAPICleanupCredentials yes
#GSSAPIStrictAcceptorCheck yes
#GSSAPIKeyExchange no

# Set this to 'yes' to enable PAM authentication, account processing,
# and session processing. If this is enabled, PAM authentication will
# be allowed through the ChallengeResponseAuthentication and
# PasswordAuthentication.  Depending on your PAM configuration,
# PAM authentication via ChallengeResponseAuthentication may bypass
# the setting of "PermitRootLogin without-password".
# If you just want the PAM account and session checks to run without
# PAM authentication, then enable this but set PasswordAuthentication
# and ChallengeResponseAuthentication to 'no'.
UsePAM yes

#AllowAgentForwarding yes
#AllowTcpForwarding yes
#GatewayPorts no
X11Forwarding yes
#X11DisplayOffset 10
#X11UseLocalhost yes
#PermitTTY yes
PrintMotd no
#PrintLastLog yes
#TCPKeepAlive yes
#PermitUserEnvironment no
#Compression delayed
#ClientAliveInterval 0
#ClientAliveCountMax 3
#UseDNS no
#PidFile /var/run/sshd.pid
#MaxStartups 10:30:100
#PermitTunnel no
#ChrootDirectory none
#VersionAddendum none

# no default banner path
#Banner none

# Allow client to pass locale environment variables
AcceptEnv LANG LC_*

# override default of no subsystems
Subsystem sftp  /usr/lib/openssh/sftp-server

# Example of overriding settings on a per-user basis
#Match User anoncvs
#       X11Forwarding no
#       AllowTcpForwarding no
#       PermitTTY no
#       ForceCommand cvs server
PasswordAuthentication yes

gknepper commented 5 years ago

Just a reminder: I'm using Ubuntu Server on the machine that starts the RKE installation, and Photon OS on the RKE servers. This issue doesn't happen when I use another OS on the RKE servers, such as Ubuntu Server or Atomic Linux. So in my opinion this is something in Photon OS, not in the Ubuntu server.

superseb commented 5 years ago

Yes, I'm requesting the file from the Photon server.

gknepper commented 5 years ago

This is the sshd_config from the Photon OS servers.

root@rke01 [ ~ ]# cat /etc/ssh/sshd_config
#       $OpenBSD: sshd_config,v 1.103 2018/04/09 20:41:22 tj Exp $

# This is the sshd server system-wide configuration file.  See
# sshd_config(5) for more information.

# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin

# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default value where
# possible, but leave them commented.  Uncommented options override the
# default value.

#Port 22
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::

#HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_ecdsa_key
#HostKey /etc/ssh/ssh_host_ed25519_key

# Ciphers and keying
#RekeyLimit default none

# Logging
#SyslogFacility AUTH
#LogLevel INFO

# Authentication:

#LoginGraceTime 2m
#PermitRootLogin prohibit-password
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10

#PubkeyAuthentication yes

# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
AuthorizedKeysFile      .ssh/authorized_keys

#AuthorizedPrincipalsFile none

#AuthorizedKeysCommand none
#AuthorizedKeysCommandUser nobody

# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
#HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# HostbasedAuthentication
#IgnoreUserKnownHosts no
# Don't read the user's ~/.rhosts and ~/.shosts files
#IgnoreRhosts yes

# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no

# Change to no to disable s/key passwords
#ChallengeResponseAuthentication yes

# Kerberos options
#KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#KerberosGetAFSToken no

# GSSAPI options
#GSSAPIAuthentication no
#GSSAPICleanupCredentials yes

# Set this to 'yes' to enable PAM authentication, account processing,
# and session processing. If this is enabled, PAM authentication will
# be allowed through the ChallengeResponseAuthentication and
# PasswordAuthentication.  Depending on your PAM configuration,
# PAM authentication via ChallengeResponseAuthentication may bypass
# the setting of "PermitRootLogin without-password".
# If you just want the PAM account and session checks to run without
# PAM authentication, then enable this but set PasswordAuthentication
# and ChallengeResponseAuthentication to 'no'.
#UsePAM no

#AllowAgentForwarding yes
#AllowTcpForwarding yes
#GatewayPorts no
#X11Forwarding no
#X11DisplayOffset 10
#X11UseLocalhost yes
#PermitTTY yes
#PrintMotd yes
#PrintLastLog yes
#TCPKeepAlive yes
#PermitUserEnvironment no
#Compression delayed
#ClientAliveInterval 0
#ClientAliveCountMax 3
#UseDNS no
#PidFile /var/run/sshd.pid
#MaxStartups 10:30:100
#PermitTunnel no
#ChrootDirectory none
#VersionAddendum none

#FipsMode no

# no default banner path
#Banner none

# override default of no subsystems
Subsystem       sftp    /usr/libexec/sftp-server

# Example of overriding settings on a per-user basis
#Match User anoncvs
#       X11Forwarding no
#       AllowTcpForwarding no
#       PermitTTY no
#       ForceCommand cvs server
AllowTcpForwarding no
ClientAliveCountMax 2
Compression no
MaxAuthTries 2
TCPKeepAlive no
AllowAgentForwarding no
PermitRootLogin yes
UsePAM yes

superseb commented 5 years ago

Can you try with AllowTcpForwarding yes?
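
A minimal sketch for checking which value is actually in effect, assuming a plain sshd_config with whitespace-separated keywords and no Match blocks or Include directives. sshd uses the first value it reads for each keyword, so in the Photon OS file above the uncommented AllowTcpForwarding no near the end is the one that applies. The helper name effective_sshd_option is hypothetical, not part of RKE or OpenSSH.

```shell
#!/usr/bin/env bash
# Print the first uncommented value of an sshd_config keyword.
# Assumptions: whitespace-separated "Keyword value" lines, no Match blocks,
# no Include directives. sshd itself uses the first value it obtains.
effective_sshd_option() {
  local file="$1" key="$2"
  awk -v key="$key" '
    /^[[:space:]]*#/ { next }                      # skip comment lines
    tolower($1) == tolower(key) { print $2; exit } # first match wins
  ' "$file"
}

# Example (commented out so that sourcing this file has no side effects):
# effective_sshd_option /etc/ssh/sshd_config AllowTcpForwarding
```

On a live host, running sshd -T as root prints the effective configuration directly and is the more reliable check. After changing the file, sshd has to be restarted or reloaded (for example with systemctl restart sshd on systemd hosts) before the new value applies.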

gknepper commented 5 years ago

I think this parameter solved the issue. I'm now able to install, but I'm getting the error below:

INFO[0225] [etcd] Successfully started etcd plane.. Checking etcd cluster health
FATA[0361] [etcd] Failed to bring up Etcd Plane: [etcd] Etcd Cluster is not healthy

superseb commented 5 years ago

I created https://github.com/rancher/docs/issues/1560 to describe the needed flags, can you confirm this is your custom config and not the default one from the image?

Regarding the other error, this usually happens when old state is used (older files in the directory where RKE is run or nodes that are not cleaned properly from previous runs). If this doesn't solve the issue, please file a new issue with steps to reproduce and the --debug output so we can investigate.
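
As a rough sketch of the first check (older files in the directory where RKE is run), the helper below lists files that RKE typically writes next to the cluster config: the .rkestate file seen in the log above and the generated kube_config_*.yml. The function name list_rke_state is hypothetical, and the sketch only lists files; it does not delete anything and does not clean the nodes.

```shell
#!/usr/bin/env bash
# List leftover RKE state files in a directory (default: current directory).
# Assumption: state lives in "<name>.rkestate" and "kube_config_<name>.yml"
# next to the cluster config, as in the log output earlier in this thread.
list_rke_state() {
  local dir="${1:-.}"
  find "$dir" -maxdepth 1 -type f \
    \( -name '*.rkestate' -o -name 'kube_config_*.yml' \) | sort
}

# Example (commented out so that sourcing this file has no side effects):
# list_rke_state .
```

If the list is not empty before a fresh install, those files come from an earlier run and are worth moving aside before running rke up again.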

gknepper commented 5 years ago

AllowTcpForwarding no is the default for VMware Photon OS. I don't know about other distributions.

I cleaned up the whole distro and tried reinstalling from scratch, but it didn't work in either case. I gave Rancher OS a try and it worked on my first attempt. I'm thinking of giving up on Photon OS for now. Thank you.

vkim-rogers commented 5 years ago

Hi Team,

Thank you so much for all the help with that issue. I just ran into the same one and your suggestions worked like a charm.

Let me conclude and aggregate the suggestions into a single message.

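Prerequisites

* Google Compute Engine

* Provisioning K8s cluster over RKE

Error message (from RKE)

* FATA[0002] Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config

Solutions (1)

* You must edit /etc/ssh/sshd_config and enable the following 2 options:
  > PermitRootLogin yes
  > AllowTcpForwarding yes

* And sure, you must configure RKE to connect to those nodes using **root** login

(2)

* Add the user under which you are connecting to the nodes to the **docker** group:
  > sudo usermod -aG docker $USER

* After that you will be fine (no need to connect using root).

* Still not sure whether AllowTcpForwarding is required in that case.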

MrAmbiG commented 4 years ago

I enabled the following in sshd_config:

PermitRootLogin yes
AllowTcpForwarding yes

I also ran sudo usermod -aG docker $USER and set up passwordless authentication to all nodes from the RKE machine. OS = Ubuntu 18.04. But no go: same error, same problem. @superseb

superseb commented 4 years ago

Please file a new issue with all info and logs so we can take a look.

scheung38 commented 3 years ago

Also facing the same issue:

Failed to set up SSH tunneling for host [178.XX.X.XX]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [178.XX.X.XX:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

vim /etc/ssh/sshd_config:
AllowTcpForwarding yes
PermitRootLogin yes

sudo usermod -aG docker root

ssh root@178.XX.X.XX is fine:
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-58-generic x86_64)

Host: macOS 11.1, Docker Desktop 3.0.1 (50773)

VM: DigitalOcean, Ubuntu 20.04.1 LTS, Docker version 19.03.14, build 5eb3275d40

superseb commented 3 years ago

Please file a new issue with the cluster.yml used and the output of ls -la of the key file that is configured (or the default). Other logging like ssh -v and system logging from sshd from the unsuccessful attempt by RKE vs a successful attempt using ssh would help.
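
A small sketch of the key file check, assuming GNU stat (Linux) and a hypothetical helper name key_perms_ok. The OpenSSH client refuses private keys that are accessible by group or others, so the mode that ls -la prints is worth confirming before comparing logs. Whether RKE itself enforces the same rule is not stated in this thread, so treat this as a sanity check only.

```shell
#!/usr/bin/env bash
# Return 0 if the file has no group/other permission bits set
# (for example 600 or 400), 1 otherwise, 2 if stat fails.
# Assumption: GNU stat, which prints the octal mode with -c '%a'.
key_perms_ok() {
  local file="$1" mode
  mode="$(stat -c '%a' "$file")" || return 2
  # The last two octal digits are the group and other permission bits.
  case "$mode" in
    *00) return 0 ;;
    *)   return 1 ;;
  esac
}

# Example (commented out so that sourcing this file has no side effects):
# key_perms_ok ~/.ssh/id_rsa || echo "key file is too open"
```

The path ~/.ssh/id_rsa in the example is only a placeholder for whichever key cluster.yml points at, or the default key if none is configured.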

scheung38 commented 3 years ago

Thanks, it's resolved now; I had entered the incorrect SSH key that was generated. Simple but overlooked.

roshpr commented 2 years ago

Hi Team,

Thank you so much for all the help with that issue. I just ran into the same one and your suggestions worked like a charm.

Let me conclude and aggregate the suggestions into a single message.

Prerequisites

* Google Compute Engine

* Provisioning K8s cluster over RKE

Error message (from RKE)

* FATA[0002] Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config

Solutions (1)

* You must edit /etc/ssh/sshd_config and enable the following 2 options:
  > PermitRootLogin yes
  > AllowTcpForwarding yes

* And sure, you must configure RKE to connect to those nodes using **root** login

(2)

* Add the user under which you are connecting to the nodes to the **docker** group:
  > sudo usermod -aG docker $USER

* After that you will be fine (no need to connect using root).

* Still not sure whether AllowTcpForwarding is required in that case.

Thanks. I had the same issue. These settings resolved the problem and I was able to proceed with the SSH connection.

litao3rd commented 1 year ago

I've encountered the same issue. My hosts are running CentOS 7.9, and as I was going through the SSH Connectivity Errors documentation, I came across the line that reads, 'SSH server version is not version 6.7 or higher.' I'm wondering whether this requirement is still applicable? My hosts are currently using SSH-2.0-OpenSSH_7.4.
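
For what it's worth, the banner can be compared against that documented minimum mechanically. A minimal sketch, assuming the banner format shown above; the helper name openssh_at_least is hypothetical. It says nothing about whether the 6.7 requirement still applies, only whether a given banner is at or above a given version.

```shell
#!/usr/bin/env bash
# Return 0 if an SSH banner such as "SSH-2.0-OpenSSH_7.4" reports an
# OpenSSH version >= MAJOR.MINOR, 1 if it is lower, 2 if the banner
# does not look like OpenSSH.
openssh_at_least() {
  local banner="$1" want_major="$2" want_minor="$3" ver major minor
  case "$banner" in
    *OpenSSH_[0-9]*.[0-9]*) ;;
    *) return 2 ;;
  esac
  ver="${banner##*OpenSSH_}"   # "7.4" from "SSH-2.0-OpenSSH_7.4"
  ver="${ver%%[!0-9.]*}"       # drop suffixes such as "p1 Ubuntu-..."
  major="${ver%%.*}"
  minor="${ver#*.}"
  minor="${minor%%.*}"         # keep only the second component
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Example (commented out so that sourcing this file has no side effects):
# openssh_at_least "SSH-2.0-OpenSSH_7.4" 6 7 && echo "meets the 6.7 minimum"
```

By this comparison OpenSSH 7.4 is not below 6.7, so the version line in the documentation would not by itself explain the failure on those hosts.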