Closed: waqarkhan88 closed this issue 8 months ago.
I have the same issue.
kubeone version

```json
{
  "kubeone": {
    "major": "1",
    "minor": "6",
    "gitVersion": "1.6.2",
    "gitCommit": "184adc3b7d0c1e2e7630ded518cbfdfab7300755",
    "gitTreeState": "",
    "buildDate": "2023-04-14T11:20:23Z",
    "goVersion": "go1.19.8",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "56",
    "gitVersion": "v1.56.2",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}
```
uname -a

```
Linux kube-eleven-6f4cc8f787-w6482 5.15.133+ #1 SMP Sat Dec 30 11:18:04 UTC 2023 x86_64 Linux
```
Minimal config to reproduce, simulating bare metal in the cloud:

```yaml
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
name: artioms

versions:
  kubernetes: 1.28.6

apiEndpoint:
  host: 172.31.82.15
  port: 6443

cloudProvider:
  none: {}

controlPlane:
  hosts:
    - bastion: 51.44.21.65
      bastionPort: 22
      bastionUser: ubuntu
      privateAddress: 172.31.82.15
      sshAgentSocket: env:SSH_AUTH_SOCK
      sshPort: 22
      sshUsername: ubuntu

machineController:
  deploy: false
```
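Since this manifest reaches the node through a bastion, the same hop KubeOne takes can be verified up front with OpenSSH's ProxyJump; a minimal sketch using the addresses from the manifest above:

```bash
# Jump through the bastion to the control-plane node and confirm
# that key auth and passwordless sudo work end to end.
ssh -J ubuntu@51.44.21.65:22 ubuntu@172.31.82.15 sudo id
```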
$ kubeone version

```json
{
  "kubeone": {
    "major": "1",
    "minor": "7",
    "gitVersion": "v1.7.2",
    "gitCommit": "00fd09d91da76e307f016afb3b4f42ad6281eb2c",
    "gitTreeState": "",
    "buildDate": "2024-02-22T12:30:21+02:00",
    "goVersion": "go1.22.0",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "57",
    "gitVersion": "v1.57.4",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}
```
```
$ kubeone apply -v -d --auto-approve -m kubeone_dump.yaml
INFO[12:24:05 EET] Determine hostname...
[172.31.82.15] + export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/sbin:/usr/local/bin:/opt/bin
[172.31.82.15] + PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/sbin:/usr/local/bin:/opt/bin
[172.31.82.15] ++ hostname -f
[172.31.82.15] + fqdn=ip-172-31-82-15.eu-west-3.compute.internal
[172.31.82.15] + '[' ip-172-31-82-15.eu-west-3.compute.internal = localhost ']'
[172.31.82.15] + echo -n ip-172-31-82-15.eu-west-3.compute.internal
[172.31.82.15] ip-172-31-82-15.eu-west-3.compute.internal
DEBU[12:24:11 EET] Hostname is detected: "ip-172-31-82-15.eu-west-3.compute.internal" node=172.31.82.15
INFO[12:24:11 EET] Determine operating system...
DEBU[12:24:11 EET] Operating system detected: "ubuntu" node=172.31.82.15
INFO[12:24:11 EET] Running host probes...
Host: "ip-172-31-82-15.eu-west-3.compute.internal"
    Host initialized: no
    containerd healthy: no (unknown)
    Kubelet healthy: no (unknown)
```
Everything else is cut for brevity; it evidently got past "Determine operating system...".
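Side note: when the probes report everything as uninitialized like this, a manual spot-check on the node of what the probes inspect can help; a minimal sketch, assuming systemd-managed services:

```bash
# Check the services the host probes look at...
sudo systemctl is-active containerd kubelet
# ...and whether kubeadm has ever laid down its PKI on this node.
ls -l /etc/kubernetes/pki/ 2>/dev/null || echo "no kubeadm PKI yet"
```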
Please help me to help you. What do you have in `/etc/ssh/sshd_config`?
sshd config:

```
Include /etc/ssh/sshd_config.d/*.conf
PasswordAuthentication no
KbdInteractiveAuthentication no
UsePAM yes
X11Forwarding yes
PrintMotd no
AcceptEnv LANG LC_*
Subsystem sftp /usr/lib/openssh/sftp-server
PermitRootLogin without-password
PubkeyAuthentication yes
ClientAliveInterval 120
UseDNS no
```
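Worth noting: with the Include directive at the top, drop-ins under /etc/ssh/sshd_config.d/ can override what is written in the main file. A minimal sketch for dumping the effective settings instead of reading the file:

```bash
# Print sshd's effective configuration after all includes are merged.
sudo sshd -T | grep -Ei 'permitrootlogin|pubkeyauthentication|passwordauthentication|usedns'
```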
I can access the target VM over SSH.

For more context: while kubeone was running, these were the SSH auth logs on the target VM:
```
Feb 22 09:58:56 e2e-xw7je8s-gcp-control-fxhqomq-1 sshd[2106364]: Accepted publickey for root from 35.198.92.172 port 65190 ssh2: RSA SHA256:E4mHS/VOYiD9cXGcB7s35lOLX8T5nifhvBGN+skKFFU
Feb 22 09:58:56 e2e-xw7je8s-gcp-control-fxhqomq-1 sshd[2106364]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Feb 22 09:58:56 e2e-xw7je8s-gcp-control-fxhqomq-1 systemd-logind[872]: New session 1767 of user root.
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: root : PWD=/root ; USER=root ; COMMAND=/usr/bin/cat /etc/os-release
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session closed for user root
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: root : PWD=/root ; USER=root ; COMMAND=/usr/bin/cat /etc/kubernetes/pki/apiserver-etcd-client.crt
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session closed for user root
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: root : PWD=/root ; USER=root ; COMMAND=/usr/bin/cat /etc/kubernetes/pki/apiserver-kubelet-client.crt
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session closed for user root
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: root : PWD=/root ; USER=root ; COMMAND=/usr/bin/cat /etc/kubernetes/pki/apiserver.crt
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session closed for user root
```
When I terminate the kubeone process, I see these entries in the auth log:
```
Feb 22 09:59:40 e2e-xw7je8s-gcp-control-fxhqomq-1 sshd[2106364]: pam_unix(sshd:session): session closed for user root
Feb 22 09:59:40 e2e-xw7je8s-gcp-control-fxhqomq-1 systemd-logind[872]: Session 1767 logged out. Waiting for processes to exit.
Feb 22 09:59:40 e2e-xw7je8s-gcp-control-fxhqomq-1 systemd-logind[872]: Removed session 1767.
```
I've tried to run `kubeone apply` from two different machines: from a VM in the cloud and from my local PC. Both can access the target VM over SSH.
@cloudziu so what's the error message you're getting?
Output from `kubeone apply`:

```
INFO[12:02:49 CET] Running host probes...
ERRO[12:02:51 CET] ssh: popen
Process exited with status 1 node=34.147.205.117
WARN[12:02:51 CET] Task failed, error was: runtime: running task on "34.147.205.117"
ssh: popen
Process exited with status 1
WARN[12:03:01 CET] Retrying task...
INFO[12:03:01 CET] Running host probes...
ERRO[12:03:03 CET] ssh: popen
Process exited with status 1 node=34.147.205.117
WARN[12:03:03 CET] Task failed, error was: runtime: running task on "34.147.205.117"
ssh: popen
Process exited with status 1
```
I think I might have found the issue, but I need to confirm it. Running `strace`, I found that one of my control-plane nodes is missing `apiserver.crt`. This is an `strace` of the sshd process on the target VM:
```
[pid 4982] openat(AT_FDCWD, "/etc/kubernetes/pki/apiserver.crt", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 4982] write(2, "cat: ", 5) = 5
[pid 4861] <... ppoll resumed>) = 1 ([{fd=11, revents=POLLIN}], left {tv_sec=119, tv_nsec=874212158})
```
This is the point where the commands loop and I get the `ssh: popen` error. If that's the issue, I think the error could be more descriptive.
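For reference, a trace like the one above can be captured with something along these lines; a sketch, where attaching to the oldest sshd PID (the listener) is an assumption and any equivalent PID selection works:

```bash
# Follow sshd and its forked session children, tracing file opens and writes.
sudo strace -f -e trace=openat,write -p "$(pgrep -o sshd)"
```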
@cloudziu can you confirm that you have a passwordless sudo user? I.e., `ssh user@host sudo id` works.
@kron4eg yes, that's correct. But I need to use a private key (which I'm also providing in the kubeone manifest):
```
ssh -o IdentitiesOnly=yes -i ./private.pem root@34.147.205.117 sudo id
uid=0(root) gid=0(root) groups=0(root),1001(google-sudoers)
```
Well... I don't know how to reproduce the problem; it works on my end.
Ok, I was able to reproduce the issue.

1. Deploy a cluster using the manifest below (host entries omitted):

```yaml
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
name: 'test'

versions:
  kubernetes: 'v1.26.3'

features:
  coreDNS:
    replicas: 2
    deployPodDisruptionBudget: true

clusterNetwork:
  cni:
    cilium:
      enableHubble: true

cloudProvider:
  none: {}
  external: false

controlPlane:
  hosts:
  # (host entries omitted)

staticWorkers:
  hosts:
  # (host entries omitted)

machineController:
  deploy: false
```
2. SSH into a control-plane node.
3. Move or remove the files `/etc/kubernetes/pki/apiserver.crt` and `/etc/kubernetes/pki/apiserver.key`.
4. Try to run `./kubeone apply -m kubeone.yaml -y -d -v` again. You should get the `ssh: popen` error.
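The failing probe can also be imitated by hand to see where the opaque error comes from; a minimal sketch (host and key path are placeholders):

```bash
# Run the same kind of remote command KubeOne's probe runs. With the
# cert missing, cat exits 1, which surfaces on the KubeOne side as
# "ssh: popen / Process exited with status 1".
ssh -o IdentitiesOnly=yes -i ./private.pem root@<host> \
  'sudo cat /etc/kubernetes/pki/apiserver.crt'
echo "remote exit status: $?"
```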
This was exactly why it was failing for me in the first place. I think it's not a problem with `kubeone` itself, but it would be great if the error message were more descriptive.
Without those files the apiserver wouldn't even start, would it?
Yep, it would not.
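For anyone landing here after losing those files: if the cluster CA is still present under /etc/kubernetes/pki, the serving certificate can be regenerated. A sketch assuming a kubeadm-managed control plane (extra flags such as certificate SANs may be needed depending on the original init configuration):

```bash
# Recreate apiserver.crt and apiserver.key from the existing cluster CA.
sudo kubeadm init phase certs apiserver
```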
What happened?

When running `kubeone apply` on bare metal, I am getting the following error. I tried with Rocky Linux as well as Ubuntu and get the same error, on both KubeOne 1.7.0 and 1.7.2.

Expected behavior

Cluster creation successful.
How to reproduce the issue?
I created 2 VMs using Ubuntu 22.04.4 LTS x86_64, configured and tested SSH connectivity, installed KubeOne on WSL running the same version of Ubuntu, and ran the `kubeone apply` command with the manifest file.
What KubeOne version are you using?
Provide your KubeOneCluster manifest here (if applicable)
What cloud provider are you running on?
baremetal
What operating system are you running in your cluster?
Ubuntu 22.04.4 LTS x86_64
Additional information