fcioffi opened this issue 4 years ago (status: Open)
Maybe have a look at https://github.com/digitalism/k3os-box, I just got networking of a 3 node cluster working.
k3os-server [~]$ kubectl get nodes
NAME          STATUS   ROLES    AGE     VERSION
k3os-server   Ready    master   8m37s   v1.17.2+k3s1
k3os-1        Ready    <none>   8m12s   v1.17.2+k3s1
k3os-2        Ready    <none>   8m12s   v1.17.2+k3s1
k3os-3        Ready    <none>   8m12s   v1.17.2+k3s1
I have the exact same issue on the k3os v0.10.0 release.
I seem to remember that my previous install (using an older version of k3os) did NOT have this issue; I had a 3-node cluster up and running using the same method.
Me too, I'm seeing the same issue.
The k3os version is v0.11.0-rc1.
The error messages on the k3os server are:

I0710 10:18:04.777834 2458 log.go:172] http: TLS handshake error from 192.168.31.126:54474: remote error: tls: bad certificate
time="2020-07-10T10:18:04.837090973Z" level=error msg="Node password validation failed for 'miwifi-r1cm-srv', using passwd file '/var/lib/rancher/k3s/server/cred/node-passwd'"

But if I kill the k3s agent and then join it manually, it works fine:

k3s agent --with-node-id --server https://192.168.31.211:6443 --token "K10a0d38146071578aa46b8e81b34048d03d0f377f8b7ace2e48bbec6c234b36e95::server:1234"

Please help...
I found the solution: set up an NTP server and make sure the two VMs have different hostnames. Then everything works... magic ^^||...

hostname: test-master
ntp_servers:
- 0.us.pool.ntp.org
- 1.us.pool.ntp.org
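To make the hostname requirement easy to check before joining, here is a minimal sketch. `check_hostnames` is a hypothetical helper of mine, not a k3os command, and the sample hostnames are placeholders:

```shell
# Pre-flight check for the hostname-collision problem described above.
# check_hostnames is a hypothetical helper, not part of k3os.
check_hostnames() {
  if [ "$1" = "$2" ]; then
    echo "collision: change one hostname or pass --with-node-id"
  else
    echo "ok: hostnames differ"
  fi
}

check_hostnames "test-master" "test-worker"   # substitute the hostnames from your config.yaml files
```

Run it with the `hostname:` values from the server and agent configs; a collision means the agent needs a new hostname (or `--with-node-id`).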
Any updates here? I'm having the same issue on release v0.11.0-rc1. I followed the same steps as @evoncken, but the agent doesn't connect. I've added the token from the server and added the server IP. My agent YAML file is below; the server config is similar but without the --server and --token flags, and it works just fine. What am I missing?
ssh_authorized_keys:
- ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQ.....
hostname: z420_2
k3os:
  k3s_args:
  - agent
  - "--node-ip=192.168.1.45"
  - "--flannel-iface=eth0"
  - "--server=https://192.168.1.43:6443"
  - "--token=K105a96558bd927049955bc6a9060aaabff0b6dafd7e3fc286f0e21dfae57ac1b67::server:2f44c9b100d35cbd9c8c78caaa4cc0b4"
Any updates? I'm seeing this on v0.11.1 as well.
Using v0.11.1 I'm able to join an agent, but after applying a config post-install to change the hostname, configure a static IP, and add a worker label, the agent fails to connect back to the master.
Server Config - /var/lib/rancher/k3os/config.yaml

hostname: k3s-master
write_files:
- path: /var/lib/connman/default.config
  content: |-
    [service_eth0]
    Type=ethernet
    IPv4=10.1.1.50/255.255.255.0/10.1.1.1
    IPv6=off
    Nameservers=10.1.1.1
k3os:
  dns_nameservers:
  - 10.1.1.1
  ntp_servers:
  - 0.us.pool.ntp.org
  - 1.us.pool.ntp.org
Agent Config - /var/lib/rancher/k3os/config.yaml

hostname: k3s-worker1
write_files:
- path: /var/lib/connman/default.config
  content: |-
    [service_eth0]
    Type=ethernet
    IPv4=10.1.1.51/255.255.255.0/10.1.1.1
    IPv6=off
    Nameservers=10.1.1.1
k3os:
  dns_nameservers:
  - 10.1.1.1
  labels:
    node-role.kubernetes.io/worker: ""
  ntp_servers:
  - 0.us.pool.ntp.org
  - 1.us.pool.ntp.org
Edit: I have even tried upgrading the server to v0.19.4-dev.5 and pruning stale entries in /var/lib/rancher/k3s/server/cred/node-passwd, but no joy. Upon agent reboot the node is added back to the server's node-passwd file, but the log still shows a bad cert.
time="2020-11-26T17:46:47.845635098Z" level=info msg="Handling backend connection request [k3s-worker1]"
time="2020-11-26T17:46:47.855621917Z" level=info msg="error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF"
time="2020-11-26T17:46:53.849317918Z" level=info msg="Cluster-Http-Server 2020/11/26 17:46:53 http: TLS handshake error from 10.1.1.51:53260: remote error: tls: bad certificate"
time="2020-11-26T17:46:53.879528915Z" level=info msg="Cluster-Http-Server 2020/11/26 17:46:53 http: TLS handshake error from 10.1.1.51:53272: remote error: tls: bad certificate"
time="2020-11-26T17:46:53.901297377Z" level=info msg="certificate CN=k3s-worker1 signed by CN=k3s-server-ca@1606402873: notBefore=2020-11-26 15:01:13 +0000 UTC notAfter=2021-11-26 17:46:53 +0000 UTC"
time="2020-11-26T17:46:53.906918046Z" level=info msg="certificate CN=system:node:k3s-worker1,O=system:nodes signed by CN=k3s-client-ca@1606402873: notBefore=2020-11-26 15:01:13 +0000 UTC notAfter=2021-11-26 17:46:53 +0000 UTC"
Edit 2: Removing my label from the agent config finally allowed it to join the master. I figured this out by killing the k3s process on the agent machine and trying out different args; when I removed the label, it joined successfully.
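If the label in the agent config is what blocks the join, one possible workaround (standard kubectl usage, not something confirmed in this thread) is to leave `labels:` out of config.yaml and apply the worker label from the server side once the node has joined:

```shell
# Run on the server (or anywhere with kubeconfig access) after the agent appears.
# "k3s-worker1" is the node name from the agent config above.
kubectl label node k3s-worker1 node-role.kubernetes.io/worker=""
kubectl get node k3s-worker1 --show-labels
```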
Same here with v0.11.1, using the correct default configuration for both:
server
ssh_authorized_keys:
- <redacted>
hostname: k3os.<redacted>
k3os:
  modules:
  - kvm
  - nvme
  dns_nameservers:
  - 1.1.1.1
  ntp_servers:
  - 0.us.pool.ntp.org
  token: supersecret
agent
ssh_authorized_keys:
- ssh-rsa <redacted>
hostname: k3os-agent.<redacted>
k3os:
  modules:
  - kvm
  - nvme
  dns_nameservers:
  - 1.1.1.1
  ntp_servers:
  - 0.us.pool.ntp.org
  server_url: https://10.xx.xx.serverip:6443
  token: K1023d39969b1298dfb394bde1a93bcae9c5c7bc4dea29fa28c1b87a6344308613a::server:supersecret
I already use different hostnames and time servers. As others noted, running the agent from the CLI works fine on the agent node:

sudo k3s agent --with-node-id --server https://10.10.xx.serverip:6443 --token "K1023d39969b1298dfb394bde1a93bcae9c5c7bc4dea29fa28c1b87a6344308613a::server:supersecret"
As far as I understand the docs and the k3s installation script (https://github.com/k3s-io/k3s/blob/master/install.sh#L161), the mode is derived either from dedicated k3s_args, or, if those are not set, by the following cases:

a) if server_url is set, the token will be treated as a cluster secret to join, and the command will be agent.
b) if no server_url is provided, the token becomes the cluster secret, and the command will be server.

That said, the YAML files above should cause the agent to connect at boot time, which is not the case.
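The two cases can be sketched as follows. This is a simplified rendering of the install.sh logic described above, not the verbatim script; the function and variable names are mine:

```shell
# Simplified sketch of how the install script derives the k3s command
# from whether a server URL is present (cases a and b above).
derive_mode() {
  if [ -n "$1" ]; then
    echo agent    # server_url set: token acts as a cluster secret, join as agent
  else
    echo server   # no server_url: run as server, token becomes the cluster secret
  fi
}

derive_mode "https://10.0.0.1:6443"   # -> agent
derive_mode ""                        # -> server
```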
In the case of @digitalism (https://github.com/digitalism/k3os-box/blob/master/scripts/configure_k3s_node.sh#L27), the entire k3s_args list is constructed the same way as expected. I did not try this in my case, but @s3rgb did and seems to have failed (which I would not have expected).
UPDATE: I was able to add the actual agent without any CLI manipulation after cloud-init, using neither token: nor server_url:. Since (at least for me) the agent runs on the same hypervisor, I need to run the agent with --with-node-id or the connection won't work (same hostname), so we have to construct k3s_args ourselves:

ssh_authorized_keys:
- ssh-rsa <redacted>
hostname: k3os-agent.<redacted>
k3os:
  modules:
  - kvm
  - nvme
  dns_nameservers:
  - 1.1.1.1
  ntp_servers:
  - 0.us.pool.ntp.org
  # we cannot use server_url nor token since we need to override k3s_args,
  # which would override / overrule those two
  # server_url: https://10.xx.xx.serverip:6443
  # token: supersecret
  k3s_args:
  - agent
  - "--server=10.x.x.serverip:6443"
  - "--token=supersecret"
  - "--with-node-id"
Hope this helps anybody else. That said, I'm not really sure whether k3os is a serious consideration given its current status in terms of documentation, drive, and support needed to get even a minimal setup like this up and running. k3s seems to have a lot of momentum; k3os seems to be falling behind. Having used RancherOS for years, I say this with a sad heart, but maybe there is just not enough reason for Rancher to push or invest in k3os - fair enough.
UPDATE2:
Be aware that if you use k3s_args for the agent as given above, you will fail to reconfigure the agent later using k3os install or k3os config, e.g. when the server IP (or the token) changes. That would only introduce the token: and server_url: keys in /var/lib/rancher/k3os/config.yaml, and since those are overridden by k3s_args, the values are ignored and take no effect. An ugly side effect.
You can work around this by editing /var/lib/rancher/k3os/config.yaml by hand and then calling k3os config.
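That hand-edit can be scripted. A hedged sketch: `update_k3s_args` is my helper name, not a k3os tool, and the sed patterns assume the quoted `- "--server=..."` / `- "--token=..."` list entries shown above (GNU sed assumed for `-i`):

```shell
# Rewrites the --server/--token entries inside k3s_args in a k3os config file.
# update_k3s_args is a hypothetical helper, not a k3os command.
update_k3s_args() {
  file=$1 server=$2 token=$3
  sed -i \
    -e "s|--server=[^\"]*|--server=${server}|" \
    -e "s|--token=[^\"]*|--token=${token}|" \
    "$file"
}
```

On a real node you would run something like `update_k3s_args /var/lib/rancher/k3os/config.yaml https://10.0.0.99:6443 newsecret` (as root) and then call `k3os config` to re-apply, as described above.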
UPDATE3:
After a couple more installation experiments I was able to make the best of it. The assumption that k3s_args overrides server_url and token by default was wrong - those are still added to k3s_args in addition to whatever we place there. This even means that agent is not needed either, since it is set automatically once server_url is present. So the final agent config, which can then also be modified and reconfigured later using k3os install or k3os config, would be:
server
ssh_authorized_keys:
- <redacted>
hostname: k3os.<redacted>
k3os:
  modules:
  - kvm
  - nvme
  dns_nameservers:
  - 1.1.1.1
  ntp_servers:
  - 0.us.pool.ntp.org
  token: supersecret
agent
ssh_authorized_keys:
- ssh-rsa <redacted>
hostname: k3os-agent.<redacted>
k3os:
  modules:
  - kvm
  - nvme
  dns_nameservers:
  - 1.1.1.1
  ntp_servers:
  - 0.us.pool.ntp.org
  server_url: https://10.xx.xx.serverip:6443
  token: supersecret
  k3s_args:
  # needed if we use the same hypervisor as the master node
  - "--with-node-id"
Hi guys, I'm trying to install a k3os cluster in VirtualBox with 2 virtual machines, 1 server and 1 agent. The first works great, but I can't add the agent. Both virtual machines have 2 network interfaces:

k3os version v0.9.1 5.0.0-37-generic #40~18.04.1 SMP Wed Jan 15 04:09:29 UTC 2020 x86_64

Steps:
1. On the server: sudo k3os install, insert default parameters, with token: "myToken"
2. On the agent: sudo k3os install, insert default parameters, with the option for agent and the server URL "https://…"

On the last start, in the server's /var/log/k3s-service.log I get:

I0401 13:25:03.838029 2284 log.go:172] http: TLS handshake error from 192.168.56.107:57838: remote error: tls: bad certificate

From the host machine kubectl works, but unfortunately it shows only the master.

Can you help me? Thanks, Francesco