kairos-io / kairos

The immutable Linux meta-distribution for edge Kubernetes.
https://kairos.io
Apache License 2.0
1.14k stars 97 forks source link

k3s network (coredns) problems with v2.2.1-k3sv1.26.4-k3s1 #1525

Closed Ognian closed 1 year ago

Ognian commented 1 year ago

Kairos version:

NAME="openSUSE Leap" VERSION="15.4" ID="opensuse-leap" ID_LIKE="suse opensuse" VERSION_ID="15.4" PRETTY_NAME="openSUSE Leap 15.4" ANSI_COLOR="0;32" CPE_NAME="cpe:/o:opensuse:leap:15.4" BUG_REPORT_URL="https://bugs.opensuse.org" HOME_URL="https://www.opensuse.org/" DOCUMENTATION_URL="https://en.opensuse.org/Portal:Leap" LOGO="distributor-logo-Leap" KAIROS_NAME="kairos-opensuse-leap" KAIROS_VERSION="v2.2.1-k3sv1.26.4+k3s1" KAIROS_ID="kairos" KAIROS_ID_LIKE="kairos-opensuse-leap" KAIROS_VERSION_ID="v2.2.1-k3sv1.26.4+k3s1" KAIROS_PRETTY_NAME="kairos-opensuse-leap v2.2.1-k3sv1.26.4+k3s1" KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues/new/choose" KAIROS_HOME_URL="https://github.com/kairos-io/provider-kairos" KAIROS_IMAGE_REPO="quay.io/kairos/kairos-opensuse-leap" KAIROS_IMAGE_LABEL="latest" KAIROS_GITHUB_REPO="kairos-io/provider-kairos" KAIROS_VARIANT="core" KAIROS_FLAVOR="opensuse-leap" CPU architecture, OS, and Version:

Linux lp3dnode 5.14.21-150400.24.63-default #1 SMP PREEMPT_DYNAMIC Tue May 2 15:49:04 UTC 2023 (fd0cc4f) x86_64 x86_64 x86_64 GNU/Linux Describe the bug

coredns can't start:

 [WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: connect: no route to host 

To Reproduce

upgraded from a previous working version

maybe on a new install it will also reproduce... Expected behavior

Logs

Additional context

Maybe related to this: https://github.com/k3s-io/k3s/issues/7013 but this https://docs.k3s.io/advanced#old-iptables-versions did not help or I did not manage to apply....

jimmykarily commented 1 year ago

I tried a fresh install:

k3s: enabled: true

users:

and it works as expected. coredns Pod comes up just fine. No errors are reported in journalctl -f -u k3s There is a warning printed in the coredns pod but it doesn't seem to break anything:

[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server

I can try an upgrade. @Ognian which version did you upgrade from?

jimmykarily commented 1 year ago

10.43.0.1 is where the kubernetes api service should be running. No route to host means it can't be reached. Either it's not running at all or something is blocking access to it. @Ognian if you have ssh access to the box, try to run sudo journalctl -u k3s and check for errors. Try to run k9s as root as well to see if you can access the kubernetes cluster in general.

Ognian commented 1 year ago

@jimmykarily k9s is working, this is where I have the info about coredns from; sudo journalctl -f -u k3s shows a lot of errors, I noticed that i.e. this one:

Jun 21 13:02:07 lp3dnode k3s[1544]: E0621 13:02:07.627507    1544 kubelet_node_status.go:701] "Failed to set some node status fields" err="failed to validate nodeIP: node IP: \"192.168.6.64\" not found in the host's network interfaces" node="lp3dnode"
Jun 21 13:02:07 lp3dnode k3s[1544]: time="2023-06-21T13:02:07Z" level=info msg="Connecting to proxy" url="wss://192.168.6.64:6443/v1-k3s/connect"

is trying to connect to 192.168.6.64 BUT ip ashows that 192.168.6.54 is the correct ip... May be I should delete /usr/local/.kairos/lease and reboot?

Ognian commented 1 year ago

@jimmykarily Ups there is no /usr/local/.kairos/lease and I was upgrading, actually from 1.6 to 2.0, 2.1, 2.2.1 if I remember correctly...

Ognian commented 1 year ago
image

so there is a wrong ip the internal ip is 192.168.6.54 and not 64 How can this be changed?

jimmykarily commented 1 year ago

The IP address change may be related to this: https://github.com/kairos-io/kairos/issues/1381 which got fixed and then got fixed again.

If that's the problem then either you wait until today's release is out (which has the fix) or you try to change the IP address manually to what it was (I see some options here but I'm not sure how to do that on a read-only OS: https://github.com/k3s-io/k3s/issues/5787). Also, the fix is based on us storing the machine-id so that we can restore it. I'm not sure if upgrading to this version will find the old machine-id stored. @Itxaka since you implemented the fix, do you think it's going to work? Probably the broken versions should be skipped (so that the IP address doesn't change with the reboots).

Another thing, is it needed that all upgrades are applied? Can't @Ognian just upgrade from 1.6 to the latest in one step?

jimmykarily commented 1 year ago

Also, if this is indeed the issue, we should add it to the release notes (e.g. which versions to skip to avoid this problem). I assume, for those who used static IP addresses, it's not a problem.

Itxaka commented 1 year ago

The IP address change may be related to this: #1381 which got fixed and then got fixed again.

If that's the problem then either you wait until today's release is out (which has the fix) or you try to change the IP address manually to what it was (I see some options here but I'm not sure how to do that on a read-only OS: k3s-io/k3s#5787). Also, the fix is based on us storing the machine-id so that we can restore it. I'm not sure if upgrading to this version will find the old machine-id stored. @Itxaka since you implemented the fix, do you think it's going to work? Probably the broken versions should be skipped (so that the IP address doesn't change with the reboots).

Another thing, is it needed that all upgrades are applied? Can't @Ognian just upgrade from 1.6 to the latest in one step?

It should be possible I think. The original machine-id should still be under /usr/local/etc/machine-id so @Ognian make sure to save it somewhere.

Then once upgraded to the fixed version, it should auto pick teh exiting machine-id and save it under /etc/machine-id which should give you the fixed IP back...I think.

But after upgrade, for sure it wont change again.

If a fixed IP is required it also should be possible to skip dhcp and set it manually via https://kairos.io/docs/advanced/networking/#static-ip meanwhile

Ognian commented 1 year ago

this did not work. I had to do a reset... Now there is again a problem with the rpi version, where there is no network... I know you are working very hard on this, but somehow for me the last working version was 1.6.1 ...

Ognian commented 1 year ago

I did some additional test today: somehow after 2 -3 reboots there was a network again so that I could access it via ssh... somehow it is stuck here:

sudo journalctl -fu kairos-agent

Jun 25 10:06:13 rpi4node kairos-agent[1488]: {"level":"info","ts":1687687573.5144079,"caller":"p2p/worker.go:29","msg":"Node already configured, backing off"}
Jun 25 10:06:23 rpi4node kairos-agent[1488]: {"level":"info","ts":1687687583.5247607,"caller":"service/node.go:342","msg":"Active: '12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ'"}
Jun 25 10:06:23 rpi4node kairos-agent[1488]: {"level":"info","ts":1687687583.5250604,"caller":"service/node.go:342","msg":"Active: '12D3KooWR7t7u5f6jLDBFittv2chNfqfUVC86Cwa5Af6RBcHdRJg'"}
Jun 25 10:06:23 rpi4node kairos-agent[1488]: {"level":"info","ts":1687687583.525156,"caller":"service/node.go:308","msg":"Applying role 'auto'"}
Jun 25 10:06:23 rpi4node kairos-agent[1488]: {"level":"info","ts":1687687583.5252457,"caller":"service/role.go:115","msg":"Role loaded. Applying auto"}
Jun 25 10:06:23 rpi4node kairos-agent[1488]: {"level":"info","ts":1687687583.541726,"caller":"role/auto.go:31","msg":"Active nodes:[12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ 12D3KooWR7t7u5f6jLDBFittv2chNfqfUVC86Cwa5Af6RBcHdRJg]"}
Jun 25 10:06:23 rpi4node kairos-agent[1488]: {"level":"info","ts":1687687583.5419612,"caller":"role/auto.go:32","msg":"Advertizing nodes:[01200b4e5d2bd9b6ceca9ff86495cef0-lp3dnode 95980d83ea1d85d91dfda7ec642eb452-rpi4node]"}
Jun 25 10:06:23 rpi4node kairos-agent[1488]: {"level":"info","ts":1687687583.5431929,"caller":"role/auto.go:62","msg":"Backing off, as we are not currently flagged as leader"}
Jun 25 10:06:23 rpi4node kairos-agent[1488]: {"level":"info","ts":1687687583.5433888,"caller":"service/node.go:308","msg":"Applying role 'worker'"}
Jun 25 10:06:23 rpi4node kairos-agent[1488]: {"level":"info","ts":1687687583.5434966,"caller":"service/role.go:115","msg":"Role loaded. Applying worker"}
Jun 25 10:06:23 rpi4node kairos-agent[1488]: {"level":"info","ts":1687687583.5449197,"caller":"p2p/worker.go:29","msg":"Node already configured, backing off"}

any ideas?

jimmykarily commented 1 year ago

@Ognian these logs don't seem to be the cause of the problem. There just say that edgevpn is working fine. When you say "there is no network" I assume you mean you can't SSH. This could be cause by various problems not related to networking. E.g. even a syntax error in the cloud config could break SSH (because user accounts won't be created).

Eventually you SSHed to the machine so I assume it worked.

Regarding the original problem, does k3s now start? Does it use the correct IP address? If you can also attach the output of sudo journalctl -u k3s and sudo journalctl -u kairos-agent it might help (without -f, so that we see all the logs).

Ognian commented 1 year ago

@jimmykarily sorry for not being precise enough. When I say "there is no network" I mean the rpi does not have an ip address (on finishing boot you display the kairos in ascii art and below eth0: and wlan0: and the corresponding IP is printed). In my case the IP address is NOT displayed. I think that the network stack is simply not started...

So what I wanted to say is that only after 3 reboots I was able to get an IP and than ssh into the machine was without any problem.

Here are the logs:

sudo journalctl -u k3s
-- No entries --
 sudo journalctl -u kairos-agent
Jun 26 10:05:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773922.847147,"caller":"service/node.go:342","msg":"Active: '12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ'"}
Jun 26 10:05:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773922.8473697,"caller":"service/node.go:342","msg":"Active: '12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77'"}
Jun 26 10:05:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773922.847446,"caller":"service/node.go:308","msg":"Applying role 'auto'"}
Jun 26 10:05:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773922.84752,"caller":"service/role.go:115","msg":"Role loaded. Applying auto"}
Jun 26 10:05:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773922.8605218,"caller":"role/auto.go:31","msg":"Active nodes:[12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ 12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77]"}
Jun 26 10:05:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773922.86077,"caller":"role/auto.go:32","msg":"Advertizing nodes:[95980d83ea1d85d91dfda7ec642eb452-rpi4node 01200b4e5d2bd9b6ceca9ff86495cef0-lp3dnode]"}
Jun 26 10:05:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773922.8618371,"caller":"role/auto.go:62","msg":"Backing off, as we are not currently flagged as leader"}
Jun 26 10:05:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773922.8620024,"caller":"service/node.go:308","msg":"Applying role 'worker'"}
Jun 26 10:05:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773922.862083,"caller":"service/role.go:115","msg":"Role loaded. Applying worker"}
Jun 26 10:05:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773922.8632998,"caller":"p2p/worker.go:29","msg":"Node already configured, backing off"}
Jun 26 10:05:32 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773932.870168,"caller":"service/node.go:342","msg":"Active: '12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ'"}
Jun 26 10:05:32 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773932.8704176,"caller":"service/node.go:342","msg":"Active: '12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77'"}
Jun 26 10:05:32 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773932.870581,"caller":"service/node.go:308","msg":"Applying role 'auto'"}
Jun 26 10:05:32 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773932.8707116,"caller":"service/role.go:115","msg":"Role loaded. Applying auto"}
Jun 26 10:05:32 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773932.886055,"caller":"role/auto.go:31","msg":"Active nodes:[12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ 12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77]"}
Jun 26 10:05:32 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773932.8862555,"caller":"role/auto.go:32","msg":"Advertizing nodes:[95980d83ea1d85d91dfda7ec642eb452-rpi4node 01200b4e5d2bd9b6ceca9ff86495cef0-lp3dnode]"}
Jun 26 10:05:32 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773932.8874788,"caller":"role/auto.go:62","msg":"Backing off, as we are not currently flagged as leader"}
Jun 26 10:05:32 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773932.8877306,"caller":"service/node.go:308","msg":"Applying role 'worker'"}
Jun 26 10:05:32 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773932.8878314,"caller":"service/role.go:115","msg":"Role loaded. Applying worker"}
Jun 26 10:05:32 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773932.8890953,"caller":"p2p/worker.go:29","msg":"Node already configured, backing off"}
Jun 26 10:05:42 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773942.8945184,"caller":"service/node.go:342","msg":"Active: '12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ'"}
Jun 26 10:05:42 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773942.8947456,"caller":"service/node.go:342","msg":"Active: '12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77'"}
Jun 26 10:05:42 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773942.8948195,"caller":"service/node.go:308","msg":"Applying role 'auto'"}
Jun 26 10:05:42 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773942.8948863,"caller":"service/role.go:115","msg":"Role loaded. Applying auto"}
Jun 26 10:05:42 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773942.9069264,"caller":"role/auto.go:31","msg":"Active nodes:[12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ 12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77]"}
Jun 26 10:05:42 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773942.907052,"caller":"role/auto.go:32","msg":"Advertizing nodes:[95980d83ea1d85d91dfda7ec642eb452-rpi4node 01200b4e5d2bd9b6ceca9ff86495cef0-lp3dnode]"}
Jun 26 10:05:42 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773942.9077408,"caller":"role/auto.go:62","msg":"Backing off, as we are not currently flagged as leader"}
Jun 26 10:05:42 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773942.9078684,"caller":"service/node.go:308","msg":"Applying role 'worker'"}
Jun 26 10:05:42 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773942.907907,"caller":"service/role.go:115","msg":"Role loaded. Applying worker"}
Jun 26 10:05:42 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773942.9087343,"caller":"p2p/worker.go:29","msg":"Node already configured, backing off"}
Jun 26 10:05:52 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773952.915197,"caller":"service/node.go:342","msg":"Active: '12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ'"}
Jun 26 10:05:52 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773952.915463,"caller":"service/node.go:342","msg":"Active: '12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77'"}
Jun 26 10:05:52 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773952.915564,"caller":"service/node.go:308","msg":"Applying role 'auto'"}
Jun 26 10:05:52 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773952.9156604,"caller":"service/role.go:115","msg":"Role loaded. Applying auto"}
Jun 26 10:05:52 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773952.9315953,"caller":"role/auto.go:31","msg":"Active nodes:[12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ 12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77]"}
Jun 26 10:05:52 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773952.931739,"caller":"role/auto.go:32","msg":"Advertizing nodes:[01200b4e5d2bd9b6ceca9ff86495cef0-lp3dnode 95980d83ea1d85d91dfda7ec642eb452-rpi4node]"}
Jun 26 10:05:52 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773952.9324615,"caller":"role/auto.go:62","msg":"Backing off, as we are not currently flagged as leader"}
Jun 26 10:05:52 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773952.9325628,"caller":"service/node.go:308","msg":"Applying role 'worker'"}
Jun 26 10:05:52 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773952.9326096,"caller":"service/role.go:115","msg":"Role loaded. Applying worker"}
Jun 26 10:05:52 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773952.9334261,"caller":"p2p/worker.go:29","msg":"Node already configured, backing off"}
Jun 26 10:06:02 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773962.9387913,"caller":"service/node.go:342","msg":"Active: '12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ'"}
Jun 26 10:06:02 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773962.9390538,"caller":"service/node.go:342","msg":"Active: '12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77'"}
Jun 26 10:06:02 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773962.9391537,"caller":"service/node.go:308","msg":"Applying role 'auto'"}
Jun 26 10:06:02 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773962.9392502,"caller":"service/role.go:115","msg":"Role loaded. Applying auto"}
Jun 26 10:06:02 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773962.955402,"caller":"role/auto.go:31","msg":"Active nodes:[12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ 12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77]"}
Jun 26 10:06:02 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773962.9555666,"caller":"role/auto.go:32","msg":"Advertizing nodes:[95980d83ea1d85d91dfda7ec642eb452-rpi4node 01200b4e5d2bd9b6ceca9ff86495cef0-lp3dnode]"}
Jun 26 10:06:02 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773962.9562304,"caller":"role/auto.go:62","msg":"Backing off, as we are not currently flagged as leader"}
Jun 26 10:06:02 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773962.956341,"caller":"service/node.go:308","msg":"Applying role 'worker'"}
Jun 26 10:06:02 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773962.9563813,"caller":"service/role.go:115","msg":"Role loaded. Applying worker"}
Jun 26 10:06:02 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773962.9571633,"caller":"p2p/worker.go:29","msg":"Node already configured, backing off"}
Jun 26 10:06:12 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773972.9640732,"caller":"service/node.go:342","msg":"Active: '12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ'"}
Jun 26 10:06:12 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773972.9643428,"caller":"service/node.go:342","msg":"Active: '12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77'"}
Jun 26 10:06:12 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773972.9644635,"caller":"service/node.go:308","msg":"Applying role 'auto'"}
Jun 26 10:06:12 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773972.9645624,"caller":"service/role.go:115","msg":"Role loaded. Applying auto"}
Jun 26 10:06:12 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773972.9804175,"caller":"role/auto.go:31","msg":"Active nodes:[12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ 12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77]"}
Jun 26 10:06:12 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773972.9805548,"caller":"role/auto.go:32","msg":"Advertizing nodes:[01200b4e5d2bd9b6ceca9ff86495cef0-lp3dnode 95980d83ea1d85d91dfda7ec642eb452-rpi4node]"}
Jun 26 10:06:12 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773972.981206,"caller":"role/auto.go:62","msg":"Backing off, as we are not currently flagged as leader"}
Jun 26 10:06:12 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773972.981313,"caller":"service/node.go:308","msg":"Applying role 'worker'"}
Jun 26 10:06:12 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773972.981352,"caller":"service/role.go:115","msg":"Role loaded. Applying worker"}
Jun 26 10:06:12 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773972.9821146,"caller":"p2p/worker.go:29","msg":"Node already configured, backing off"}
Jun 26 10:06:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773982.9875455,"caller":"service/node.go:342","msg":"Active: '12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77'"}
Jun 26 10:06:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773982.9878795,"caller":"service/node.go:342","msg":"Active: '12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ'"}
Jun 26 10:06:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773982.9879801,"caller":"service/node.go:308","msg":"Applying role 'auto'"}
Jun 26 10:06:22 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773982.9880831,"caller":"service/role.go:115","msg":"Role loaded. Applying auto"}
Jun 26 10:06:23 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773983.0047708,"caller":"role/auto.go:31","msg":"Active nodes:[12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ 12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77]"}
Jun 26 10:06:23 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773983.0048954,"caller":"role/auto.go:32","msg":"Advertizing nodes:[01200b4e5d2bd9b6ceca9ff86495cef0-lp3dnode 95980d83ea1d85d91dfda7ec642eb452-rpi4node]"}
Jun 26 10:06:23 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773983.005551,"caller":"role/auto.go:62","msg":"Backing off, as we are not currently flagged as leader"}
Jun 26 10:06:23 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773983.0056956,"caller":"service/node.go:308","msg":"Applying role 'worker'"}
Jun 26 10:06:23 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773983.0057354,"caller":"service/role.go:115","msg":"Role loaded. Applying worker"}
Jun 26 10:06:23 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773983.0065,"caller":"p2p/worker.go:29","msg":"Node already configured, backing off"}
Jun 26 10:06:33 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773993.01152,"caller":"service/node.go:342","msg":"Active: '12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ'"}
Jun 26 10:06:33 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773993.0119143,"caller":"service/node.go:342","msg":"Active: '12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77'"}
Jun 26 10:06:33 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773993.0120232,"caller":"service/node.go:308","msg":"Applying role 'auto'"}
Jun 26 10:06:33 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773993.012119,"caller":"service/role.go:115","msg":"Role loaded. Applying auto"}
Jun 26 10:06:33 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773993.0360918,"caller":"role/auto.go:31","msg":"Active nodes:[12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ 12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77]"}
Jun 26 10:06:33 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773993.0362587,"caller":"role/auto.go:32","msg":"Advertizing nodes:[95980d83ea1d85d91dfda7ec642eb452-rpi4node 01200b4e5d2bd9b6ceca9ff86495cef0-lp3dnode]"}
Jun 26 10:06:33 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773993.0369692,"caller":"role/auto.go:62","msg":"Backing off, as we are not currently flagged as leader"}
Jun 26 10:06:33 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773993.0370865,"caller":"service/node.go:308","msg":"Applying role 'worker'"}
Jun 26 10:06:33 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773993.0371273,"caller":"service/role.go:115","msg":"Role loaded. Applying worker"}
Jun 26 10:06:33 rpi4node kairos-agent[1591]: {"level":"info","ts":1687773993.0379844,"caller":"p2p/worker.go:29","msg":"Node already configured, backing off"}
Jun 26 10:06:43 rpi4node kairos-agent[1591]: {"level":"info","ts":1687774003.0421433,"caller":"service/node.go:342","msg":"Active: '12D3KooWKup38XSdKmnymAi4AAJJugSzSmGxHxnahTcNkh4BTuEZ'"}
Jun 26 10:06:43 rpi4node kairos-agent[1591]: {"level":"info","ts":1687774003.0423815,"caller":"service/node.go:342","msg":"Active: '12D3KooWSq8TVE8FJmS4gguuvW7VmcqT9NXigKQnefukeTQUPQ77'"}
lines 1-82
jimmykarily commented 1 year ago

@Ognian this seems to have gotten the worker role. Can you also send the output of:

Ognian commented 1 year ago

sudo journalctl -u k3s-agent:

Jun 26 12:57:58 rpi4node systemd[1]: Failed to start Lightweight Kubernetes.
Jun 26 12:58:04 rpi4node systemd[1]: k3s-agent.service: Scheduled restart job, restart counter is at 975.
Jun 26 12:58:04 rpi4node systemd[1]: Stopped Lightweight Kubernetes.
Jun 26 12:58:04 rpi4node systemd[1]: Starting Lightweight Kubernetes...
Jun 26 12:58:04 rpi4node sh[11614]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Jun 26 12:58:04 rpi4node sh[11615]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Jun 26 12:58:04 rpi4node k3s[11618]: time="2023-06-26T12:58:04Z" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
Jun 26 12:58:04 rpi4node k3s[11618]: time="2023-06-26T12:58:04Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/58cbe03d7ea7167b3921ba5332233a89c43d76fc99dbc8e0d6fdc5dcec19ee9d"
Jun 26 12:58:04 rpi4node k3s[11618]: time="2023-06-26T12:58:04Z" level=info msg="error extracting tarball into /var/lib/rancher/k3s/data/58cbe03d7ea7167b3921ba5332233a89c43d76fc99dbc8e0d6fdc5dcec19ee9d-tmp after 5 files, 1 dirs, 370.5266ms: error writing to /var/lib/ranc>
Jun 26 12:58:04 rpi4node k3s[11618]: time="2023-06-26T12:58:04Z" level=fatal msg="extracting data: error writing to /var/lib/rancher/k3s/data/58cbe03d7ea7167b3921ba5332233a89c43d76fc99dbc8e0d6fdc5dcec19ee9d-tmp/bin/k3s: write /var/lib/rancher/k3s/data/58cbe03d7ea7167b392>
Jun 26 12:58:04 rpi4node systemd[1]: k3s-agent.service: Main process exited, code=exited, status=1/FAILURE
Jun 26 12:58:04 rpi4node systemd[1]: k3s-agent.service: Failed with result 'exit-code'.
Jun 26 12:58:04 rpi4node systemd[1]: Failed to start Lightweight Kubernetes.

the actual error when you scroll right is no space left on device

 sudo systemctl status k3s-agent
● k3s-agent.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s-agent.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/k3s-agent.service.d
             └─override.conf
     Active: activating (auto-restart) (Result: exit-code) since Mon 2023-06-26 13:02:00 UTC; 4s ago
       Docs: https://k3s.io
    Process: 12046 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
    Process: 12048 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 12049 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 12050 ExecStart=/usr/bin/k3s agent --with-node-id --node-ip 192.168.6.27 (code=exited, status=1/FAILURE)
   Main PID: 12050 (code=exited, status=1/FAILURE)
sudo systemctl status 
● rpi4node
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Thu 2023-04-06 12:00:25 UTC; 2 months 20 days ago
   CGroup: /
           ├─user.slice 
           │ └─user-0.slice 
           │   ├─session-1.scope 
           │   │ ├─ 12180 sudo systemctl status
           │   │ ├─ 12190 systemctl status
           │   │ └─ 12191 less
           │   └─user@0.service 
           │     └─init.scope 
           │       ├─ 12182 /usr/lib/systemd/systemd --user
           │       └─ 12183 (sd-pam)
           ├─init.scope 
           │ └─ 1 /usr/lib/systemd/systemd --switched-root --system --deserialize 32
           └─system.slice 
             ├─fail2ban.service 
             │ └─ 1417 /usr/bin/python3 /usr/bin/fail2ban-server -xf start
             ├─systemd-networkd.service 
             │ └─ 1431 /usr/lib/systemd/systemd-networkd
             ├─systemd-udevd.service 
             │ └─ 1195 /usr/lib/systemd/systemd-udevd
             ├─rng-tools.service 
             │ └─ 1387 /usr/sbin/rngd --foreground --fill-watermark=3700
             ├─cron.service 
             │ └─ 1321 /usr/sbin/cron -n
             ├─system-serial\x2dgetty.slice 
             │ └─serial-getty@ttyS0.service 
             │   └─ 1515 /sbin/agetty -o -p -- \u --keep-baud 115200,57600,38400,9600 ttyS0 vt220
             ├─edgevpn.service 
             │ └─ 1346 edgevpn api --enable-healthchecks
             ├─systemd-journald.service 
             │ └─ 1175 /usr/lib/systemd/systemd-journald
             ├─sshd.service 
             │ ├─ 1432 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
             │ ├─ 2508 sshd: kairos [priv]
             │ ├─ 2510 sshd: kairos@pts/0
             │ └─ 2511 -bash
             ├─nohang.service 
             │ └─ 1372 python3 /usr/bin/nohang --config /etc/nohang/nohang.conf --monitor
             ├─systemd-resolved.service 
             │ └─ 1467 /usr/lib/systemd/systemd-resolved
             ├─kairos-agent.service 
             │ ├─ 1595 /usr/bin/kairos-agent start
             │ └─ 1606 /system/providers/agent-provider-kairos agent.bootstrap
             ├─iscsid.service 
             │ └─ 1488 /sbin/iscsid -f
             ├─dbus.service 
             │ └─ 1323 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
             ├─systemd-timesyncd.service 
             │ └─ 1309 /usr/lib/systemd/systemd-timesyncd
             ├─system-getty.slice 
             │ └─getty@tty1.service 
             │   └─ 1513 /sbin/agetty -o -p -- \u --noclear tty1 linux
             └─systemd-logind.service 
               └─ 1401 /usr/lib/systemd/systemd-logind
kairos@rpi4node:~> 
df -h
Filesystem                         Size  Used Avail Use% Mounted on
devtmpfs                           4.0M     0  4.0M   0% /dev
tmpfs                              3.9G  8.0K  3.9G   1% /dev/shm
tmpfs                              1.6G  8.9M  1.6G   1% /run
tmpfs                              4.0M     0  4.0M   0% /sys/fs/cgroup
/dev/loop0                         2.0G  1.6G  323M  83% /
/dev/disk/by-label/COS_OEM          55M   19K   51M   1% /oem
tmpfs                              2.0G   12M  2.0G   1% /run/overlay
/dev/disk/by-label/COS_PERSISTENT   55M   23M   28M  45% /usr/local
overlay                            2.0G   12M  2.0G   1% /var
overlay                            2.0G   12M  2.0G   1% /etc
overlay                            2.0G   12M  2.0G   1% /srv
tmpfs                              3.9G  4.0K  3.9G   1% /tmp
/dev/sda2                          5.9G  3.1G  2.6G  55% /run/initramfs/cos-state
sudo fdisk -l
Disk /dev/sda: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: Portable SSD T5 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 33553920 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot    Start      End  Sectors  Size Id Type
/dev/sda1           8192   204799   196608   96M  c W95 FAT32 (LBA)
/dev/sda2         204800 12902399 12697600  6.1G 83 Linux
/dev/sda3       12902400 21635071  8732672  4.2G 8e Linux LVM
/dev/sda4       21635072 21766143   131072   64M 83 Linux

Disk /dev/mapper/KairosVG-oem: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 33553920 bytes

Disk /dev/mapper/KairosVG-recovery: 4.1 GiB, 4399824896 bytes, 8593408 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 33553920 bytes

Disk /dev/loop0: 1.95 GiB, 2097152000 bytes, 4096000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
kairos@rpi4node:~> 

I would say that the state partition is no more expanded after first boot (https://github.com/kairos-io/kairos/issues/1448) What do you think?

jimmykarily commented 1 year ago

I could be. I can't tell which partition is which in kairos from the output above (This command will make it clearer lsblk --list /dev/disk/by-path/* -o NAME,PKNAME,PATH,FSTYPE,MOUNTPOINT,SIZE,RO,LABEL).

I think the only partition that would automatically get expanded would be COS_PERSISTENT, the last one. But I see /dev/sda4 in your output being the last partition and only 64M in size, which doesn't look like it expanded to me.

You probably hit the bug you linked but I'm not 100% sure that this is the only problem you are facing. That bug has a fix which I'm trying to test here: https://github.com/kairos-io/kairos/pull/1497/files but I'm facing CI issues since last week and haven't been able to verify it's ok to merge. Stay tuned.

Ognian commented 1 year ago
 lsblk --list /dev/disk/by-path/* -o NAME,PKNAME,PATH,FSTYPE,MOUNTPOINT,SIZE,RO,LABEL
NAME              PKNAME PATH                          FSTYPE      MOUNTPOINT                 SIZE RO LABEL
sda                      /dev/sda                                                           465.8G  0 
sda1              sda    /dev/sda1                     vfat                                    96M  0 COS_GRUB
sda2              sda    /dev/sda2                     ext4        /run/initramfs/cos-state   6.1G  0 COS_STATE
sda3              sda    /dev/sda3                     LVM2_member                            4.2G  0 
sda4              sda    /dev/sda4                     ext4        /usr/local                  64M  0 COS_PERSISTENT
KairosVG-oem      sda3   /dev/mapper/KairosVG-oem      ext4        /oem                        64M  0 COS_OEM
KairosVG-recovery sda3   /dev/mapper/KairosVG-recovery ext4                                   4.1G  0 COS_RECOVERY
kairos@rpi4node:~> 
jimmykarily commented 1 year ago

Yes, /var/lib/rancher is mounted no COS_PERSISTENT and that one isn't expanded thus it runs our of disk. #1448 is the issue you are facing. Until this is fixed and released, you can try the workaround at the description of that issue. Just use kairos-agent run-stage instead of elemental run-stage which is now deprecated.

mudler commented 1 year ago

The original issue (https://github.com/kairos-io/kairos/pull/1531) was fixed and will be part of the next release. Let's track https://github.com/kairos-io/kairos/issues/1448 separately. Closing this one.