Akvanvig opened this issue 1 week ago
Yep, confirmed.
Built with master (k3s image), set up a single-node k8s cluster, upgraded with system-upgrade-controller; results in duplicated lines in /etc/hosts:
kairos@kairos-k3s:~$ cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1 localhost kairos-k3s
127.0.0.1 localhost kairos-k3s
Not sure what's going on, lol
Can be reproduced by running the initramfs stage several times with kairos-agent run-stage initramfs.
Seems like yip is not picking up, or not checking, that the line already exists?
Using yip directly seems to work though??
Ah, seems that it's the 31_host file from system/oem, and it's only run in initramfs.before.
Yes, somehow the check is failing, so it recreates the hostname entry...
Could it be just the check in 31_host? It seems a bit strange that it would also add the extra comment from the container, as in the example under "Describe the bug" 🤔 I checked a cluster that had been upgraded a few times using suc, and it ends up with one extra comment and one extra hosts line for each upgrade, plus the original ones.
Went and tested in a VM, and it seems it's like you're saying: the extra 127.0.0.1 localhost line is simply 31_hosts adding an extra line.
The extra comment line seems to be just Kubernetes mounting the node's hosts file into the container and then adding its own comment on top (again).
Set up a pod that is roughly equivalent to the suc container, and there it is:
root@localhost:~# kubectl exec -it -n system-upgrade suc-busybox-test -- sh
/ # cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Kubernetes-managed hosts file (host network).
127.0.0.1 localhost
127.0.0.1 localhost
/ # exit
root@localhost:~# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1 localhost
127.0.0.1 localhost
I guess the second problem could be solved by dropping comments in the hosts file in yip (unless that's something you aim not to modify), somewhere in this loop :thinking:
https://github.com/mudler/yip/blob/master/pkg/plugins/hostname.go#L74-L82
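For reference, the missing idempotency check is essentially "only append the entry if an identical line is not already there". Sketched in shell (the actual plugin is Go; the function name here is made up for illustration):

```shell
# append_host_entry FILE LINE
# Appends LINE to FILE only if an identical line is not already present,
# i.e. the kind of existence check that seems to be failing here.
# grep -qxF: quiet, whole-line, fixed-string match.
append_host_entry() {
  grep -qxF "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

Running it twice with the same entry leaves a single line, which is the behavior the initramfs stage would need to be safely re-runnable.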
If modifying yip is not an option, suc-upgrade.sh could maybe be modified to either remove comments with sed first, or copy the original from /host/etc/hosts, and then do the upgrade here:
https://github.com/kairos-io/packages/blob/main/packages/system/suc-upgrade/suc-upgrade.sh#L39
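A rough sketch of what that cleanup could look like before the upgrade runs (function name is illustrative, not from suc-upgrade.sh; note the write-back uses cat rather than mv, since inside a pod /etc/hosts is a bind mount and cannot be renamed over):

```shell
# dedup_hosts FILE
# Strips comment lines (e.g. the Kubernetes-injected header) and removes
# duplicate entries while preserving order, writing the result back in place.
dedup_hosts() {
  tmp="$(mktemp)"
  grep -v '^#' "$1" | awk '!seen[$0]++' > "$tmp"
  # cat instead of mv: a bind-mounted /etc/hosts can be truncated and
  # rewritten, but not replaced via rename
  cat "$tmp" > "$1"
  rm -f "$tmp"
}
```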
This patch seems to alleviate it; after 2 upgrades I no longer get the duplicated entries: https://github.com/kairos-io/packages/pull/1113/
I do get duplicated comments though, and I still don't get why. If k8s mounts stuff over /etc/hosts from the host into the container, that's ok, but the upgrade should just ignore that and copy it. Plus, /etc is ephemeral, so after a reboot it should go away?
The only thing I can see touching that file is that yaml file... No idea where the duplication comes from. Could it be that the plugin is adding extra lines somehow? Or maybe the underlying /etc in the image does have the /etc/hosts duplication??
I'm really confused by this one.
Looking at the Kairos container available in the Kairos registries, it contains an etc/hosts file, but it is empty, which explains why it is reset once you upgrade with kairos-agent upgrade $image.
On the other hand, when it's run as a pod in Kubernetes, Kubernetes will give it a hosts file based on either the cluster network or the node (host network). This seems to be what is causing problems here. I don't really have a good solution for it as long as the pod's filesystem is mounted and used for the upgrade. Based on some testing, my suggestion about overwriting the existing file or copying the file from the host doesn't seem to work either, since I couldn't find a way to delete/overwrite the file from the pods.
Not sure how you could get around this? I assume the upgrade command, when provided the --source, just takes the entire OS there and packs it up?
kubernetes doc: https://kubernetes.io/docs/tasks/network/customize-hosts-file-for-pods/
Yep, it does. Maybe we should either skip the hosts file or overwrite it on each boot before filling in the hostname? So we start from initramfs with a hosts file that we know is "clean" on each boot?
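The "overwrite on each boot" idea could be as simple as rewriting the file from a fixed template before the hostname entry is added, so repeated runs converge instead of appending. A sketch (in practice this would live in the 31_host cloud config or the yip plugin, not a standalone script; the function name is made up):

```shell
# reset_hosts FILE HOSTNAME
# Rewrites the hosts file from scratch so every boot (and every re-run of
# the initramfs stage) starts from the same known-clean state.
reset_hosts() {
  printf '127.0.0.1 localhost %s\n' "$2" > "$1"
}
```

Running the stage twice would then still yield exactly one entry, instead of one per run.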
mmmh, going into the upgrade container I can see this:
/dev/disk/by-label/COS_PERSISTENT on /etc/hosts type ext4 (rw,relatime)
so it's storing the hosts file in the persistent partition. But only in the running container; outside, on the host, /etc/hosts is not persistent...
Somehow, somewhere, with the patch this suddenly seems to be fixed. There is also another patch that may affect this, which changes the config read paths, as we were not reading the current system paths for configs (https://github.com/kairos-io/kairos-agent/pull/579).
I could not reproduce it anymore with framework 2.14.1 (latest agent and cloud configs). I need to try it again tomorrow from a clean image, but it may have gone away.
Kairos version:
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
KAIROS_ID_LIKE="kairos-standard-ubuntu-24.04"
KAIROS_IMAGE_LABEL="24.04-standard-amd64-generic-v3.2.1-k3sv1.31.1-k3s1"
KAIROS_ARTIFACT="kairos-ubuntu-24.04-standard-amd64-generic-v3.2.1-k3sv1.31.1+k3s1"
KAIROS_FLAVOR="ubuntu"
KAIROS_FLAVOR_RELEASE="24.04"
KAIROS_FAMILY="ubuntu"
KAIROS_MODEL="generic"
KAIROS_NAME="kairos-standard-ubuntu-24.04"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_SOFTWARE_VERSION="v1.31.1+k3s1"
KAIROS_TARGETARCH="amd64"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_VERSION="v3.2.1-v1.31.1-k3s1"
KAIROS_REGISTRY_AND_ORG="quay.io/kairos"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_ID="kairos"
KAIROS_PRETTY_NAME="kairos-standard-ubuntu-24.04 v3.2.1-v1.31.1-k3s1"
KAIROS_IMAGE_REPO="quay.io/kairos/ubuntu:24.04-standard-amd64-generic-v3.2.1-k3sv1.31.1-k3s1"
KAIROS_VARIANT="standard"
KAIROS_RELEASE="v3.2.1"
KAIROS_SOFTWARE_VERSION_PREFIX="k3s"
KAIROS_VERSION_ID="v3.2.1-v1.31.1-k3s1"
CPU architecture, OS, and Version:
Linux localhost 6.8.0-45-generic #45-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 30 12:02:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Describe the bug
When upgrading using the routine documented for system-upgrade-controller, the container's /etc/hosts file seemingly gets merged with the host's hosts file and ends up with more and more entries. This is after two upgrades using suc since provisioning:
To Reproduce
Using the plan shown in the documentation here, apply it and upgrade a cluster. After reboot, check the /etc/hosts file:
Expected behavior
It should work the same way as when upgrading using the kairos-agent upgrade command directly, i.e. result in a hosts file equal to the one we started with.
Logs
I haven't been able to find any logs indicating what's gone wrong here.
Additional context
Does not seem to affect upgrades using kairos-agent upgrade directly. My guess is that it's related to the suc upgrade using the container's root as a source, but I haven't yet found how it could be prevented. Not aware of any other files being affected in a similar way.