kairos-io / kairos

:penguin: The immutable Linux meta-distribution for edge Kubernetes.
https://kairos.io
Apache License 2.0

When upgrading through SUC, the hosts file gets extra entries from the pod #2934

Open Akvanvig opened 1 week ago

Akvanvig commented 1 week ago

Kairos version:

PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
KAIROS_ID_LIKE="kairos-standard-ubuntu-24.04"
KAIROS_IMAGE_LABEL="24.04-standard-amd64-generic-v3.2.1-k3sv1.31.1-k3s1"
KAIROS_ARTIFACT="kairos-ubuntu-24.04-standard-amd64-generic-v3.2.1-k3sv1.31.1+k3s1"
KAIROS_FLAVOR="ubuntu"
KAIROS_FLAVOR_RELEASE="24.04"
KAIROS_FAMILY="ubuntu"
KAIROS_MODEL="generic"
KAIROS_NAME="kairos-standard-ubuntu-24.04"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_SOFTWARE_VERSION="v1.31.1+k3s1"
KAIROS_TARGETARCH="amd64"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_VERSION="v3.2.1-v1.31.1-k3s1"
KAIROS_REGISTRY_AND_ORG="quay.io/kairos"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_ID="kairos"
KAIROS_PRETTY_NAME="kairos-standard-ubuntu-24.04 v3.2.1-v1.31.1-k3s1"
KAIROS_IMAGE_REPO="quay.io/kairos/ubuntu:24.04-standard-amd64-generic-v3.2.1-k3sv1.31.1-k3s1"
KAIROS_VARIANT="standard"
KAIROS_RELEASE="v3.2.1"
KAIROS_SOFTWARE_VERSION_PREFIX="k3s"
KAIROS_VERSION_ID="v3.2.1-v1.31.1-k3s1"

CPU architecture, OS, and Version:

Linux localhost 6.8.0-45-generic #45-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 30 12:02:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug

When upgrading using the routine documented for the system-upgrade-controller, the container's /etc/hosts file seemingly gets merged with the host's hosts file, which ends up with more and more entries. This is after two upgrades using SUC since provisioning:

root@localhost:~# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Kubernetes-managed hosts file (host network).
127.0.0.1       localhost
127.0.0.1       localhost
127.0.0.1       localhost
root@localhost:~#

To Reproduce

Using the plan shown in the documentation here, apply an upgrade to a cluster. After the reboot, check the /etc/hosts file:

root@localhost:~# cat /etc/hosts
127.0.0.1       localhost
root@localhost:~# kubectl apply -f upgrade.yaml 
plan.upgrade.cattle.io/os-upgrade configured
root@localhost:~# 
Broadcast message from root@localhost (Thu 2024-10-10 14:17:44 UTC):

The system will reboot now!
Connection to 192.168.122.63 closed.
$ ssh 192.168.122.63
$ cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1       localhost
127.0.0.1       localhost
$

Expected behavior

It should work the same way as upgrading with the kairos-agent upgrade command directly, i.e. result in a hosts file equal to the one we started with:

root@localhost:~# cat /etc/hosts
127.0.0.1       localhost
root@localhost:~#

Logs

I have not been able to find any logs indicating what has gone wrong here.

Additional context

This does not seem to affect upgrades using kairos-agent upgrade directly. My guess is that it's related to the SUC upgrade using the container's root as a source, but I haven't yet found how it could be prevented. I'm not aware of any other files being affected in a similar way.

Itxaka commented 5 days ago

yep, confirmed.

Built from master with a k3s image, set up a single-node k8s cluster, and upgraded with the system-upgrade-controller; the result is duplicated lines in /etc/hosts:

kairos@kairos-k3s:~$ cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1 localhost kairos-k3s
127.0.0.1 localhost kairos-k3s

Not sure what's going on, lol

Itxaka commented 5 days ago

This can be reproduced by running the initramfs stage several times with kairos-agent run-stage initramfs.

Seems like yip is not checking whether the line already exists?
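For illustration, here is a minimal sketch of the kind of idempotency check that appears to be missing: only append the entry if an identical line is not already there. A temp file stands in for the real /etc/hosts, and the loop mimics running the stage twice.

```shell
# Stand-in for /etc/hosts so the sketch is safe to run anywhere.
hosts=$(mktemp)
line='127.0.0.1 localhost'

# Simulate running the initramfs stage twice: append only when the exact
# line (-x: whole line, -F: fixed string) is not already present.
for i in 1 2; do
    grep -qxF "$line" "$hosts" || echo "$line" >> "$hosts"
done

result=$(cat "$hosts")
echo "$result"
rm -f "$hosts"
```

Run this way, the entry shows up once no matter how many times the "stage" repeats, which is the behavior the thread expects from yip's hostname plugin.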

Itxaka commented 5 days ago

using yip directly seems to work though??

Itxaka commented 5 days ago

ah, it seems that it's the 31_host file from system/oem, and it's only run in initramfs.before

Itxaka commented 5 days ago

yes, somehow the check is failing, so it recreates the hostname entry..

Akvanvig commented 5 days ago

Could it just be the check in 31_host? It seems a bit strange that it would also add the extra comment from the container, as in the example under "Describe the bug" 🤔 I checked on a cluster that had been upgraded a few times using SUC, and it ends up with one extra comment and one extra hosts line for each upgrade, plus the original one.

Akvanvig commented 5 days ago

I went and tested in a VM, and it seems to be as you're saying: the extra 127.0.0.1 localhost line is simply 31_hosts adding an extra line. The extra comment line seems to be just Kubernetes mounting the node's hosts file into the container and then adding its own comment to the top (again).

I set up a pod that is roughly equivalent to the SUC container, and there it is:

root@localhost:~# kubectl exec -it -n system-upgrade suc-busybox-test -- sh
/ # cat /etc/hosts 
# Kubernetes-managed hosts file (host network).
# Kubernetes-managed hosts file (host network).
127.0.0.1       localhost
127.0.0.1       localhost
/ # exit

root@localhost:~# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1       localhost
127.0.0.1       localhost

I guess the second problem could be solved by dropping comments in the hosts file in yip (unless that's something you aim not to modify), somewhere in this loop :thinking: https://github.com/mudler/yip/blob/master/pkg/plugins/hostname.go#L74-L82 If modifying yip is not an option, suc-upgrade.sh could perhaps be modified to either strip comments with sed first, or copy the original from /host/etc/hosts before doing the upgrade here: https://github.com/kairos-io/packages/blob/main/packages/system/suc-upgrade/suc-upgrade.sh#L39
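As a rough sketch of the sed idea, this strips the comment line Kubernetes injects before the container filesystem gets packed up. The file here is a temp stand-in for the container's /etc/hosts, and the comment pattern matches the one shown in the transcripts above; the actual placement in suc-upgrade.sh would be up to the maintainers.

```shell
# Stand-in for the container's /etc/hosts with one injected comment.
hosts=$(mktemp)
printf '%s\n' \
    '# Kubernetes-managed hosts file (host network).' \
    '127.0.0.1 localhost' > "$hosts"

# Drop the Kubernetes-injected comment in place; keep real entries.
sed -i '/^# Kubernetes-managed hosts file/d' "$hosts"

cleaned=$(cat "$hosts")
echo "$cleaned"
rm -f "$hosts"
```

Note that `sed -i` with no suffix is GNU sed syntax, which matches the Linux hosts in question.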

Itxaka commented 4 days ago

This patch seems to alleviate it; after 2 upgrades I no longer get the duplicated entries: https://github.com/kairos-io/packages/pull/1113/

I do still get duplicated comments, though, and I don't get why. If k8s mounts stuff over /etc/hosts from the host into the container, that's fine, but the upgrade should just ignore that and copy it. Plus, /etc is ephemeral, so after a reboot it should go away?

The only thing I can see touching that file is that yaml file... No idea where the duplication comes from. Could it be that the plugin is adding extra lines somehow? Or maybe the underlying /etc in the image does have the /etc/hosts duplication??

I'm really confused by this one

Akvanvig commented 4 days ago

Looking at the Kairos container available in the Kairos registries, it contains an etc/hosts file, but it is empty, which explains why the file is reset once you upgrade with kairos-agent upgrade $image.

On the other hand, when it's run as a pod in Kubernetes, Kubernetes will give it a hosts file based on either the cluster network or the node (host network). This seems to be what is causing the problems here. I don't really have a good solution for this, though, as long as the pod filesystem is mounted and used to upgrade. My suggestion about overwriting the existing file or copying the file from the host doesn't seem to work based on some testing, since I couldn't find a way to delete/overwrite the file from the pods.

Not sure how you could get around this? I assume the upgrade command, when provided the --source, just takes the entire OS there and packs it up?

Kubernetes doc: https://kubernetes.io/docs/tasks/network/customize-hosts-file-for-pods/

Itxaka commented 3 days ago

Not sure how you could get around this? I assume the upgrade command, when provided the --source, just takes the entire OS there and packs it up?

Yep, it does. Maybe we should either skip the hosts file, or overwrite it on each boot before filling in the hostname? That way we start from initramfs with a hosts file that we know is "clean" on each boot.
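A minimal sketch of that "overwrite on each boot" idea: rewrite the hosts file from a fixed baseline before the hostname stage appends its entry, so nothing inherited from the upgrade container can accumulate. A temp file stands in for /etc/hosts, and the baseline content is an assumption.

```shell
# Stand-in for /etc/hosts, pre-populated with leftovers from an upgrade.
hosts=$(mktemp)
printf '%s\n' \
    '# Kubernetes-managed hosts file (host network).' \
    '127.0.0.1 localhost' > "$hosts"

# Overwrite unconditionally with a known-clean baseline (assumed content),
# as an early-boot step before the hostname stage runs.
printf '127.0.0.1 localhost\n' > "$hosts"

fresh=$(cat "$hosts")
echo "$fresh"
rm -f "$hosts"
```

The trade-off is that any legitimate user additions to /etc/hosts would also need to be re-applied by a later stage, since the baseline wipes everything.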

Itxaka commented 3 days ago

Hmm, going into the upgrade container I can see this:

/dev/disk/by-label/COS_PERSISTENT on /etc/hosts type ext4 (rw,relatime)

so it's storing the hosts file in the persistent partition. But only in the running container; outside, on the host, /etc/hosts is not persistent...

Itxaka commented 3 days ago

Somehow, with the patch, this suddenly seems to be fixed. There is also another patch that may affect this, which changes the config read paths, as we were not reading the current system paths for configs (https://github.com/kairos-io/kairos-agent/pull/579)

I could not reproduce it anymore with framework 2.14.1 (latest agent and cloud configs). I need to try it again tomorrow from a clean image, though, but it may have gone away.