kairos-io / kairos

The immutable Linux meta-distribution for edge Kubernetes.
https://kairos.io
Apache License 2.0

Kairos changing Users UID #2949

Open clyra opened 1 month ago

clyra commented 1 month ago

Kairos version:

cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

cat /etc/kairos-release
KAIROS_VERSION_ID="v3.2.1-11-g83c0aef"
KAIROS_IMAGE_REPO="quay.io/kairos/ubuntu:24.04-core-amd64-generic-v3.2.1-11-g83c0aef"
KAIROS_ARTIFACT="kairos-ubuntu-24.04-core-amd64-generic-v3.2.1-11-g83c0aef"
KAIROS_REGISTRY_AND_ORG="quay.io/kairos"
KAIROS_ID="kairos"
KAIROS_ID_LIKE="kairos-core-ubuntu-24.04"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_FAMILY="ubuntu"
KAIROS_RELEASE="v3.2.1-11-g83c0aef"
KAIROS_MODEL="generic"
KAIROS_TARGETARCH="amd64"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_NAME="kairos-core-ubuntu-24.04"
KAIROS_PRETTY_NAME="kairos-core-ubuntu-24.04 v3.2.1-11-g83c0aef"
KAIROS_FLAVOR="ubuntu"
KAIROS_FLAVOR_RELEASE="24.04"
KAIROS_VARIANT="core"
KAIROS_SOFTWARE_VERSION_PREFIX="k3s"
KAIROS_VERSION="v3.2.1-11-g83c0aef"
KAIROS_IMAGE_LABEL="24.04-core-amd64-generic-v3.2.1-11-g83c0aef"

CPU architecture, OS, and Version:

Linux m1 6.8.0-47-generic #47-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 21:40:26 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug

Kairos changed the UIDs of the users after an upgrade. I did an automated install using kairos-ubuntu-24.04-core-amd64-generic-v3.2.1.iso, and during this install two users were created: kairos and dnsdevops. After the first boot I could verify that the kairos user got uid 1001 and dnsdevops got uid 1002.

I then did an upgrade using a custom image based on "quay.io/kairos/ubuntu:24.04-core-amd64-generic-master" (this custom build adds docker and a few other packages to the base image; Dockerfile below). After rebooting into the new image, the uids of the two users were swapped, so dnsdevops is now 1001 and kairos is 1002.

Booting in recovery mode reverts the uids.

To Reproduce

Do an install and create two users. Upgrade, reboot, and check the users' ids.

Expected behavior

Previously I was using version 3.1.3 and this didn't happen.

Logs

Additional context

This is the full Dockerfile used to build my custom image:

FROM quay.io/kairos/ubuntu:24.04-core-amd64-generic-master

RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc && \
    chmod a+r /etc/apt/keyrings/docker.asc

RUN  echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  tee /etc/apt/sources.list.d/docker.list > /dev/null

RUN apt-get update && apt-get upgrade -y && \
                      apt-get install -y  \
                      apt-transport-https \
                      ca-certificates \
                      cron         \
                      containerd.io \
                      curl \
                      docker-buildx-plugin \
                      docker-ce \
                      docker-ce-cli \
                      docker-compose-plugin \
                      fail2ban \
                      git \
                      gnupg-agent \
                      iptables-persistent \
                      monitoring-plugins-standard \
                      nagios-nrpe-server \
                      net-tools \
                      netfilter-persistent \
                      python3-docker \
                      python3-nagiosplugin \
                      python3-requests \
                      rsyslog \
                      rsyslog-gnutls \
                      snmpd \
                      software-properties-common \
                      sudo \
                      whois  && \
                      apt-get clean

RUN apt-get remove -y --purge apparmor

COPY files/daemon.json /etc/docker/
COPY files/nagios-sudo /etc/sudoers.d/

jimmykarily commented 1 month ago

We sort the users before we create them: https://github.com/mudler/yip/blob/87b55bb1813f1f132a2a0f8242eadd9d702ce5f2/pkg/plugins/user.go#L208

which means the uids should be deterministic, as long as no new users are created. So we now have two issues: the one reported here, which we need to reproduce to find out how it happens, and another one that just hit me. If I start with 2 users (e.g. "kairos-b" and "kairos-c") and later add a third one ("kairos-a"), the uid of the new user would be 1001 (because it comes first when sorting), so the other 2 users will change ids and their home directories will have broken permissions.

We don't persist the assigned user ids anywhere, and for that reason there is no way to assign the same ones when more users are added. So either we should persist them somehow, or we document that in such scenarios (with multiple users) the ids should be assigned explicitly in the cloud config (which is already supported).
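
The second issue can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not yip's actual code: the function name `assign_uids` and the starting uid of 1001 (taken from the report) are assumptions for illustration.

```python
def assign_uids(usernames, first_uid=1001):
    """Hypothetical model: hand out sequential uids in sorted name order."""
    return {name: first_uid + i for i, name in enumerate(sorted(usernames))}


# Two users configured at install time.
before = assign_uids(["kairos-b", "kairos-c"])
# A third user that sorts first is added later.
after = assign_uids(["kairos-b", "kairos-c", "kairos-a"])

print(before)  # {'kairos-b': 1001, 'kairos-c': 1002}
print(after)   # {'kairos-a': 1001, 'kairos-b': 1002, 'kairos-c': 1003}
```

Both pre-existing users end up with a different uid, while the files in their home directories still carry the old numeric owner.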

jimmykarily commented 1 month ago

I think this is caused by the removal of the default kairos user: https://github.com/kairos-io/kairos/issues/2921

Previously, we had a system cloud-init file that always created the kairos user, which therefore took id 1001. When the code that creates the user-defined users ran, it found that the next available id was 1002. After sorting the 2 users "dnsdevops" and "kairos", it first created the "dnsdevops" user, assigning id 1002. Then it found that "kairos" already existed and didn't re-assign an id, leaving it at 1001.

On the newer image, the "kairos" user wasn't created automatically, thus the first available uid was 1001, which was again assigned alphabetically first to "dnsdevops" and the kairos user from the user's config got the next one which was "1002".

Recovery is still the old image, that's why it works as expected there.
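
The difference between the two images can be sketched in Python. This is a simplified model of the explanation above, not the real implementation: `create_users` and its arguments are hypothetical names, and the only behaviours modelled are "existing users keep their uid" and "new users get the lowest free uid in sorted order".

```python
def create_users(existing, requested, first_uid=1001):
    """Hypothetical model: return name -> uid after creating `requested`.

    existing: users already present (name -> uid); they keep their uid.
    requested: user names from the user's config, created in sorted order.
    """
    users = dict(existing)
    for name in sorted(requested):
        if name in users:
            continue  # user already exists, its uid is left untouched
        uid = first_uid
        while uid in users.values():
            uid += 1  # take the lowest uid that is still free
        users[name] = uid
    return users


# Old image: the default "kairos" user is created first by the system config.
old_image = create_users({"kairos": 1001}, ["kairos", "dnsdevops"])
# New image: no default user, so both come from the user's config.
new_image = create_users({}, ["kairos", "dnsdevops"])

print(old_image)  # {'kairos': 1001, 'dnsdevops': 1002}
print(new_image)  # {'dnsdevops': 1001, 'kairos': 1002}
```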

mauromorales commented 1 month ago

@jimmykarily I'm not sure this is the case. If I create a system with only the kairos user, it gets assigned 1001 on my system. If I then add a dnsdevops user, it consistently receives id 1002

kairos:x:1001:1001:Created by entities:/home/kairos:/bin/bash
dnsdevops:x:1002:1002:Created by entities:/home/dnsdevops:/bin/sh

I'm using

KAIROS_VERSION_ID="v3.2.1-v1.31.1-k3s1"

jimmykarily commented 1 month ago

You need to go back and forth between an image that creates the kairos user automatically and one that doesn't @mauromorales

mauromorales commented 1 month ago

@jimmykarily wait but doesn't the report say that the whole thing starts from 3.2.1? that's what I used for testing

jimmykarily commented 1 month ago

The first installation is with 3.2.1, which has the default kairos user created automatically. The upgrade is based on quay.io/kairos/ubuntu:24.04-core-amd64-generic-master, which doesn't.

mauromorales commented 1 month ago

@jimmykarily sorry I missed that, I confirm the bug exists when upgrading to master branch

dnsdevops:x:1001:1001:Created by entities:/home/dnsdevops:/bin/sh
kairos:x:1002:1002:Created by entities:/home/kairos:/bin/sh

which means the ordering issue does indeed exist; only the kairos user used to be exempt from it

clyra commented 1 month ago

> We don't persist the assigned user ids anywhere, and for that reason there is no way to assign the same ones when more users are added. So either we should persist them somehow, or we document that in such scenarios (with multiple users) the ids should be assigned explicitly in the cloud config (which is already supported).

If one chooses this option (setting the uid explicitly), one should be aware that cloud-init defaults to creating the user's primary group as "username", and after the upgrade the users' groups will be swapped (i.e. uid 1001, gid 1002!). I'm wondering whether the decision to sort the users introduces more problems than it solves :-(.

This is what I got by setting kairos to 1001 and dnsdevops to 1002:

dnsdevops:x:1002:1001:Created by entities:/home/dnsdevops:/bin/sh
kairos:x:1001:1002:Created by entities:/home/kairos:/bin/sh

mauromorales commented 1 month ago

@clyra yeah I'm afraid both uid and gid would need to be set
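
A quick way to spot this situation is to compare the uid and gid fields of each entry. The Python sketch below parses passwd-style lines like the ones quoted above; the helper names `parse_passwd` and `mismatched` are made up for illustration. Note that a differing uid and gid is not an error in general, it only matters here because each user is expected to have a primary group with the same id.

```python
def parse_passwd(text):
    """Parse passwd-style lines into (name, uid, gid) tuples."""
    entries = []
    for line in text.strip().splitlines():
        # Fields: name:password:uid:gid:gecos:home:shell
        name, _pw, uid, gid, *_rest = line.split(":")
        entries.append((name, int(uid), int(gid)))
    return entries


def mismatched(entries):
    """Return the names of users whose uid differs from their primary gid."""
    return [name for name, uid, gid in entries if uid != gid]


sample = """dnsdevops:x:1002:1001:Created by entities:/home/dnsdevops:/bin/sh
kairos:x:1001:1002:Created by entities:/home/kairos:/bin/sh"""

print(mismatched(parse_passwd(sample)))  # ['dnsdevops', 'kairos']
```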

clyra commented 1 month ago

> @clyra yeah I'm afraid both uid and gid would need to be set

Yes, but not directly :-(. As far as I understood from https://cloudinit.readthedocs.io/en/latest/reference/modules.html, you cannot explicitly set a gid, so you have to create groups and then assign users to the previously created groups. If kairos also sorts groups, we will hit the same problem again...

mauromorales commented 1 month ago

@clyra about that ... we don't really use cloud-init, we use yip which implements a subset of cloud-init plus its own implementation of those https://github.com/mudler/yip

so you should be able to use

name: kairos
passwd: kairos
uid: 1001
gid: 1001

@kairos-io/maintainers maybe we should be clear in the docs and say "yip, a cloud-init subset" or something similar

clyra commented 1 month ago

@mauromorales

Right... but then I'm kind of lost, as there are no such options in yip, or they are not documented. https://github.com/mudler/yip?tab=readme-ov-file#stagesstageidstepnusers should list all options for the user directive, right?

Could you elaborate a bit more on "yip, a cloud-init subset"? I mean, should I only look at the yip documentation, or are there topics/options for which I should look at the cloud-init documentation?

mauromorales commented 1 month ago

@clyra you need to look at the yip docs ... but in the meantime I'd advise not using the master image; go with the release one instead. The team is discussing a solution for the issue on master

clyra commented 1 month ago

@mauromorales Thanks, I will update my custom image to use a numbered release as the base image instead of master (as I should have from the beginning :-) ). Should I also go back to 3.1.3 or some other version from before the sorted users?

jimmykarily commented 1 month ago

> @mauromorales Thanks, I will update my custom image to use a numbered release as the base image instead of master (as I should have from the beginning :-) ). Should I also go back to 3.1.3 or some other version from before the sorted users?

Latest release is ok

mauromorales commented 1 month ago

@clyra ok cool, yeah as @jimmykarily says you can use 3.2.1 without issues, only master has the current problem. But thanks for testing it on master and bringing it to our attention. We ended up realizing the bug has been hidden in plain sight for quite a long time.

jimmykarily commented 3 weeks ago

After discussing it in a call, we think this is a possible solution:

We will persist /etc/passwd on every boot. We will use it on the next boot to compare the users to be created with the users that existed.

This way, we can re-use an old uid as long as no home directory of an old user with that uid exists. Freeing up uids then won't require a re-installation of Kairos, only the deletion of all old user home directories.

The rest of the flows are more obvious.
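
The core of the proposal (known users keep their uid, new users get a free one) can be sketched in Python. This is an illustration under stated assumptions rather than the planned implementation: `plan_uids` is a hypothetical name, the persisted file is passed in as text, and the home-directory check for freeing old uids is deliberately left out, so every uid in the persisted file is treated as taken.

```python
def plan_uids(persisted_passwd, requested, first_uid=1001):
    """Hypothetical model: decide uids for `requested` users.

    persisted_passwd: contents of the /etc/passwd saved on the previous boot.
    requested: user names from the current config.
    """
    known = {}
    for line in persisted_passwd.strip().splitlines():
        fields = line.split(":")
        known[fields[0]] = int(fields[2])  # name -> uid

    taken = set(known.values())
    plan = {}
    for name in sorted(requested):
        if name in known:
            plan[name] = known[name]  # re-use the uid from the last boot
            continue
        uid = first_uid
        while uid in taken:
            uid += 1  # lowest uid not used by any known or planned user
        taken.add(uid)
        plan[name] = uid
    return plan


persisted = """kairos:x:1001:1001:Created by entities:/home/kairos:/bin/bash
dnsdevops:x:1002:1002:Created by entities:/home/dnsdevops:/bin/sh"""

# "alice" sorts first but no longer displaces the existing users.
plan = plan_uids(persisted, ["kairos", "dnsdevops", "alice"])
print(plan)  # {'alice': 1003, 'dnsdevops': 1002, 'kairos': 1001}
```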

clyra commented 3 weeks ago

@jimmykarily sounds good. May I assume that there will be a similar flow for the group part?

I was (wrongly) assuming that the password files were already persisted between boots, but they aren't, right?

jimmykarily commented 3 weeks ago

> @jimmykarily sounds good. May I assume that there will be a similar flow for the group part?

Good question. We may need to persist /etc/group too, to be able to assign the same id to the pre-existing groups so that file permissions match the correct group.

> I was (wrongly) assuming that the password files were already persisted between boots, but they aren't, right?

They're not.