techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0
2.41k stars 1.05k forks source link

Fix cgroups cmdline path #456

Closed yebo29 closed 8 months ago

yebo29 commented 8 months ago

Proposed Changes

When replacing a node, I noticed that the cgroups task has an old path. Old Raspbian had /boot/cmdline.txt, but that file has been moved to /boot/firmware/cmdline.txt. Upon updating that path, configuration completed successfully and k3s service started without error. When I originally built the cluster, I made that change manually. Replacing a node and using the playbook exposed this issue.

References

Checklist

yebo29 commented 8 months ago

Need to add check so that we edit the right file on older Raspbian systems.

timothystewart6 commented 8 months ago

Thank you!

yebo29 commented 8 months ago

@timothystewart6 of course! Although I'm not sure why the molecule test is failing. I tried locally with the file in both locations.

timothystewart6 commented 8 months ago

@yebo29 thank you! Not your fault. The ipv6 test is flaky, not sure of the root cause but I will get this merged in.

pbolduc commented 8 months ago

This worked for me. I think there is a couple of issues that may want to be considered.

1) if someone ran an older version of this playbook on a bookworm enviornment, it would create the /boot/cmdline.txt. Even if the version of the playbook is upgraded, because /boot/cmdline.txt was created by old paybook, the updated playbook would still try to change /boot/cmdline.txt. 1) if someone does a dist upgrade, does the cmdline.txt get moved or copied? is there a chance the old one is still there?

I manually deleted /boot/cmdline.txt before testing. Should the path be set by the distribution detection instead of file existence?

yebo29 commented 8 months ago

This worked for me. I think there is a couple of issues that may want to be considered.

1. if someone ran an older version of this playbook on a bookworm enviornment, it would create the `/boot/cmdline.txt`. Even if the version of the playbook is upgraded, because `/boot/cmdline.txt` was created by old paybook, the updated playbook would still try to change `/boot/cmdline.txt`.

2. if someone does a dist upgrade, does the cmdline.txt get moved or copied? is there a chance the old one is still there?

I manually deleted /boot/cmdline.txt before testing. Should the path be set by the distribution detection instead of file existence?

Those are valid points. An LSB release check would be beneficial here. I think that would resolve both points, since Bookworm will use the new location if it was a new install or an upgrade, I would think. But that would require further digging/research to confirm.

I'm okay with putting this back in draft mode and adding the additional check if we agree on that. I'll let @timothystewart6 provide further input.

EDIT: Typo fix

timothystewart6 commented 8 months ago

@yebo29 that sounds good to me, the safer the better. No rush at all to get this merged in!

yebo29 commented 8 months ago

Converted back to draft mode. Will push new changes when ready!

yebo29 commented 8 months ago

Added commit, but I still need to test. Due to time, I will test tomorrow.

pbolduc commented 8 months ago

I just tested on my Pi cluster and it successfully deployed when a /boot/cmdline.txt exists. The /boot/firmware/cmdline.txt was updated correctly.

yebo29 commented 8 months ago

I just tested on my Pi cluster and it successfully deployed when a /boot/cmdline.txt exists. The /boot/firmware/cmdline.txt was updated correctly.

Thank you for testing! I've had a jam-packed morning so just getting to this. Going to do my own round of testing and then mark as ready when I'm done and if there are no issues.

yebo29 commented 8 months ago

Ah, I also have a lint error. Forgot to run pre-commit. Line too long. I'll look into breaking it into multiple lines.

yebo29 commented 8 months ago

Pushed new changes. Was able to test locally. Marked as ready for review again.

Idea for another PR and/or issue: Add tags to tasks to speed up testing: ansible-playbook site.yml -i inventory/hosts.ini --tags rpi,k3s_server_post

yebo29 commented 8 months ago

Darn IPv6 molecule test failed again. Anything I can do to help there?

pbolduc commented 8 months ago

I just checked a new install on a fresh install of dietpi and they have not moved the cmdline.txt. Diet Pi probably should move it, however, I wonder if the detection should use /boot/firmware/cmdline.txt if it is found, otherwise fall back to /boot/cmdline.txt.

Raspberry Pi OS

pico-k3s@worker-8:~ $ find /boot -name cmdline.txt
/boot/cmdline.txt
/boot/firmware/cmdline.txt
pico-k3s@worker-8:~ $ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Diet PI

root@control-1:~# find /boot -name cmdline.txt
/boot/cmdline.txt
root@control-1:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
yebo29 commented 8 months ago

I just checked a new install on a fresh install of dietpi and they have not moved the cmdline.txt. Diet Pi probably should move it, however, I wonder if the detection should use /boot/firmware/cmdline.txt if it is found, otherwise fall back to /boot/cmdline.txt.

Raspberry Pi OS

pico-k3s@worker-8:~ $ find /boot -name cmdline.txt
/boot/cmdline.txt
/boot/firmware/cmdline.txt
pico-k3s@worker-8:~ $ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Diet PI

root@control-1:~# find /boot -name cmdline.txt
/boot/cmdline.txt
root@control-1:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Did you try to run these code changes against that OS and observe what happens?

pbolduc commented 8 months ago

Did you try to run these code changes against that OS and observe what happens?

Not yet. I will try this evening (PST) after work.

pbolduc commented 8 months ago

I just re-imaged Pi OS to one Pi 4. I can see both /boot/cmdline.txt and /boot/firmware/cmdline.txt. The files are identical,

pico-k3s@worker-1:~ $ cat /boot/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=ff374c2a-02 rootfstype=ext4 fsck.repair=yes rootwait
pico-k3s@worker-1:~ $ cat /boot/firmware/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=ff374c2a-02 rootfstype=ext4 fsck.repair=yes rootwait

So, I am wondering if the existing check by grepping the file for console=|rootfstype is the best approach. I only know ansible enough to read it. I could try taking a stab at a fix. In my head the logic would be

if debian bookworm
  if exists /boot/firmware/cmdline.txt then boot_cmdline_path = /boot/firmware/cmdline.txt
  else if exists /boot/cmdline.txt then boot_cmdline_path = /boot/cmdline.txt

Thoughts?

yebo29 commented 8 months ago

I just re-imaged Pi OS to one Pi 4. I can see both /boot/cmdline.txt and /boot/firmware/cmdline.txt. The files are identical,

pico-k3s@worker-1:~ $ cat /boot/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=ff374c2a-02 rootfstype=ext4 fsck.repair=yes rootwait
pico-k3s@worker-1:~ $ cat /boot/firmware/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=ff374c2a-02 rootfstype=ext4 fsck.repair=yes rootwait

So, I am wondering if the existing check by grepping the file for console=|rootfstype is the best approach. I only know ansible enough to read it. I could try taking a stab at a fix. In my head the logic would be

if debian bookworm
  if exists /boot/firmware/cmdline.txt then boot_cmdline_path = /boot/firmware/cmdline.txt
  else if exists /boot/cmdline.txt then boot_cmdline_path = /boot/cmdline.txt

Thoughts?

Sure, we can flip the logic around. It makes sense to me.

yebo29 commented 8 months ago

Done. I changed the test to stat, flipped the logic around, and updated the following task to suit. I tested locally successfully, and also changed the match for bookworm to nonsense to make sure it skips based on that as well. Feel free to also verify at your earliest convenience, and I thank you for all your help @pbolduc !

pbolduc commented 8 months ago

@yebo29 Thank you for actually implementing the changes. I have to purchase Jeff's Ansible for DevOps and actually learn to write ansible. I'll have time tomorrow night to reimage my pi's USB drives to start clean. I will try with latest Raspberry Pi OS and Diet Pi and report my results. (I'm setting up 9 node - 1 control, 8 worker node k3s for testing at home)

pbolduc commented 8 months ago

Hit a snag with diet pi and I think this PR should be tested / merged separately. The snag I ran into is that diet pi is a minimal distro. Diet Pi does not install lsb-release by default. Without this package installed, the check ansible_facts.lsb.description|default("") is match(allowed_descriptions | join('|')) never matches. Running ansible 192.168.1.150 -m ansible.builtin.gather_facts --tree diet-pi-facts.txt shows the lsb block is empty.

"ansible_lsb": {},

With my limited ansible experience, I tried to add some tasks to ensure this was installed but failed to get it working as expected. This is what I tried to add near the beginning of roles/raspberrypi/tasks/main.yml. I think it best to create a diet pi "up for grabs" issue and if someone (or me when I have time) in the community would like to help, they can submit a future PR.

# some distro like diet pi do not install lsb-release by default
- name: Install lsb-release
  apt:
    name: lsb-release
    state: present
  register: lsb_release
  when: ansible_facts.lsb.description is not defined

# if lsb-release was installed, re-gather the lsb facts
- name: Gather lsb if lsb_release installed
  setup:
    gather_subset:
      - "lsb"
  when: lsb_release is changed

Even without lsb-release, Diet Pi does have the distribution properties,

        "ansible_distribution": "Debian",
        "ansible_distribution_file_parsed": true,
        "ansible_distribution_file_path": "/etc/os-release",
        "ansible_distribution_file_variety": "Debian",
        "ansible_distribution_major_version": "12",
        "ansible_distribution_minor_version": "5",
        "ansible_distribution_release": "bookworm",
        "ansible_distribution_version": "12.5",
pbolduc commented 8 months ago

Added https://github.com/techno-tim/k3s-ansible/issues/463 to track diet pi compatibility.

timothystewart6 commented 8 months ago

Thank you all!