scottmuc / infrastructure

Documentation / Automation for personal third-party infrastructure
The Unlicense

Rebuild Raspberry Pi - Fall 2023 Edition #65

Closed scottmuc closed 9 months ago

scottmuc commented 9 months ago

Yay for Repaving!

As much as possible is documented inline in this issue template. In case of problems you may find help by viewing all the previous repave issues. Have fun!

Things to do with the existing build

Post OS install steps on desktop

How Do I Know I Am Done?

scottmuc commented 9 months ago

This repave became a bit forced because I let the certificates expire! I was sick and had a couple of trips, and I was also thinking about how to record this one, which pushed the repave back even further. For now, I'm just going to do it and think about how to record the next one.

scottmuc commented 9 months ago

Interesting to note that the previous OS was bullseye (https://github.com/scottmuc/infrastructure/issues/61#issuecomment-1646788248), and this one is going to install bookworm. Also, when flashing the SD card, I need to be cautious because my 2TB game archive drive now shows up as well.
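A small sketch for next time, assuming the desktop runs Linux with util-linux's `lsblk` (the desktop OS isn't recorded here) and that the card reader reports the SD card as removable. `list_removable_disks` is a made-up helper name. USB hard drives usually report RM=0, but that isn't guaranteed, so the size column is still the thing to eyeball before flashing.

```shell
#!/bin/sh
# Hypothetical helper: list whole disks the kernel flags as removable (RM=1),
# printing name, size and transport ("-" when lsblk reports none), so an SD
# card is easier to tell apart from other attached drives before flashing.
list_removable_disks() {
  # -d: whole disks only, -n: no header line, -o: pick the columns
  lsblk -d -n -o NAME,SIZE,RM,TRAN |
    awk '$3 == 1 { print $1, $2, ($4 == "" ? "-" : $4) }'
}
```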

scottmuc commented 9 months ago

Before running the full configuration I looked for bullseye specific stuff and found the following:

~/workspace/infrastructure/pi $ git grep bullseye
tasks/logging.yml:      deb http://deb.debian.org/debian bullseye-backports main contrib non-free
tasks/logging.yml:      deb-src http://deb.debian.org/debian bullseye-backports main contrib non-free
tasks/logging.yml:    default_release: bullseye-backports

I'm going to remove the backports configuration and fix things accordingly. I believe that bookworm has the packages I need (based on my notes in https://github.com/scottmuc/infrastructure/issues/60#issuecomment-1615886912).
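A rough sketch of the `git grep` step as a reusable pre-repave check, assuming plain `grep -r` is good enough (so it also works outside a git checkout) and that the codename list, buster through trixie, covers the releases worth flagging. `find_codename_refs` is a hypothetical name. `-w` still matches `bullseye` inside `bullseye-backports` because the hyphen isn't a word character.

```shell
#!/bin/sh
# Hypothetical pre-repave check: list every line under a directory that
# mentions a Debian release codename, so release-specific configuration
# (like the bullseye-backports source in tasks/logging.yml) surfaces early.
find_codename_refs() {
  dir="${1:-.}"
  # -r: recurse, -n: line numbers, -w: whole words, -E: extended regex
  grep -rnwE 'buster|bullseye|bookworm|trixie' "$dir"
}
```

An alternative I haven't tried would be templating the release from Ansible's `ansible_distribution_release` fact instead of hard-coding it.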

scottmuc commented 9 months ago

First issue encountered:

TASK [Disable resolved] **********************************************************************************
fatal: [192.168.2.102]: FAILED! => {"changed": false, "msg": "Could not find the requested service systemd-resolved: host"}

Looking at the release notes, it does appear that some systemd components have been split into separate packages. The systemd-resolved service doesn't exist on this install, so I'll remove this step from the playbook for now.
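Rather than deleting the step outright, it could be made conditional. A minimal shell sketch of that guard, assuming `systemctl list-unit-files --no-legend` prints nothing when a unit isn't installed; `unit_exists` and `disable_if_present` are hypothetical helpers, not something in the playbook. Inside Ansible the equivalent would probably be gathering `ansible.builtin.service_facts` and adding a `when:` condition on the service name.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if systemd knows about the given unit,
# so a "disable" step can be skipped on releases where it isn't installed.
unit_exists() {
  # --no-legend drops the header/footer, so any output means a match
  [ -n "$(systemctl list-unit-files --no-legend "$1.service" 2>/dev/null)" ]
}

# Stop and disable the unit if present, otherwise just report and move on.
disable_if_present() {
  if unit_exists "$1"; then
    systemctl disable --now "$1.service"
  else
    echo "skipping $1: unit not installed"
  fi
}
```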

scottmuc commented 9 months ago

Second issue encountered:

Certbot was failing the ACME challenge. Turns out my certificates hadn't expired! My home IP had changed and I needed to update my DNS records.

An interesting example of how deferred maintenance combined with an unplanned change makes the resulting errors harder to understand. I was biased towards assuming my certificates had expired since that was top of mind, and this made me forget that an external change like this can also happen.
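A pre-flight check I could run before blaming certbot next time: compare the domain's A record with the current public IP. This is a sketch that assumes `dig` and `curl` are installed and uses ifconfig.me purely as an example "what is my IP" service; the domain is a placeholder and `dns_points_here` is a hypothetical helper. It only handles IPv4.

```shell
#!/bin/sh
# Hypothetical check: does the domain's A record still point at this
# network's current public IPv4 address? ifconfig.me is only an example
# service; any endpoint that returns the caller's IP as plain text works.
public_ip() {
  curl -4 -s https://ifconfig.me
}

dns_points_here() {
  domain="$1"
  # +short prints just the answers; the last line is the final address
  record="$(dig +short "$domain" A | tail -n 1)"
  current="$(public_ip)"
  if [ "$record" = "$current" ]; then
    echo "ok: $domain -> $record"
  else
    echo "mismatch: $domain resolves to ${record:-nothing}, public IP is $current"
    return 1
  fi
}
```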

scottmuc commented 9 months ago

Third issue encountered:

TASK [Install fuse packages] *****************************************************************************
fatal: [192.168.2.102]: FAILED! => {"changed": false, "msg": "No package matching 'exfat-utils' is available"}

It's clear that the package doesn't exist in bookworm: https://packages.debian.org/bullseye/exfat-utils

Perhaps it's not needed anymore? Going to remove it and see if the run continues.
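A hedged sketch of making the package choice release-aware instead of hard-coding one name: try candidates in order and use the first one apt knows about. It assumes `apt-cache show` exits non-zero for unknown package names; `first_available_pkg` is a hypothetical helper. The candidate order (newer `exfatprogs` first, then `exfat-utils`) reflects the exfat-utils analysis later in this issue.

```shell
#!/bin/sh
# Hypothetical helper: print the first package name from a list of candidates
# that apt knows about, e.g. a newer replacement followed by the old name.
first_available_pkg() {
  for pkg in "$@"; do
    # apt-cache show fails when no package by that name exists
    if apt-cache show "$pkg" > /dev/null 2>&1; then
      echo "$pkg"
      return 0
    fi
  done
  return 1
}
```

Usage would be along the lines of `first_available_pkg exfatprogs exfat-utils`.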

scottmuc commented 9 months ago

Fourth issue encountered:

TASK [Ensure dhcpcd is stopped] **************************************************************************
fatal: [192.168.2.102]: FAILED! => {"changed": false, "msg": "Could not find the requested service dhcpcd: host"}

Looks like there are some network management changes in bookworm. I removed the step that stops dhcpcd and changed the package removal to target isc-dhcp-client.

scottmuc commented 9 months ago

The playbook ran to completion; I was able to reboot the PI and it applied the static IP address. Not sure how much is actually working though. I guess I'll find out as I go through the verification steps. With the bookworm upgrade, I think a task-by-task review of the playbook is needed.

scottmuc commented 9 months ago

Node exporter for the PI seems to have some issues, but otherwise, all the main features are working!


scottmuc commented 9 months ago

Repave version state:

ansible@raspberrypi:~ $ cat /etc/os-release && uname -a
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Linux raspberrypi 6.1.0-rpi4-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.54-1+rpt2 (2023-10-05) aarch64 GNU/Linux
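Since the codename keeps showing up in config, a small sketch of reading it from `/etc/os-release` instead, assuming the standard `VERSION_CODENAME` key shown in the output. Sourcing happens in a subshell so the file's variables don't leak into the caller; `os_codename` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read the release codename from an os-release file
# instead of hard-coding "bullseye"/"bookworm" anywhere.
os_codename() {
  # Source the file in a subshell so its variables don't leak into the caller
  ( . "${1:-/etc/os-release}" && printf '%s\n' "$VERSION_CODENAME" )
}
```

Against the os-release output above, `os_codename` should print `bookworm`.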

Really tempted to start attaching a package list (dpkg -l) after configuration.
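If I do start attaching package lists, a sketch of how that could look: `dpkg-query` with a custom format gives a name/version listing that is easier to diff than the wide columns of `dpkg -l`. Both helper names are hypothetical, and the snapshot may include removed-but-not-purged packages, which may or may not be what I want.

```shell
#!/bin/sh
# Hypothetical helpers for keeping a package list with each repave issue.
# snapshot_packages: one "name<TAB>version" line per package dpkg knows about.
snapshot_packages() {
  dpkg-query -W --showformat='${Package}\t${Version}\n' | LC_ALL=C sort
}

# diff_package_names OLD NEW: compare two snapshots by package name only,
# printing "- name" for packages that disappeared and "+ name" for new ones.
diff_package_names() {
  old="$(mktemp)"
  new="$(mktemp)"
  cut -f1 "$1" | LC_ALL=C sort -u > "$old"
  cut -f1 "$2" | LC_ALL=C sort -u > "$new"
  LC_ALL=C comm -23 "$old" "$new" | sed 's/^/- /'
  LC_ALL=C comm -13 "$old" "$new" | sed 's/^/+ /'
  rm -f "$old" "$new"
}
```

For example, `snapshot_packages > packages-fall-2023.tsv` after the playbook run, then `diff_package_names packages-fall-2023.tsv packages-next.tsv` at the next repave (file names are placeholders).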

scottmuc commented 9 months ago

Going to take a break now... get some bolognese going, and look into all the issues in more depth later.

scottmuc commented 9 months ago

systemd-resolved and dhcpcd analysis

It looks like systemd-resolved is now opt-in. Given that this host is going to be a DNS resolver, I don't see any need to install it. It does appear that I may need to update /etc/resolv.conf myself though:

ansible@raspberrypi:~ $ cat /etc/resolv.conf
# Generated by NetworkManager
search speedport.ip
nameserver 192.168.2.1
nameserver fe80::1%eth0
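A minimal sketch for checking what the Pi itself resolves through, assuming the usual `nameserver <address>` line format; both function names are hypothetical. With this host acting as the DNS resolver, I'd eventually expect it to point at itself rather than the router, though that decision is still open.

```shell
#!/bin/sh
# Hypothetical checks on a resolv.conf file (defaults to the live one).
# list_nameservers: print each configured nameserver address, one per line.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# generated_by_networkmanager: succeed if NetworkManager wrote the file.
generated_by_networkmanager() {
  grep -q '^# Generated by NetworkManager' "${1:-/etc/resolv.conf}"
}
```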

It looks like I should disable and uninstall NetworkManager (https://wiki.debian.org/NetworkManager).

I don't know the relationship between dhcpcd5 (looks to be EOL) and isc-dhcp-client, but I won't worry about it since the ISC stuff is deprecated (source). Interesting to note that my initial decision to use kea wasn't too bad a choice after all, but I'm happy that I went down the dnsmasq route.

It looks like tasks/staticip.yml is going to be a good spot to evolve this configuration going forward. I believe I needed to stop systemd-resolved because its DNS stub listener bound to port 53, which is why it was in the tasks/dnsresolver.yml configuration.
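A quick sketch for confirming what holds a port before the resolver starts, assuming iproute2's `ss` (with `-H` to drop the header and a `sport = :PORT` filter). systemd-resolved's stub listener normally binds port 53 on 127.0.0.53, which is what would clash with a local DNS resolver. `listeners_on_port` and `port_is_free` are hypothetical names, and `-p` only shows other users' processes when run as root.

```shell
#!/bin/sh
# Hypothetical helpers: show TCP/UDP listeners on a port, and test if it's free.
listeners_on_port() {
  # -H: no header, -l: listening, -n: numeric, -t/-u: TCP and UDP, -p: process
  ss -H -l -n -t -u -p "sport = :$1"
}

port_is_free() {
  [ -z "$(listeners_on_port "$1")" ]
}
```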

Not going to disable/remove NetworkManager. For now, it sees that eth0 is configured in /etc/network/interfaces and leaves it unmanaged:

ansible@raspberrypi:~ $ sudo nmcli dev status
DEVICE         TYPE      STATE                   CONNECTION
lo             loopback  connected (externally)  lo
wlan0          wifi      disconnected            --
p2p-dev-wlan0  wifi-p2p  disconnected            --
eth0           ethernet  unmanaged               --
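The same observation as a scripted check: a sketch assuming nmcli's terse output (`-t`) prints colon-separated fields like `eth0:unmanaged`. `nm_device_state` and `assert_unmanaged` are hypothetical helpers. It could catch a future NetworkManager update quietly taking over eth0.

```shell
#!/bin/sh
# Hypothetical check that NetworkManager is leaving a device alone.
# nm_device_state: print the STATE field for one device from terse output.
nm_device_state() {
  nmcli -t -f DEVICE,STATE device status | awk -F: -v dev="$1" '$1 == dev { print $2 }'
}

# assert_unmanaged: succeed only if the device's state is "unmanaged".
assert_unmanaged() {
  state="$(nm_device_state "$1")"
  if [ "$state" = "unmanaged" ]; then
    echo "$1 is unmanaged by NetworkManager"
  else
    echo "$1 state is '${state:-unknown}', expected unmanaged" >&2
    return 1
  fi
}
```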

scottmuc commented 9 months ago

exfat-utils analysis

I'm not sure why exfat-utils was needed in the first place. Looking through some documentation, it appears it's been dropped in favor of exfatprogs, which is installed:

ansible@raspberrypi:~ $ dpkg -l | grep exfa
ii  exfat-fuse                           1.3.0+git20220115-2                 arm64        read and write exFAT driver for FUSE
ii  exfatprogs                           1.2.0-1                             arm64        exFAT file system utilities

I don't believe there's anything further I need to do with this. The USB drive is working fine.

scottmuc commented 9 months ago

Looks like fixing the scrape config fixed the Grafana dashboard (can't be 100% sure though).
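To separate "exporter broken" from "scrape config wrong" next time, a sketch of a direct smoke test against the node exporter. It assumes the exporter listens on its default port 9100 and that the Pi is still at 192.168.2.102 (the address in the Ansible output; the static IP may differ). `node_exporter_ok` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical smoke test: fetch the node exporter's metrics page and check
# that at least one node_* series is exposed. The default host is an
# assumption taken from the Ansible output; pass the real address if different.
node_exporter_ok() {
  host="${1:-192.168.2.102}"
  # -f: fail on HTTP errors, -sS: quiet but still report errors
  curl -fsS "http://$host:9100/metrics" | grep -q '^node_'
}
```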

With the other fixes in place, it looks like this repave is complete! Not too bad given that it included a complete OS version upgrade as well.