Closed scottmuc closed 9 months ago
This repave became a bit forced because I let the certificates expire! I was sick and had a couple trips. I was thinking about how to record this one and that punted the repave further. For now, I'm just going to do it and think about how to record the next one.
Interesting to note that the previous OS was `bullseye` (https://github.com/scottmuc/infrastructure/issues/61#issuecomment-1646788248), and this one is going to install `bookworm`. Also, when flashing the SD card, I need to be cautious because my 2TB game archive drive now shows up.
Before running the full configuration I looked for `bullseye`-specific stuff and found the following:
~/workspace/infrastructure/pi ? git grep bullseye
tasks/logging.yml: deb http://deb.debian.org/debian bullseye-backports main contrib non-free
tasks/logging.yml: deb-src http://deb.debian.org/debian bullseye-backports main contrib non-free
tasks/logging.yml: default_release: bullseye-backports
I'm going to remove it and repair accordingly. I believe that `bookworm` has the packages I need (based on my notes in https://github.com/scottmuc/infrastructure/issues/60#issuecomment-1615886912).
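The same search can be repeated for any codename before a future repave. This is only a minimal sketch: `find_codename_refs` is a hypothetical helper, not something that exists in the repository, and it uses plain `grep` so it also works outside a git checkout.

```shell
#!/usr/bin/env bash
# Sketch: list files that still mention an old Debian codename.
# find_codename_refs is a hypothetical helper name (assumption).
find_codename_refs() {
  local codename="$1" dir="${2:-.}"
  # -r: recurse, -l: print file names only, -F: match the codename literally
  grep -rlF -- "$codename" "$dir" | sort
}
```

For example, `find_codename_refs bullseye tasks/` should print the task files that still pin the old release.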
First issue encountered:
TASK [Disable resolved] **********************************************************************************
fatal: [192.168.2.102]: FAILED! => {"changed": false, "msg": "Could not find the requested service systemd-resolved: host"}
Looking at the release notes, it does appear that some `systemd` package renaming has been done. The `systemd-resolved` service doesn't exist, so I'll remove this step from the playbook for now.
Second issue encountered:
`certbot` was failing to pass the ACME challenge. Turns out that my certificates hadn't expired! My home IP changed and I needed to update my DNS records.
An interesting anecdote of how deferred maintenance combined with an unplanned issue makes the errors that show up harder to understand. I was biased towards assuming that my certificates had expired since that was top of mind, and this made me forget that this kind of external change can also happen.
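A quick pre-flight check could rule out this kind of drift before blaming the certificates. The following is a sketch under assumptions: both function names are hypothetical, `https://ifconfig.me` is just one example of a service that echoes the caller's public IP, and the host name passed in is whatever record the ACME challenge depends on.

```shell
#!/usr/bin/env bash
# Sketch: compare the A record of a host with the current public IP.

# Succeeds only when both values are non-empty and identical.
dns_matches_ip() {
  local resolved="$1" public="$2"
  [ -n "$resolved" ] && [ "$resolved" = "$public" ]
}

# Hypothetical wrapper that needs network access, dig, and curl.
check_home_dns() {
  local host="$1" resolved public
  # With +short, dig prints CNAME targets first; take the last line as the address.
  resolved="$(dig +short "$host" A | tail -n 1)"
  # Assumed IP echo service; swap in whichever one you trust.
  public="$(curl -fsS https://ifconfig.me)"
  if dns_matches_ip "$resolved" "$public"; then
    echo "ok: $host -> $resolved"
  else
    echo "mismatch: $host resolves to '$resolved', public IP is '$public'" >&2
    return 1
  fi
}
```

Running `check_home_dns home.scottmuc.com` before the full configuration would flag a changed home IP separately from any certificate problem.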
Third issue encountered:
TASK [Install fuse packages] *****************************************************************************
fatal: [192.168.2.102]: FAILED! => {"changed": false, "msg": "No package matching 'exfat-utils' is available"}
It's clear that the package doesn't exist in `bookworm`: https://packages.debian.org/bullseye/exfat-utils
Perhaps it's not needed anymore? Going to remove it and see if the run continues.
Fourth issue encountered:
TASK [Ensure dhcpcd is stopped] **************************************************************************
fatal: [192.168.2.102]: FAILED! => {"changed": false, "msg": "Could not find the requested service dhcpcd: host"}
Looks like there are some network management changes in `bookworm`. I removed the stopping of `dhcpcd` and changed the package removal to `isc-dhcp-client`.
The playbook ran to completion. I was able to reboot the PI and it applied the static IP address. Not sure how much is actually working though. I guess I'll find out as I go through the verification steps. With the `bookworm` upgrade, I think a task-by-task review of the playbook needs to be done.
Node exporter for the PI seems to have some issues, but otherwise, all the main features are working!
Repave version state:
ansible@raspberrypi:~ $ cat /etc/os-release && uname -a
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Linux raspberrypi 6.1.0-rpi4-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.54-1+rpt2 (2023-10-05) aarch64 GNU/Linux
Really tempted to start attaching a package list (`dpkg -l`) post configuration.
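One way this might look, as a sketch only: save a sorted `name version` list after each repave and diff it against the previous one. The function names and snapshot file names are hypothetical. `dpkg-query -W -f=...` is used instead of `dpkg -l` because its output is easier to compare line by line.

```shell
#!/usr/bin/env bash
# Sketch: snapshot installed packages and diff two snapshots.

# Prints "name version" pairs, one per line, sorted in the C locale.
snapshot_packages() {
  dpkg-query -W -f='${Package} ${Version}\n' | LC_ALL=C sort
}

# Prints lines only in the first snapshot prefixed with "- "
# and lines only in the second prefixed with "+ ".
# Both files must be sorted in the C locale (as snapshot_packages does).
diff_packages() {
  local before="$1" after="$2"
  LC_ALL=C comm -23 "$before" "$after" | sed 's/^/- /'
  LC_ALL=C comm -13 "$before" "$after" | sed 's/^/+ /'
}
```

Usage would be along the lines of `snapshot_packages > bookworm.txt` on the PI, then `diff_packages bullseye.txt bookworm.txt` to see which packages disappeared or appeared across the upgrade.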
Going to take a break now... get some bolognese going, and look into all the issues in more depth later.
It looks like `systemd-resolved` is now opt-in. Given that this host is going to be a DNS resolver, I don't see any need to install it. It does appear that I may need to update `/etc/resolv.conf` myself though:
ansible@raspberrypi:~ $ cat /etc/resolv.conf
# Generated by NetworkManager
search speedport.ip
nameserver 192.168.2.1
nameserver fe80::1%eth0
It looks like I should disable and uninstall `NetworkManager` (https://wiki.debian.org/NetworkManager).
I don't know the relationship between `dhcpcd5` (looks to be EOL) and `isc-dhcp-client`, but I won't worry about it since the isc stuff is deprecated (source). Interesting to note that my previous decision to use `kea` initially wasn't too bad of a choice, but happy that I went down the `dnsmasq` route.
It looks like `tasks/staticip.yml` is going to be a good spot to evolve this configuration going forward. I believe I needed to stop `systemd-resolved` because it bound to port 53, hence why it was in the `tasks/dnsresolver.yml` configuration.
`/etc/resolv.conf` configuration
`NetworkManager`
Not going to disable/remove `NetworkManager`. For now, it detects the presence of `eth0` being loaded in `/etc/network/interfaces` and won't manage it:
ansible@raspberrypi:~ $ sudo nmcli dev status
DEVICE TYPE STATE CONNECTION
lo loopback connected (externally) lo
wlan0 wifi disconnected --
p2p-dev-wlan0 wifi-p2p disconnected --
eth0 ethernet unmanaged --
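Since the setup now relies on `eth0` staying unmanaged, a small check along these lines could be added to the verification steps. This is a sketch: `device_state` is a hypothetical helper that parses the terse output of `nmcli -t -f DEVICE,STATE device status`, which prints one `device:state` pair per line.

```shell
#!/usr/bin/env bash
# Sketch: report the NetworkManager state of one device.
# Reads terse nmcli output ("device:state" per line) on stdin.
device_state() {
  local dev="$1"
  awk -F: -v dev="$dev" '$1 == dev { print $2 }'
}
```

On the PI, `nmcli -t -f DEVICE,STATE device status | device_state eth0` should print `unmanaged` for as long as `/etc/network/interfaces` owns the interface.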
I'm not sure why `exfat-utils` was needed in the first place. Looking through some documentation, it appears it's been dropped in favor of `exfatprogs`, which is installed:
ansible@raspberrypi:~ $ dpkg -l | grep exfa
ii exfat-fuse 1.3.0+git20220115-2 arm64 read and write exFAT driver for FUSE
ii exfatprogs 1.2.0-1 arm64 exFAT file system utilities
Don't believe there's anything I need to do further with this. The USB drive is working fine.
Looks like fixing the scrape config fixed the grafana dashboard (can't be 100% sure though).
With the other fixes in place, it looks like this repave is complete! Not too bad given that it included a complete OS version upgrade as well.
Yay for Repaving!
As much as possible is documented inline in this issue template. In case of problems you may find help by viewing all the previous repave issues. Have fun!
Things to do with the existing build
[x] Enable DHCP on the router, remove port mapping and statically assign network to PC
Insert screenshots here ;-)
[x] Shutdown PI
Make sure the USB drive has spun down before doing any work.
sudo shutdown -h now
[x] Create SD card with the latest Raspberry Pi OS
Using the SD card in the now powered down PI.
The new installer has options to enable SSH and create a user.
installer download
Note: check whether the underlying Debian distribution is changing, as this might result in some issues in the playbook execution.
The Bullseye 64-bit lite image seems to work for now.
Post OS install steps on desktop
[x] Ensure a working ansible environment
This will exercise the `asdf` setup.
[x] Turn on the PI and note the IP obtained from the Router
[x] Clean up old host keys
The new instance will have new host keys so to ensure host key warning messages don't distract us from the repaving, run the following:
[x] Transfer local public ssh key to PI
In order to avoid the use of `sshpass`, copy the current session's public ssh key to `~/.ssh/authorized_keys` of the `pi` user on the PI. This user is only necessary to run the bootstrap playbook (which creates an admin `ansible` user) and will be subsequently cleaned up.
ssh-copy-id pi@<pi ip>
[x] Bootstrap with Ansible
`./ansible.sh` and select the `bootstrap-playbook.yml`
[x] Add the PI port forwarding
Needed for the `certbot` ACME challenge in the next step.
[x] Complete full configuration
`./ansible.sh` and select the `main-playbook.yml`
[x] Reboot PI
[x] Rerun `ansible.sh` and select the `logging` tag
This is because the keep alive script is created in `/tmp`. If this instruction is still relevant for the next couple of repaves, either move it to a stable location or drop log forwarding to BetterStack.
[x] Re-add port mapping to the static IP
[x] Disable DHCP on the router
[x] Deploy goodenoughmoney.com
[x] Create `pi` Samba user
Run the following on the PI
sudo smbpasswd -a smbrw
[x] Make this template slightly better
How Do I Know I Am Done?
[x] https://www.goodenoughmoney.com/ displays stuff
[x] https://home.scottmuc.com/music/ loads navidrome and the music is playable
[x] http://prometheus.home.scottmuc.com:9090/ loads and has data
[x] http://grafana.home.scottmuc.com:3000/ loads and has data
[x] Z:\ on my Windows PC works
[x] `ipconfig /release` and then `ipconfig /renew` works
[x] `nslookup analytics.google.com` is refused
[x] Print out newly repaved machine details
cat /etc/os-release && uname -a
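The note earlier in this template about checking whether the underlying Debian distribution is changing could be partly automated by reading `VERSION_CODENAME` from `/etc/os-release`. This is a minimal sketch with hypothetical helper names. It relies on `os-release` being a shell-sourceable list of assignments.

```shell
#!/usr/bin/env bash
# Sketch: read the Debian codename and compare it with the one the playbook expects.

# Prints VERSION_CODENAME from the given os-release file.
os_codename() {
  local file="${1:-/etc/os-release}"
  # Source in a subshell so the variables do not leak into the caller.
  ( . "$file" && printf '%s\n' "$VERSION_CODENAME" )
}

# Succeeds when the codename in the file equals the expected one.
expect_codename() {
  local expected="$1" file="${2:-/etc/os-release}"
  [ "$(os_codename "$file")" = "$expected" ]
}
```

For example, `expect_codename bookworm || echo "OS changed, review the playbook"` run on the freshly flashed PI would surface a distribution change before the full configuration fails on it.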