linux-automation / meta-lxatac

Build your own LXA TAC images and bundles
MIT License
5 stars 15 forks source link

meta-lxatac-bsp: lxatac-factory-data: reload udev rules on completion #107

Closed hnez closed 7 months ago

hnez commented 7 months ago

The lxatac-factory-data script generates a tac-bridge.link file, which sets the MAC address to use on the tac-bridge network interface, which is a bridge created by NetworkManager.

We have noticed that on some boots this MAC address is not correct. On these boots we can also see that udev (which reads the .link files) did not take the link file into account (even though it exists):

Normal boot:

# udevadm info /sys/class/net/tac-bridge | grep LINK_FILE
ID_NET_LINK_FILE=/run/systemd/network/10-tac-bridge.link

Boot with wrong MAC address:

# udevadm info /sys/class/net/tac-bridge | grep LINK_FILE
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link

This hints at some sort of race condition between lxatac-factory-data, udev reading its rules and NetworkManager setting up the bridge device.

Run udevadm control --reload-rules after creating the link file to hopefully solve the race condition.

I did some "statistical" testing of this change by running our labgrid pytest test in a loop, which checks if the MAC is correct:

# Before this change the interface has the correct MAC
# 14 / (14 + 6) = 70% of the time.
14 before/success
 6 before/failure

# After this change the interface has the correct MAC
# in all 21 runs.
21 udevadm-reload/success
 0 udevadm-reload/failure

# When replacing the `udevadm control --reload-rules` with
# just a `sleep 1` the MAC address is still correct in all 22
# runs, but the IPv6 link local address of the interface contains
# the wrong MAC address.
20 sleep/success
 2 sleep/failure

The tests indicate that the fix works, but are not 100% clear as to it working for the correct reasons (because just waiting also helps a bit).

jluebbe commented 7 months ago

As an alternative, perhaps we could avoid the devadm control --reload-rules by running this before udev? That would likely mean setting DefaultDependencies=no. @michaelolbrich what's your opinion?

michaelolbrich commented 7 months ago

It's not that simple. The same script talks to hostnamed and that's not an early service either.

hnez commented 7 months ago

I think we should just drop the interaction with hostnamed here and instead just write the current hostname into /etc/hostname if /etc/hostname does not yet exist. That allows us to run this script before udev.

Somewhat related: I think we should migrate /etc/hostname in the rauc hook if we don't do that already. That way users get a sensible hostname by default (passed in from barebox) but are also free to set a custom hostname that sticks. That's something we had customers ask about.

SmithChart commented 7 months ago

:+1: for the migration of /etc/hostname

hnez commented 7 months ago

Closed in favor of #115, which works without reload.