openhab / openhabian

openHABian - empowering the smart home, for Raspberry Pi and Debian systems

https://community.openhab.org/t/13379

ISC License


Two IP addresses assigned to wlan0 #1690

Closed Nadahar closed 2 years ago

Nadahar commented 2 years ago

Issue information:

My RPi 4 gets assigned two IPv4 addresses to wlan0 via DHCP. I haven't really done anything to the installation, it's my first time trying openHABian, so the "installation" is pretty much as it was "flashed". The image used is the 32 bit version of openHABian v1.7.3.

While I haven't experienced a problem as a consequence of the double IP assignment, I haven't started installing bindings and "moving" the configuration from my existing openHAB 2.4 installation (on a Windows box), so I have no idea if it will actually pose a problem or not once being "set to work". It doesn't look right anyway, and as I'm planning to reserve a fixed IP to the RPi's MAC address in the DHCP server, I'm worried that it could cause problems when a lease is requested twice with the same MAC address.

As Raspberry Pi OS/openHABian is new to me, and I've mostly been using Fedora when using Linux in recent years, I don't have much of an overview of which systems are used to configure the network and such. I have managed to find what seems to be the cause of the double IP address lease though, as shown below: both dhclient and dhcpcd seem to be operating, each reserving one IPv4 address. IPv6 is disabled as it's not used on my home network. So, I guess the question is, why are they both making leases?

Debug information:

> less /var/log/syslog

Jun  4 04:51:27 habbo dhcpcd[382]: wlan0: carrier acquired
Jun  4 04:51:27 habbo dhcpcd[382]: wlan0: connected to Access Point `xxxxx'
Jun  4 04:51:27 habbo dhcpcd[382]: DUID xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx
Jun  4 04:51:27 habbo dhcpcd[382]: wlan0: IAID 32:52:53:ff
Jun  4 04:51:27 habbo dhcpcd[382]: wlan0: adding address fe80::364:507b:c5ee:6aa0
Jun  4 04:51:27 habbo dhcpcd[382]: ipv6_addaddr1: Permission denied
Jun  4 04:51:27 habbo dhcpcd[382]: received SIGPIPE
Jun  4 04:51:27 habbo wpa_action: WPA_IFACE=wlan0 WPA_ACTION=CONNECTED
Jun  4 04:51:27 habbo wpa_action: WPA_ID=0 WPA_ID_STR= WPA_CTRL_DIR=/var/run/wpa_supplicant
Jun  4 04:51:27 habbo systemd[1]: Starting System Logging Service...
Jun  4 04:51:27 habbo wpa_action: ifup wlan0=default
Jun  4 04:51:27 habbo dhcpcd[382]: wlan0: rebinding lease of 10.72.64.103
Jun  4 04:51:27 habbo dhclient[653]: Internet Systems Consortium DHCP Client 4.4.1
Jun  4 04:51:27 habbo dhclient[653]: Copyright 2004-2018 Internet Systems Consortium.
Jun  4 04:51:27 habbo dhclient[653]: All rights reserved.
Jun  4 04:51:27 habbo dhclient[653]: For info, please visit https://www.isc.org/software/dhcp/
Jun  4 04:51:27 habbo dhclient[653]:
Jun  4 04:51:27 habbo dhcpcd[382]: wlan0: probing address 10.72.64.103/24
Jun  4 04:51:27 habbo dhclient[653]: Listening on LPF/wlan0/dc:a6:32:52:53:ff
Jun  4 04:51:27 habbo dhclient[653]: Sending on   LPF/wlan0/dc:a6:32:52:53:ff
Jun  4 04:51:27 habbo dhclient[653]: Sending on   Socket/fallback
Jun  4 04:51:27 habbo dhclient[653]: DHCPDISCOVER on wlan0 to 255.255.255.255 port 67 interval 8
Jun  4 04:51:27 habbo dhcpcd[382]: wlan0: soliciting an IPv6 router
Jun  4 04:51:27 habbo rsyslogd: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd.  [v8.2102.0]
Jun  4 04:51:27 habbo rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="662" x-info="https://www.rsyslog.com"] start
Jun  4 04:51:27 habbo systemd[1]: Started System Logging Service.
Jun  4 04:51:27 habbo dhclient[653]: DHCPOFFER of 10.72.64.102 from 10.72.64.1
Jun  4 04:51:27 habbo dhclient[653]: DHCPREQUEST for 10.72.64.102 on wlan0 to 255.255.255.255 port 67
Jun  4 04:51:27 habbo dhclient[653]: DHCPACK of 10.72.64.102 from 10.72.64.1
Jun  4 04:51:28 habbo kernel: [   16.107566] Bluetooth: Core ver 2.22
Jun  4 04:51:28 habbo kernel: [   16.107702] NET: Registered PF_BLUETOOTH protocol family
Jun  4 04:51:28 habbo kernel: [   16.107720] Bluetooth: HCI device and connection manager initialized
Jun  4 04:51:28 habbo kernel: [   16.111141] Bluetooth: HCI socket layer initialized
Jun  4 04:51:28 habbo kernel: [   16.111177] Bluetooth: L2CAP socket layer initialized
Jun  4 04:51:28 habbo kernel: [   16.111218] Bluetooth: SCO socket layer initialized
Jun  4 04:51:28 habbo kernel: [   16.129722] Bluetooth: HCI UART driver ver 2.3
Jun  4 04:51:28 habbo kernel: [   16.129751] Bluetooth: HCI UART protocol H4 registered
Jun  4 04:51:28 habbo kernel: [   16.129873] Bluetooth: HCI UART protocol Three-wire (H5) registered
Jun  4 04:51:28 habbo kernel: [   16.131328] Bluetooth: HCI UART protocol Broadcom registered
Jun  4 04:51:28 habbo systemd[1]: Starting Load/Save RF Kill Switch Status...
Jun  4 04:51:28 habbo systemd[1]: Started Configure Bluetooth Modems connected by UART.
Jun  4 04:51:28 habbo systemd[1]: Started Load/Save RF Kill Switch Status.
Jun  4 04:51:28 habbo systemd[1]: Created slice system-bthelper.slice.
Jun  4 04:51:28 habbo systemd[1]: Starting Raspberry Pi bluetooth helper...
Jun  4 04:51:28 habbo bthelper[716]: Raspberry Pi BDADDR already set
Jun  4 04:51:28 habbo systemd[1]: Finished Raspberry Pi bluetooth helper.
Jun  4 04:51:28 habbo systemd[1]: Starting Bluetooth service...
Jun  4 04:51:28 habbo dhclient[653]: bound to 10.72.64.102 -- renewal in 38881 seconds.
...
Jun  4 04:51:29 habbo systemd[1]: Started Hostname Service.
Jun  4 04:51:32 habbo dhcpcd[382]: wlan0: leased 10.72.64.103 for 86400 seconds
Jun  4 04:51:32 habbo dhcpcd[382]: wlan0: adding route to 10.72.64.0/24
Jun  4 04:51:32 habbo dhcpcd[382]: wlan0: adding default route via 10.72.64.1
Jun  4 04:51:32 habbo systemd[1]: Stopping Network Time Synchronization...
Jun  4 04:51:32 habbo systemd[1]: systemd-timesyncd.service: Succeeded.
Jun  4 04:51:32 habbo systemd[1]: Stopped Network Time Synchronization.
Jun  4 04:51:32 habbo systemd[1]: Starting Network Time Synchronization...
Jun  4 04:51:33 habbo systemd[1]: Started Network Time Synchronization.
Jun  4 04:51:33 habbo systemd[1]: Stopping Network Time Synchronization...
Jun  4 04:51:33 habbo systemd[1]: systemd-timesyncd.service: Succeeded.
Jun  4 04:51:33 habbo systemd[1]: Stopped Network Time Synchronization.
Jun  4 04:51:33 habbo systemd[1]: Starting Network Time Synchronization...
Jun  4 04:51:33 habbo systemd[1]: systemd-rfkill.service: Succeeded.
Jun  4 04:51:33 habbo bthelper[874]: Changing power off succeeded
Jun  4 04:51:33 habbo systemd[1]: Started Network Time Synchronization.
Jun  4 04:51:33 habbo dhcpcd[382]: forked to background, child pid 876
Jun  4 04:51:33 habbo systemd[1]: Started DHCP Client Daemon.
Jun  4 04:51:33 habbo systemd[1]: Reached target Network is Online.
Jun  4 04:51:33 habbo systemd[1]: Starting Samba NMB Daemon...
Jun  4 04:51:33 habbo systemd[1]: Started openHAB - empowering the smart home.
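
To see at a glance which client ended up with which address, the two "lease completed" messages in a log like the one above can be filtered out. This is only a minimal sketch: the function name lease_events is made up for illustration, and it assumes the message wording is exactly as shown in this log ("bound to" for dhclient, "leased" for dhcpcd).

```shell
# Print "<client> <address>" for each completed lease in syslog-style input.
# Assumes the two message shapes seen in the log above:
#   dhclient[PID]: bound to ADDRESS -- renewal in N seconds.
#   dhcpcd[PID]: IFACE: leased ADDRESS for N seconds
# lease_events is a hypothetical helper name, not part of any package.
lease_events() {
  awk '
    /dhclient\[[0-9]+\]: bound to / {
      for (i = 1; i < NF; i++)
        if ($i == "bound" && $(i + 1) == "to") { print "dhclient", $(i + 2); break }
    }
    /dhcpcd\[[0-9]+\]: .*: leased / {
      for (i = 1; i < NF; i++)
        if ($i == "leased") { print "dhcpcd", $(i + 1); break }
    }
  '
}

# Usage (on the affected system):
#   lease_events < /var/log/syslog
```

Two different client names in the output for the same boot would point to the double lease described here.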

System information:

32 bit openHABian 1.7.3 running from the onboard microSD card slot on a Raspberry Pi 4 4GB. No hardware (HATs, USB dongles etc.) attached to the RPi yet.

> cat /etc/os-release

PRETTY_NAME="Raspbian GNU/Linux 11 (bullseye)"
NAME="Raspbian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

> uname -m

armv7l

mstormi commented 2 years ago

openHABian isn't really doing anything to DHCP IP assignment so what you see is pretty much what you get from stock Raspberry Pi OS. As you say you don't have any problem I am not seeing what we as openHABian should be doing about it. To get your question answered I suggest you check out some Raspberry Pi OS forum. Please let us know the outcome here, too.

Nadahar commented 2 years ago

I'm not too keen on contacting Raspberry Pi OS just to have them tell me to contact either you or Debian. So, I've tried to dig deeper to figure out whom to address. It hasn't really made me any wiser.

First of all, I'm in way over my head, I don't think I've touched Debian (including Ubuntu and other derivatives) since 2012-13. Their dependency management really disappointed, and it seems like not much has improved since then. What I'm trying to say is that I'm in no way sure that my findings are correct, but this is what I think is happening.

The "source" of the problem, arguably, seems to be ifupdown, which is used implicitly by Debian to process the interfaces configuration. Look at the implementation of the dhcp method here: https://salsa.debian.org/debian/ifupdown/-/blob/4352ab3b8bafc0a73e2aed1f697d01cab29be4a6/inet.defn#L78-L109

The description even states specifically that:

This method may be used to obtain an address via DHCP with any of the tools: dhclient, udhcpc, dhcpcd (They have been listed in their order of precedence.).

I'm not sure what "language" is used in the file, but it seems to me like the use of the different DHCP clients is hardcoded, and that it will use whichever client it finds in the listed order. This means that if dhclient is installed, ifupdown will use it, not dhcpcd even if it is present too.

Since Debian uses dhclient as the "standard" DHCP client, I guess one could argue that this makes some sense, but it isn't very flexible to say the least.

This seems to me to explain why dhclient is started - the last line in /etc/network/interfaces which reads iface default inet dhcp will fire up dhclient if it exists in /sbin.
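
The "first one found wins" behavior described above can be sketched as follows. This is not the ifupdown implementation, only an illustration of the precedence order quoted from its description; the function name and the search directory argument are hypothetical.

```shell
# Print the first DHCP client found, trying the names in the order quoted
# from the ifupdown description: dhclient, udhcpc, dhcpcd.
# first_dhcp_client and its directory argument are hypothetical and exist
# only for illustration; they are not part of ifupdown.
first_dhcp_client() {
  search_dirs=${1:-/sbin /usr/sbin}
  for client in dhclient udhcpc dhcpcd; do
    for dir in $search_dirs; do
      if [ -x "$dir/$client" ]; then
        echo "$dir/$client"
        return 0
      fi
    done
  done
  return 1
}

# Usage:
#   first_dhcp_client            # search /sbin and /usr/sbin
#   first_dhcp_client /some/dir  # search a single directory
```

With both dhclient and dhcpcd present, the sketch picks dhclient, which matches what the log shows ifup doing for wlan0.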

dhcpcd is started by Systemd because it is installed and "enabled":

● dhcpcd.service - DHCP Client Daemon
     Loaded: loaded (/lib/systemd/system/dhcpcd.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/dhcpcd.service.d
             └─wait.conf
     Active: active (running) since Sun 2022-06-05 16:51:15 CEST; 8h ago
       Docs: man:dhcpcd(8)
   Main PID: 911 (dhcpcd)
      Tasks: 1 (limit: 4915)
        CPU: 826ms
     CGroup: /system.slice/dhcpcd.service
             └─911 /usr/sbin/dhcpcd -w -q

The question is then, why are they both there - and whose "fault" is it? As I understand it, Raspberry Pi OS has "replaced" dhclient with dhcpcd for some reason I'm not sure of, presumably with the argument that it works better with the RPi. They have probably made sure that it is started by systemd, and most likely also make sure that dhclient isn't installed by default.

I'd still argue that this is a pretty "brittle" construct, as quite a few other packages will install dhclient and wreck this whole setup. On the other hand, with Debian's hardcoding of the dhclient preference, I'm not sure what would be the best solution. Using interfaces might have to be avoided completely, or a custom build of ifupdown where the hardcoded preference is changed be made.

Raspberry Pi OS will probably still claim that this isn't "their problem", since they make sure not to include dhclient (and as we all know with Linux, "everything" is the users' responsibility to handle if they dare change something).

So, I've tried to figure out why dhclient is installed. I certainly didn't install it. There's no package called "dhclient", but I managed to figure out that it's in the isc-dhcp-client package, which is installed:

> apt-file search sbin/dhclient
isc-dhcp-client: /sbin/dhclient
isc-dhcp-client: /sbin/dhclient-script
isc-dhcp-client-ddns: /sbin/dhclient

When trying to source the origin of isc-dhcp-client using both /var/lib/apt/lists/raspbian.raspberrypi.org_raspbian_dists_bullseye_main_binary-armhf_Packages and apt-cache rdepends --installed <package>, I end up with the following: Nothing depends on isc-dhcp-client, but several packages "recommend" it. Both avahi-autoipd and ifupdown are installed and "recommend" isc-dhcp-client. Nothing depends on ifupdown, but since it's an "essential" part of Debian I assume that it's "preinstalled" and most likely hasn't triggered the installation of isc-dhcp-client.

Following the trail with avahi-autoipd leads to avahi-daemon and libnss-mdns, which ends up being the same thing since the only installed package that depends on avahi-daemon is libnss-mdns. The only installed package that depends on libnss-mdns is openjdk-11-jre-headless.

apt-config dump reveals that:

APT::Install-Recommends "1";
APT::Install-Suggests "0";

For a long time I couldn't understand this, since I didn't think "recommended" packages were installed. But it turns out they are - by Debian default, from what I can understand.

To sum it up, it seems like

openjdk-11-jre-headless -> libnss-mdns -> avahi-autoipd -> isc-dhcp-client

This means that dhclient is installed and thus preferred by ifupdown, which breaks the DHCP client setup. This feels a lot like a "circular firing squad". Debian will say it's not their problem because it works fine as long as you just install one DHCP client. Raspberry Pi OS will say it's not their problem because it works fine as long as you don't install dhclient. You (openHABian) will say it's not your problem because Java is necessary to run openHAB and the fault is really upstream.

I don't really know who to "blame", but I don't think I'm the only one experiencing this problem. It seems to me like this should be a very common situation. I can't experiment, since as long as I don't have a microHDMI adapter I can't risk doing something that makes DHCP stop working, as I would be unable to contact the RPi again without reflashing the SD card. When I get one some time in the future, I can try to remove isc-dhcp-client and see if it corrects the behavior. It still wouldn't solve the problem though, because it would be hard to prevent it from being installed again. I don't know if APT installs "recommended" packages during "upgrade", but I wouldn't be surprised. If so, it probably wouldn't be long until it was back.

It's really "unfortunate" that I, with so little knowledge of these systems, should be the one to figure out what would be a proper solution. For every step outlined above, I've had to search and read. It takes a lot of time, and anybody that knows this a bit better would figure things out much quicker.

To sum it all up, I don't have a solution, but I think I might have found the cause.

Nadahar commented 2 years ago

While checking my own logic, I found that this "chain" shown in my previous post is false:

openjdk-11-jre-headless -> libnss-mdns -> avahi-autoipd -> isc-dhcp-client

The reason is that there are multiple "suggests" in this chain, which won't be automatically installed. In fact, openjdk-11-jre-headless only suggests libnss-mdns, which again only suggests avahi-autoipd (but it depends on avahi-daemon, which also merely suggests avahi-autoipd). avahi-autoipd recommends isc-dhcp-client though, so at least the last "link in the chain" is true.
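
The distinction that broke the chain can be checked mechanically. Here is a small sketch that keeps only the relationships APT follows automatically when Install-Recommends is "1" and Install-Suggests is "0". It assumes the usual indented "Relation: target" layout of apt-cache depends output; the function name is made up for illustration.

```shell
# Read `apt-cache depends PACKAGE` style output on stdin and print only the
# targets of Depends, PreDepends and Recommends lines. Suggests lines are
# skipped, since they are not installed with Install-Suggests "0".
# auto_installed_targets is a hypothetical helper name.
auto_installed_targets() {
  sed -n -E 's/^[[:space:]|]*(PreDepends|Depends|Recommends):[[:space:]]*//p'
}

# Usage (on a Debian-based system):
#   apt-cache depends avahi-autoipd | auto_installed_targets
```

A package that only shows up under Suggests, like avahi-autoipd for avahi-daemon, drops out of the result, which is why the chain stops there.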

This led me to look further for the cause of avahi-autoipd being installed, and I think I've found the culprit: https://github.com/openhab/openhabian/blob/5a5110ef01fadbfa2cd252c8b1f13e70cf9fe9e6/functions/system.bash#L55-L64

I don't know why you have chosen to add it, but it's in fc1943afa22208a40e0c372d6271989f9af24adb from #434, which again comes from #433. I guess AutoIP (169.254.0.0/16) can be useful in some extremely rare situation when the network is completely ad-hoc, but the vast majority will have a router in some shape or form that gives them Internet access, which makes AutoIP useless. With reference to the situation in #433, installing avahi-autoipd in Ubuntu/Mint isn't an issue, since they don't rely on dhcpcd. This isn't the case for Raspberry Pi OS though, making this a much more "troublesome" choice.

As far as I can understand, this is also the cause of #1456, which means that 99a8be1770bc95d4925ad44cd516531ba25ce851 is kind of pointless. 99a8be1770bc95d4925ad44cd516531ba25ce851 disables the very functionality (AutoIP) that avahi-autoipd adds - but the problem that dhclient is installed still remains.

Nadahar commented 2 years ago

I took the chance and uninstalled avahi-autoipd and isc-dhcp-client and rebooted. It was some tense seconds, but it came back up - this time with only one IP address.

Again, I have a really hard time believing that this is only a problem with my installation. It's very easy to check:

ip addr show

The command will list the network interfaces with the assigned IP addresses below each one. When using DHCP, there should be no more than one address of each type (IPv4 and IPv6) for each interface.
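
For a quicker check than reading the full listing, the one-line output format of ip can be counted per interface. This is a sketch under the assumption that `ip -o -4 addr show` prints one address per line with the interface name in the second field and "inet" in the third; the function name is hypothetical.

```shell
# Read `ip -o -4 addr show` output on stdin and print every interface that
# carries more than one IPv4 address, followed by the address count.
# interfaces_with_multiple_ipv4 is a hypothetical helper name.
interfaces_with_multiple_ipv4() {
  awk '$3 == "inet" { count[$2]++ }
       END {
         for (ifname in count)
           if (count[ifname] > 1) print ifname, count[ifname]
       }'
}

# Usage (on the system to check):
#   ip -o -4 addr show | interfaces_with_multiple_ipv4
```

No output means every interface has at most one IPv4 address.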

Larsen-Locke commented 2 years ago

I just tried and removed avahi-autoipd and isc-dhcp-client, too.

After the next reboot the raspi didn't connect to wlan. I could still connect with ethernet and reinstall isc-dhcp-client .

After that it connects again with wlan.

Nadahar commented 2 years ago

That's strange - have you checked if you have both DHCP clients running (dhclient and dhcpcd)? I guess you can do it just with ps - I looked in the "system" log to watch what happened during boot though.

edit: I'm doing the "installation" of my new image now - my Internet connection isn't the fastest, so all the downloading takes a while - but it is connected on WLAN as it should and I can watch the progress in the browser.

edit2: I think this should be an easy way to check which DHCP clients are running: ps -ax | grep -E 'dhcp|dhclient'
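
One caveat with the grep above is that it can also match the grep process itself. A slightly stricter variant, as a sketch only, matches whole command names instead. It assumes the input is one command name per line, such as the output of `ps -eo comm=`; the function name is made up for illustration.

```shell
# Read one command name per line on stdin (for example from `ps -eo comm=`)
# and print which of the two DHCP clients discussed here appear in it.
# running_dhcp_clients is a hypothetical helper name.
running_dhcp_clients() {
  grep -x -E 'dhclient|dhcpcd' | sort -u
}

# Usage:
#   ps -eo comm= | running_dhcp_clients
```

If both names are printed, both clients are running at the same time.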

Nadahar commented 2 years ago

My test image is done installing. It didn't work as intended though, avahi-autoipd isn't installed now, but isc-dhcp-client still is - so something else must have installed it.

Larsen-Locke commented 2 years ago

Now with ps I only see dhclient but I didn't check before the uninstall.

On 6 June 2022 at 19:36:30, Nadar @.*** wrote:

That's strange - have you checked if you have both DHCP clients running (dhclient and dhcpcd)? I guess you can do it just with ps - I looked in the "system" log to watch what happened during boot though.


Nadahar commented 2 years ago

@Larsen-Locke That makes sense - the question then is if you have manually disabled dhcpcd at some point in the past.

Try running systemctl status dhcpcd.service to check the status of the dhcpcd service. If it's not "enabled", run: systemctl enable dhcpcd.service (you might need to prefix it with sudo)

This is how openHABian is configured by default, which causes both to run at the same time. Once it's running, removing isc-dhcp-client should work.

mstormi commented 2 years ago

That's strange - have you checked if you have both DHCP clients running (dhclient and dhcpcd)?

I've actually never seen dhclient run in recent times, and I've built and flashed many many images. My standard test box has Ethernet connected, maybe that's why.

mstormi commented 2 years ago

This is how openHABian is configured by default, which causes both to run at the same time.

I have not cross-checked but believe that with your PR (i.e. to not install avahi-autoipd) openHABian should be just like stock Raspi OS. So it's possibly already like that in there ?

Nadahar commented 2 years ago

My standard test box has an Ethernet connected, maybe that's why.

I won't claim to understand the logic of /etc/network/interfaces completely, but it could be. This is how it looks after a "fresh openHABian install":

# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source /etc/network/interfaces.d/*

allow-hotplug wlan0
iface wlan0 inet manual
wpa-roam /etc/wpa_supplicant/wpa_supplicant.conf
iface default inet dhcp

I'm pretty confident that what "launches" dhclient is the last line, with the magic "dhcp" command. The whole logic with "default" eludes me though - and so does the WPA stuff. I would assume that WPA would never be initialized when using ethernet, so maybe that means that the next line won't either? To really know how this file is parsed I fear I'd have to dive deep into the Debian code.

What I have found, is that the default for Raspberry Pi OS is quite different. Their image comes with an interfaces file that looks like this:

# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source /etc/network/interfaces.d/*

...and an interfaces.new that looks like this:

# interfaces(5) file used by ifup(8) and ifdown(8)

# Please note that this file is written to be used with dhcpcd
# For static IP, consult /etc/dhcpcd.conf and 'man dhcpcd.conf'

# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

I'm starting to suspect that isc-dhcp-client is a "standard" Debian package that will always be there unless you explicitly uninstall it. So, it seems to me like Raspberry Pi OS prevents dhclient from running simply by making sure that dhcp doesn't exist in the interfaces file.

Nadahar commented 2 years ago

I have not cross-checked but believe that with your PR (i.e. to not install avahi-autoipd) openHABian should be just like stock Raspi OS. So it's possibly already like that in there ?

Yes, as far as I can tell, avahi-autoipd isn't in their image and removing it should as such not pose a problem. I think the PR is "safe" in that sense, I am already testing it. But, isc-dhcp-client is still there (and the double IP address) despite the PR, so my assumption that avahi-autoipd caused isc-dhcp-client to be installed seems to be false. I've found vague references online that it's "always there" on Debian, so I now suspect that it's there from the very start.

Nadahar commented 2 years ago

After some acrobatics, I managed to connect my RPi to ethernet - and it didn't make any change for me. Now it assigns three addresses - one for eth0 and two for wlan0.

edit: After checking the startup log, IP's are assigned in this order:

dhcpcd leases one IP for eth0
dhclient leases one IP for wlan0
dhcpcd leases a second IP for wlan0

Larsen-Locke commented 2 years ago

I checked the status of the dhcpcd service and it was not enabled. I enabled it and checked the status:

Warning: The unit file, source configuration file or drop-ins of dhcpcd.service
● dhcpcd.service - dhcpcd on all interfaces
   Loaded: loaded (/lib/systemd/system/dhcpcd.service; enabled; vendor preset: e
  Drop-In: /etc/systemd/system/dhcpcd.service.d
           └─wait.conf
   Active: failed (Result: exit-code) since Mon 2022-06-06 22:59:53 CEST; 54s ag
  Process: 365 ExecStart=/usr/lib/dhcpcd5/dhcpcd -q -w (code=exited, status=6)

Jun 06 22:59:52 rapi2 systemd[1]: Starting dhcpcd on all interfaces...
Jun 06 22:59:52 rapi2 dhcpcd[365]: Not running dhcpcd because /etc/network/inter
Jun 06 22:59:52 rapi2 dhcpcd[365]: defines some interfaces that will use a
Jun 06 22:59:52 rapi2 dhcpcd[365]: DHCP client or static address
Jun 06 22:59:53 rapi2 systemd[1]: dhcpcd.service: Control process exited, code=e
Jun 06 22:59:53 rapi2 systemd[1]: dhcpcd.service: Failed with result 'exit-code'
Jun 06 22:59:53 rapi2 systemd[1]: Failed to start dhcpcd on all interfaces.

Wlan was installed by openhabian-config.

/etc/network/interfaces:

# interfaces(5) file used by ifup(8) and ifdown(8)

# Please note that this file is written to be used with dhcpcd
# For static IP, consult /etc/dhcpcd.conf and 'man dhcpcd.conf'

# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

allow-hotplug wlan0
iface wlan0 inet manual
wpa-roam /etc/wpa_supplicant/wpa_supplicant.conf
iface default inet dhcp

Nadahar commented 2 years ago

@Larsen-Locke I find this message interesting:

systemd[1]: Starting dhcpcd on all interfaces...
dhcpcd[365]: Not running dhcpcd because /etc/network/inter
dhcpcd[365]: defines some interfaces that will use a
dhcpcd[365]: DHCP client or static address
systemd[1]: dhcpcd.service: Control process exited, code=e
systemd[1]: dhcpcd.service: Failed with result 'exit-code'
systemd[1]: Failed to start dhcpcd on all interfaces.

The fact that dhcpcd refuses to start seems to be an attempt at a "protection" against this very problem. I found this question which shows the same behavior. I've been wondering why this "protection" doesn't work on my installation, but I noticed that the question was about an older installation, Raspbian GNU/Linux 9 (stretch).

What version are you running (cat /etc/os-release)?

Larsen-Locke commented 2 years ago

VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian

Nadahar commented 2 years ago

I'm running 11/bullseye, maybe something has changed here that has made the "protection" in dhcpcd fail.

Nadahar commented 2 years ago

It now seems reasonably clear to me that isc-dhcp-client is installed as a "part of Debian itself". It has a priority of "important", which is the second highest priority in the "Debian policy", and as such it will always be installed even by the "minimal" installer. This list of minimal packages also supports this.

So, it seems that the idea of relying on dhclient not being there is futile. It is of course possible to uninstall it, but I would guess that since it's "assumed to be there by default", all kinds of strange things could happen. Combined with the hard-coded preference it gets from ifupdown, this makes it very hard to actually use interfaces together with an alternative DHCP client on a Debian-based system.

I guess this explains why Raspi OS/Raspbian has chosen not to use interfaces for network configuration. It also questions the decision of actually using it in openHABian.

The things I still need to figure out are why the "protection" that disables dhcpcd doesn't always "work", and why we need dhcpcd at all (why was it chosen for Raspbian?). Is this all related to the "automatic hotspot" fallback functionality?

Disabling dhcpcd is easy, but I would assume that it's there for a reason.

Nadahar commented 2 years ago

This also seems to confirm the above assumption: https://serverfault.com/questions/1065565/how-to-run-dhcpcd-on-interface-eth1-only/1065571#1065571

This really smells a lot like a "war of philosophy" between different "factions" to me, I really hope it's not and that there's an actual good reason for this situation.

Nadahar commented 2 years ago

It seems like dhcpcd might have a very uncertain future: http://roy.marples.name/archives/dhcpcd-discuss/0003457.html

Nadahar commented 2 years ago

I've been trying to find some reasoning for using dhcpcd, but to no avail. I'm sure there must be some reason, but the first GitHub commit for raspberrypi-net-mods was done after the "switch": https://github.com/RPi-Distro/raspberrypi-net-mods/commit/bb0c51beacddb433a348f365556f7c3f348a3b41. According to this forum post no version control (git, svn etc) exists before this, so I don't know if the reason for this can be found in public. It might lurk in a forum somewhere, but it feels a lot like looking for a needle in a haystack.

Nadahar commented 2 years ago

I've made some progress, although I've not come to the bottom of this.

I have identified the "protection" code that prevents dhcpcd from starting under some circumstances, which prevents the double IP issue. It's in the package dhcpcd5, in the "init script" found here: https://sources.debian.org/src/dhcpcd5/7.1.0-2/debian/dhcpcd5.dhcpcd.init/#L45-L51

INTERFACES=/etc/network/interfaces

if grep -q "^[[:space:]]*iface[[:space:]]*.*[[:space:]]*inet[[:space:]]*dhcp" \
  $INTERFACES; then
    log_failure_msg "Not running $NAME because $INTERFACES"
    log_failure_msg "defines some interfaces that will use a" \
      "DHCP client"
  exit 6
fi

This is very crude and unsophisticated: as you can see, it does a simple regex check in /etc/network/interfaces for whether dhcp is used after iface and inet. It fails to check files in /etc/network/interfaces.d/...
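
To illustrate the gap, here is a sketch of the same check extended to the files in interfaces.d. It reuses the pattern from the quoted init script unchanged; the function name and the directory argument are hypothetical, added so the check can be tried on a copy of the configuration. It still does not follow source or source-directory lines that point elsewhere.

```shell
# Return 0 if any iface stanza under the given network config directory uses
# the dhcp method, checking both interfaces and interfaces.d/*.
# The pattern is taken from the init script quoted above. uses_dhcp_method
# and its directory argument are hypothetical, for illustration only.
uses_dhcp_method() {
  netdir=${1:-/etc/network}
  pattern='^[[:space:]]*iface[[:space:]]*.*[[:space:]]*inet[[:space:]]*dhcp'
  for file in "$netdir/interfaces" "$netdir"/interfaces.d/*; do
    [ -f "$file" ] || continue
    if grep -q "$pattern" "$file"; then
      return 0
    fi
  done
  return 1
}

# Usage:
#   if uses_dhcp_method; then echo "ifupdown will start a DHCP client"; fi
```

A check like this would also catch a dhcp stanza that lives only in a file under interfaces.d, which the original script misses.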

It seems to me like the above code is all that prevents the "double IP" bug from happening to everybody. What I do not (yet hopefully) understand is why this doesn't "protect" my installation.

To try to figure this out, I've now installed openHABian 1.6.6. In addition, I did not disable IPv6 in /boot/openhabian.conf before first boot, like I've done with previous installations. On this installation, the "protection" works for me as well - dhcpcd is prevented from running and I only get one IP address on wlan0. The question now is which of the two changes I made (openHABian version or not disabling IPv6) made the difference.

1.6.6 runs buster, not bullseye, and the version of dhcpcd5 is 1:8.1.2-1+rpt1. On the latest version, the version of dhcpcd5 is 1:8.1.2-1+rpt5. It's not clear to me if this is what makes the difference though, since the "script" is also there in 8.1.2 - it just seems that it's never run for some reason.

I guess the next step now is to install the latest version without disabling IPv6. It's not that pinpointing exactly what change triggered this "solves" the problem, the whole setup is quite "brittle" as it is. If the situation is like it looks at the moment, it seems to me that what makes openHABian "behave correctly" is that dhcpcd is prevented from starting. If so, the easy solution would be to just disable the service, and this would all be handled "the standard Debian way" using dhclient. Still, I'd like to know exactly why this happens, to better understand what affects what.

mstormi commented 2 years ago

I guess the next step now is to install the latest version without disabling IPv6

Which is what I always do. Note I do NOT see dhclient but dhcpcd IS started. (then again note my box is on ethernet)

Nadahar commented 2 years ago

Which is what I always do. Note I do NOT see dhclient but dhcpcd IS started. (then again note my box is on ethernet)

If that's the case, then this is even stranger. The reason I now only have one IP with 1.6.6, and @Larsen-Locke too from what I understand, is that dhcpcd is prevented from running (so that dhclient does the job). Your installations on the other hand "work" because dhcpcd is running and dhclient is not...

Maybe I have to try to do an installation with Ethernet connected as well, just to compare. That said, since you have Ethernet connected, that probably means that you don't fill in SSID/PSK for the Wifi? That could potentially change things I guess.

Nadahar commented 2 years ago

I reinstalled 1.7.3 without disabling IPv6. It's the same: both dhclient and dhcpcd are running, each leasing one IPv4 address for wlan0. So, it seems like it is caused by something that has changed between the versions - I assume between Buster and Bullseye.

mstormi commented 2 years ago

that probably means that you don't fill in SSID/PSK for the Wifi?

Yes I don't. And I'm only testing with the latest openHABian. Speaking generally, openHABian users should not use WiFi for reliability reasons if they can avoid it. Which is probably what most do and why there's no one to have hit your issue before (or no one that cares as much as you do - much appreciated). But users should also not run multihomed, no matter if 2xWiFi like your case or Eth+WiFi. You ultimately should be having a single IP address only (well plus localhost).

I reinstalled 1.7.3

Please for future experiments only work based on latest openHABian, main branch so any new users can benefit from this, too. There have been too many unknown changes in the meantime such as the buster->bullseye move. You should be upgrading (or better: reinstall) your production system, too.

Nadahar commented 2 years ago

My primary reason for moving to a RPi is to make it robust. I was planning to run it on a Linux VM on a server originally, but it's too much hassle each time there's a thunderstorm or the power is out. I've equipped it with a PiJuice HAT so that I can run for many hours without being connected to power. Normally I "hate" WiFi, but in this case I want to use WiFi exactly so that it won't be connected via an Ethernet cable and risk being damaged by lightning. Whenever a lightning storm comes close, I just unplug the power and it can keep on doing its thing. Ideally I wouldn't want it to use DHCP, because a static configuration is "safer" in that it doesn't require contact with a DHCP server during boot. But, since you recommended against using static IP I was thinking of using DHCP, although I'm not quite convinced that I want to do that yet.

I'm not just doing this to "solve my problem", it would be much easier for me to just stop dhcpcd and configure it statically. I'm doing it because I'm convinced that there's a bug here, and I'm trying to get to the bottom of it before moving on. I haven't gotten all my parts yet, including the "endurance" SD card I'm going to use in "production", so my "production system" doesn't exist yet.

I've only done one experiment with 1.6.6, and that was because I was out of ideas. I think that was useful, because it showed that the dual IP problem isn't there with 1.6.6/buster. I'm still not sure what exactly has led to the change, but I suspect that it's at the Debian or perhaps Raspi OS level. Except for that, I've been using 1.7.3 just without avahi-autoipd installed. I really don't think avahi-autoipd has anything to do with this anymore - the reason I thought so in the beginning was that I thought installing that was what implicitly installed isc-dhcp-client.

My curiosity wants me to pinpoint the exact change that has triggered the change in behavior, but pragmatically speaking it might not matter that much. The fact remains, as I now understand it, that one should not both run dhcpcd and use the dhcp keyword in /etc/network/interfaces on the same installation. I think that's where the "real" solution to this whole issue lies.
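To check an installation for that combination, something like the following sketch could list the interfaces that /etc/network/interfaces hands to ifupdown with the dhcp method. The function name `dhcp_ifaces` and the sample file contents are made up for illustration; the stanza on a real system may look different.

```shell
#!/bin/sh
# List the interfaces that an ifupdown-style file configures with "inet dhcp".
# The file path is an argument so the function can be tried on a copy first.
dhcp_ifaces() {
    awk '$1 == "iface" && $3 == "inet" && $4 == "dhcp" { print $2 }' "$1"
}

# Hypothetical sample file, only for demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
auto lo
iface lo inet loopback

# a commented-out stanza is ignored
#iface eth0 inet dhcp

allow-hotplug wlan0
iface wlan0 inet dhcp
EOF

dhcp_ifaces "$sample"    # prints: wlan0
rm -f "$sample"
```

On a live system the call would be `dhcp_ifaces /etc/network/interfaces`. Any interface it prints while dhcpcd is also running is a candidate for a double lease. The sketch does not follow `source` or `source-directory` lines, so files included from elsewhere would need to be checked separately.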

I'm not sure I know enough about all the circumstances this is meant to solve, I mean with "failover" from Ethernet to WiFi or vice versa, plus the WiFi hotspot function. That makes it hard to suggest a different configuration that still takes it all into account. Testing failover etc. would also be immensely easier once I have the microHDMI adapter if I lose network access.

mstormi commented 2 years ago

Your work is very much appreciated, and be your motives just 'egoistic', i.e. curiosity and the strong willingness to understand things. That being said, for your production setup, I'd still recommend to use Ethernet only. If you're really so afraid of lightning to hit, you can also get a cheapish external UPS that also can protect the Ethernet cable from overvoltage. Given a RPi is less than 50 bucks though, consider just taking the risk instead. That's a real advantage of these neat boxes many people don't see: they're cheap to replace so no (or less) need to invest in 'box' reliability such as dual power supplies, venting, battery backup etc. like you would do with 'big iron' in data centers. Just have a spare on site and you're prepared.

I'd not aim for a multihomed system with Eth-WiFi failover. I'm sure there will be issues with that way beyond interface configuration, such as services to bind to and be addressed on only one of the interfaces, so ultimately even if you properly worked out how to set this up (which would be great!), I wouldn't think it will ultimately do what you want it to, i.e. improve resilience. Also consider that any lightning-class disaster will likely affect more of your hardware than just the RPi such as your WiFi AP, router and electrical infrastructure. Consider having a secondary SD that's configured to run on WiFi only. That you can use in case your Ethernet is unplugged or broken. If you assign the IP via DHCP based on MAC that'll effectively get you the same IP no matter if you boot with the standard 'Ethernet' SD or the WiFi one.

Nadahar commented 2 years ago

I haven't done any more testing, but I'd just like to address a couple of points.

I'm not "afraid" of lightning, we've had to replace two fluorescent light fixtures that have been "fried" by lightning in the last three years or so. We routinely unplug all equipment that we deem "at risk", and still things break, like permanently connected roof lights. We live close to the top of a hill, so we might be extra exposed, or the electric lines in the area might be extra vulnerable. I don't know, I just know that this is a very real threat, and that when I've removed the lights, their plastic has been so brittle that they have more or less just fallen apart, and they have smelled quite bad. No electronic equipment is going to survive it, I'm pretty sure of that. Thunderstorms are very common here from July to the end of August or so, so it's something we have to "live with". In addition, power outages from fallen trees during storms are quite common from November to January.

I also have other equipment that I have to shut down and start up on both sides of such "events", and it often takes me almost an hour after the power is back on or the thunderstorm is over for me to get everything back up and running again. My whole incentive for moving to the RPi was to eliminate one of these things, I don't want to have to shut down and restart it. I'll have to disconnect the power supply during thunderstorms, but that's all I want it to involve. Most of the things being controlled by openHAB are z-wave devices, so it doesn't really matter so much if I lose network connectivity to the RPi during an "event". The most important thing for me is that openHAB can keep running, keep communicating with the devices and be ready to go when it's all over, without all the reinitialization and general confusion that exists after a shutdown.

I'm considering a cheap UPS for the "main switch" and the WiFi antenna, but that's just for the luxury to be able to connect from laptops and mobile devices during an "event". But, having Ethernet connected is a risk in itself, it's enough that you've forgotten to disconnect the power to just one of the other wired devices and you can risk frying everything else. I'm not saying that is likely, but it's possible.

Since the network connectivity to the RPi isn't my "primary concern", I think relying on WiFi is quite acceptable. I do of course want a setup where I can plug in an Ethernet cable and get connected while it's running should the need arise. To make that work, it would be preferable to have eth0 configured with a static IP, so that it is configured despite not reaching a DHCP server at boot time. As I see it, it's vital to be able to connect to order a shutdown to avoid corrupting the file system. Especially with the heavy memory cache use openHABian is configured for. Luckily the PiJuice can initiate halt when the battery reaches a configurable level or when one of its buttons is held for a configurable period of time. This means the risk of a "dirty shutdown" should be minimal.

I'm in the middle of "building" a PWM fan solution with real fan speed control for it now too - quite opposite your "disposable" philosophy. There are two reasons why I see it differently: 1) Money isn't the only concern, all the hassle, the downtime, the ordering of new parts, having to remember all the details I had already forgotten etc. is also a "cost". 2) They aren't really cheap at this time at least, because of the general "situation" in the world. I bought my 4B 4GB for around €90 second hand. If I wanted to buy one new here in Norway, I'd have to either order it from abroad and pay insane import fees and tax (fees and taxes alone are more than €50) or I'd have to pay more than €100 and wait until October or November to receive it (according to their "estimates"). So, for that to be any kind of "safety", I'd really have to buy two from the beginning, so that I already had a spare one. I'd still have to replace it at a time that wouldn't necessarily be very convenient. If it were to die, I think it would be easier for me to just "transfer" the openHAB configuration to a server running a VM while I wait for a new one. If they become readily available and cheap again in the future, I might see this differently though.

Regarding the failover I mentioned, I think you misunderstood me somewhat. I wasn't thinking of a failover that would actually let openHAB continue to play ball nicely. I was rather thinking of something that makes sure you can still connect via SSH to initiate a graceful shutdown or whatever the need would be. I agree that making everything work with multiple IP addresses isn't very realistic, it's the same reason why I'm pretty sure that the current "double IP assignment" issue will pose problems. When making software that binds to sockets, there's no "good" solution for handling this. Sometimes the circumstances will allow you to bind to any/0.0.0.0, but all too often that's not possible for one reason or another. Binding to multiple addresses then usually means running multiple threads listening on multiple sockets and then coordinating the resulting mess. Guessing the "correct" IP to use if you have to just pick one is also very difficult, I have one such algorithm that I've had to revisit time and time again because it's simply "impossible" to make it anything close to optimal. So, I expect software generally not to handle this very elegantly, the most sensible option is probably to make it a configuration option and then just make a "wild guess" if nothing is configured. That still makes multiple addresses on DHCP a challenge, you can't configure a dynamic address in a configuration file (or at least you shouldn't), and the logic for "guessing" will vary. Then there's handling that this changes while the program is running - that won't happen unless the software is explicitly written to handle such events. To sum it up, I was never thinking about having openHAB keep working during a failover event, I just wanted to make sure SSH was still available.

I'm not sure I see the benefit of having two SDs with different configurations, it would mean that all the z-wave stuff, persistence, states etc would be out of whack anyway. It would essentially achieve nothing more than copying the openHAB configuration to a new installation on a computer/VM would achieve.

Nadahar commented 2 years ago

I have a small update. I've finally had the time to do another "fresh install" of the latest openHABian version 1.7.3, without switching to main branch or doing any other kind of customization. The only thing I modified in /boot/openhabian.conf before installation was hostname (and that really shouldn't impact anything). I did not configure WiFi.

As expected, it started up with just wired network (eth0), a single IPv4 address and with only dhcpcd running. But, as soon as I start openhabian-config, configure WiFi and reboot, I have two wlan0 IPv4 addresses with both dhclient and dhcpcd running. eth0 still only has a single IPv4 address. Disabling WiFi again makes it return to the initial state where I only have a single IPv4 address.

I strongly suspect that this issue exists on all Bullseye-based openHABian installations with WiFi configured.
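For anyone who wants to check their own installation, here is a minimal sketch that reads the one-line-per-address output of `ip -o -4 addr show` and reports every interface carrying more than one IPv4 address. The function name `multi_ipv4` and the sample addresses are invented for the example.

```shell
#!/bin/sh
# Read `ip -o -4 addr show` style lines on stdin and print every interface
# that carries more than one IPv4 address, followed by those addresses.
multi_ipv4() {
    awk '$3 == "inet" {
             count[$2]++
             addrs[$2] = addrs[$2] " " $4
         }
         END {
             for (i in count)
                 if (count[i] > 1) print i ":" addrs[i]
         }'
}

# On a live system: ip -o -4 addr show | multi_ipv4
# Hypothetical sample input (all addresses are made up):
multi_ipv4 <<'EOF'
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: eth0    inet 192.168.1.10/24 brd 192.168.1.255 scope global dynamic eth0\       valid_lft 80000sec preferred_lft 80000sec
3: wlan0    inet 192.168.1.23/24 brd 192.168.1.255 scope global dynamic wlan0\       valid_lft 80000sec preferred_lft 80000sec
3: wlan0    inet 192.168.1.24/24 brd 192.168.1.255 scope global secondary dynamic wlan0\       valid_lft 80000sec preferred_lft 80000sec
EOF
# prints: wlan0: 192.168.1.23/24 192.168.1.24/24
```

An empty result means no interface has more than one IPv4 address, which is the state described above before WiFi is configured.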

Nadahar commented 2 years ago

I don't think this is directly related to the double IP issue, but there is an issue with the script that "enables WiFi". I've seen it several times now, it will give this error "seemingly out of the blue" while it will work at other times:

                    ┌──────────────────────────────────────────────────────────────────────────────┐
                    │                                                                              │
                    │ There was an error or interruption during the execution of:                  │
                    │   "30 | System Settings"                                                     │
                    │                                                                              │
                    │ Please try again. If the error persists, please read                         │
                    │ /opt/openhabian/docs/openhabian-DEBUG.md or                                  │
                    │ https://github.com/openhab/openhabian/blob/main/docs/openhabian-DEBUG.md how │
                    │ to proceed.                                                                  │
                    │                                                                              │
                    │                                                                              │
                    │                                    <Ok>                                      │
                    │                                                                              │
                    └──────────────────────────────────────────────────────────────────────────────┘

mstormi commented 2 years ago

set debugmode=maximum in openhabian.conf to see more

Nadahar commented 2 years ago

set debugmode=maximum in openhabian.conf to see more

I have found the reason, see #1693

Nadahar commented 2 years ago

As far as I can understand at this point, the cause of the double IP issue is fundamentally that /etc/network/interfaces is populated with WiFi configuration by wifi.bash. This is what triggers dhclient - which is what is "wrong" given Raspbian/RaspiOS's decision to use dhcpcd instead.

It further looks like something has changed in Debian between Buster and Bullseye (I haven't managed to pinpoint exactly what, but I don't think that is very important) that has somehow "disabled" the "protection" in dhcpcd5 where it refuses to start if /etc/network/interfaces is configured with a DHCP configuration. This was never a proper solution anyway, but it masked the underlying problem in this case - the fact that both dhclient and dhcpcd are configured to serve as DHCP clients.

It's clear that the "offending configuration" is created by wifi.bash. The reason why this was done isn't so easy to deduce. Trying to trace where this comes from, I've come up with 786698bb2afacb1535e58ce7dfcbec5ba4383f1e as the source: https://github.com/openhab/openhabian/blob/786698bb2afacb1535e58ce7dfcbec5ba4383f1e/openhabian-setup.sh#L374-L378

This then goes on a voyage via 9b5b4982101dc60285021a68621ae2dd5273622d, b0beb7ff0fee9000005823c7b6317a06fa7e646e and c0d8f8ba943c228c071563b7887946fd48532261 before it lands where it is today: https://github.com/openhab/openhabian/blob/5a5110ef01fadbfa2cd252c8b1f13e70cf9fe9e6/functions/wifi.bash#L95-L99

The exact purpose of this code is still unclear to me. I'm not sure when the switch to dhcpcd took place in Raspbian/RaspiOS, but maybe this code was written before the switch and simply never removed?

My WiFi seems to work just fine without it - with the result that only dhcpcd is leasing an address.
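To try the same thing without editing the file by hand, a rough sketch like the one below could print a copy of an interfaces file with the DHCP stanza for one interface commented out. The function name `disable_dhcp_stanza` is made up, and the sample stanza is only a guess at the general shape; the lines wifi.bash actually writes are in the source linked above.

```shell
#!/bin/sh
# Print an ifupdown-style file with the "iface <ifname> inet dhcp" stanza
# commented out, together with any auto/allow-* line naming only <ifname>.
# Usage: disable_dhcp_stanza <ifname> <file>   (the input file is not modified)
disable_dhcp_stanza() {
    awk -v ifname="$1" '
        # auto / allow-* lines that name only this interface
        ($1 == "auto" || $1 ~ /^allow-/) && NF == 2 && $2 == ifname {
            print "#" $0; in_stanza = 0; next
        }
        # first line of the stanza in question
        $1 == "iface" && $2 == ifname && $3 == "inet" && $4 == "dhcp" {
            print "#" $0; in_stanza = 1; next
        }
        # any other stanza keyword ends the stanza
        $1 ~ /^(iface|mapping|auto|source|source-directory|rename)$/ || $1 ~ /^allow-/ {
            in_stanza = 0
        }
        # option lines that belong to the stanza
        in_stanza && NF > 0 { print "#" $0; next }
        { print }
    ' "$2"
}

# Hypothetical sample file, only for demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
auto lo
iface lo inet loopback

allow-hotplug wlan0
iface wlan0 inet dhcp
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
EOF

disable_dhcp_stanza wlan0 "$sample"
rm -f "$sample"
```

With the sample above, the three wlan0 lines come out prefixed with `#` and the loopback lines are left alone. Writing to stdout instead of editing in place makes it possible to review the result before replacing /etc/network/interfaces.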