raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

Raspberry Pi 4: DHCPCD route socket overflowed #4092

Open jwillmer opened 3 years ago

jwillmer commented 3 years ago

Describe the bug

Every now and then my Pi loses its IPv6 address. I found that I can fix the issue temporarily with systemctl restart dhcpcd. Today it happened again, and I used systemctl status dhcpcd to check the state. I got the following output:

Warning: The unit file, source configuration file or drop-ins of dhcpcd.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● dhcpcd.service - dhcpcd on all interfaces
   Loaded: loaded (/lib/systemd/system/dhcpcd.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/dhcpcd.service.d
           └─wait.conf
   Active: failed (Result: signal) since Fri 2021-01-22 15:36:35 CET; 1 day 6h ago
  Process: 340 ExecStart=/usr/lib/dhcpcd5/dhcpcd -q -w (code=exited, status=0/SUCCESS)
 Main PID: 484 (code=killed, signal=SEGV)

Jan 22 15:36:35 home-server dhcpcd[484]: veth2c6abe3: waiting for carrier
Jan 22 15:36:35 home-server dhcpcd[484]: vethf713b46: IAID 69:de:ae:f1
Jan 22 15:36:35 home-server dhcpcd[484]: vethf713b46: adding address fe80::e7........7:52f9
Jan 22 15:36:35 home-server dhcpcd[484]: veth88ef6b4: waiting for carrier
Jan 22 15:36:35 home-server dhcpcd[484]: veth514e931: waiting for carrier
Jan 22 15:36:35 home-server dhcpcd[484]: route socket overflowed - learning interface state
Jan 22 15:36:35 home-server dhcpcd[484]: vethf32a0ec: carrier acquired
Jan 22 15:36:35 home-server dhcpcd[484]: vethf32a0ec: IAID bf:44:26:9c
Jan 22 15:36:35 home-server systemd[1]: dhcpcd.service: Main process exited, code=killed, status=11/SEGV
Jan 22 15:36:35 home-server systemd[1]: dhcpcd.service: Failed with result 'signal'.

I don't have enough knowledge about Linux to say that this is the right channel for this issue. Please be kind and redirect me if this issue is completely off topic.

To reproduce

I don't know. I can't find a pattern; it just happens now and then.

System

Additional context

I am only running Docker containers on the Pi. I boot the OS from an SSD, but I only started doing that recently, and I had the issue before that as well.

alaub81 commented 3 years ago

I have a similar problem. It seems that the dhcpcd version shipped with Raspberry Pi OS, which is:

dhcpcd 8.1.2
Copyright (c) 2006-2019 Roy Marples
Compiled in features: INET ARP ARPing IPv4LL INET6 DHCPv6 AUTH

fails if there are too many network interfaces. It seems you also have Docker running on the Pi.

On my side, the dhcpcd and dhcpcd5 daemons will not come up after a fresh reboot. I have about 10 Docker containers running in several docker-compose projects. If I shut down the compose projects before rebooting the Pi (it's a Pi 4), then everything works fine and the daemon comes up.

I read in another forum that this is a known bug and should be fixed in a newer dhcpcd version. So dhcpcd fails if there are too many network interfaces.

Perhaps somebody has a workaround or even a fix for that.

jwillmer commented 3 years ago

@alaub81 do you have a link to that issue so that I can track the progress?

alaub81 commented 3 years ago

@jwillmer I just read it here: forums.gentoo.org

FP2K-Minske commented 3 years ago

Fastest workaround:

sudo nano /etc/dhcpcd.conf

Insert the following line at the end:

denyinterfaces veth*

It excludes the virtual container interfaces from dhcpcd.
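For anyone who prefers to script the edit, here is a minimal sketch of the same workaround. The helper name deny_veth is hypothetical, and the sketch puts the line at the top of the file because later comments in this thread report that placement mattered for them; the original suggestion was the end of the file, so treat the position as an assumption.

```shell
#!/bin/sh
# Sketch only: deny_veth is a hypothetical helper, not part of dhcpcd.
# It adds "denyinterfaces veth*" as the first line of the given config
# file, unless that exact line is already present somewhere in the file.
deny_veth() {
    conf="$1"
    line='denyinterfaces veth*'
    # -x matches whole lines, -F treats the pattern as a fixed string.
    if grep -qxF "$line" "$conf"; then
        return 0
    fi
    tmp="$(mktemp)" || return 1
    # Write the new line first, then the existing contents. Copying back
    # with cat keeps the ownership and permissions of the original file.
    { printf '%s\n' "$line"; cat "$conf"; } > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Run as root, this would be called as deny_veth /etc/dhcpcd.conf, followed by a restart of dhcpcd. Calling it a second time leaves the file unchanged.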

alaub81 commented 3 years ago

@FP2K-Minske thank you, just tried it right now and it seems to work :-)

theunreal89 commented 3 years ago

I had the same exact issue with Docker and denying veth interfaces solved the issue for me! Much appreciated ;)

daniel-asilva commented 3 years ago

Here to confirm that FP2K's workaround also worked for me. I have been tracking this problem and can say this solution works.

In addition, I observed that this problem occurs every time the DHCP lease duration expires (4h by default). The dhcpcd service then crashes and the Raspberry Pi goes offline but stays powered on. I have the exact same scenario: multiple Docker interfaces. Fortunately, the two workarounds mentioned above fix this.

EDIT: After @cpannwitz's comment below, I have to clarify. I tested both solutions I mentioned above individually. I didn't apply both simultaneously.

cpannwitz commented 3 years ago

In addition to the fixes by @FP2K-Minske and @daniel-asilva (both fixes applied), I had to reload the systemd configuration and restart dhcpcd:

sudo systemctl daemon-reload
sudo systemctl restart dhcpcd

I had to do this afterwards because systemd complained about changed configuration files on disk, which resulted in the same problem: dhcpcd not working after reboot.

EDIT: In my case, applying BOTH fixes (see above) did NOT work. I had to remove the fix posted by @daniel-asilva and move denyinterfaces veth* to the top of the /etc/dhcpcd.conf file.

moracabanas commented 2 years ago

This started happening to me when I set a static IP for wlan0 from the GUI.

I see the same thing happening in this thread: https://raspberrypi.stackexchange.com/questions/58809/rpi-loses-its-wlan0-configuration-when-any-docker-container-is-started/117381#117381

It was solved by disabling DHCP for virtual interfaces with the denyinterfaces veth* trick in /etc/dhcpcd.conf. Make sure you add it to the top of the file and reboot. Otherwise it won't work.

But I suspect that when you set only the IP, it looks for the rest of the configuration across all networks, including the new veth ones.

I will try to confirm this by setting all the static config fields requested in the GUI and seeing what happens.

This is so crazy. I was getting TLS and socket reset errors in my stack, and I was thinking for a month it was my stack issue.

d-rez commented 2 years ago

and I was thinking for a month it was my stack issue.

Just for a month? :P I've been having network crashes (same symptoms, Docker Swarm cluster) for well over a year, going back even to Raspberry Pi 3 kernels, and I could never pinpoint the cause. System logs were very ambiguous, and since I run the Pis headless I always assumed they had crashed, until I realised I could still access them over another address/IP protocol (I run IPv6 and 2x attached VLANs on each Pi, so technically 4x addresses per Pi: 2x IPv4, 1x public IPv6 and 1x IPv6 ULA).

If this actually works I'll be ecstatic :D Thanks for posting this workaround folks!

seamusdemora commented 2 years ago

This started happening to me when I set a static IP for wlan0 from the GUI.

I see the same thing happening in this thread: https://raspberrypi.stackexchange.com/questions/58809/rpi-loses-its-wlan0-configuration-when-any-docker-container-is-started/117381#117381

It was solved by disabling DHCP for virtual interfaces with the denyinterfaces veth* trick in /etc/dhcpcd.conf. Make sure you add it to the top of the file and reboot. Otherwise it won't work.

There was another Q posted to RPi recently that involved strange issues with docker services. I don't use docker services, and would have ignored the question except that the title of the Q implied network issues. I eventually gave a rather elaborate and tutorial answer that was primarily to make this point: Do not use dhcpcd's static ip option.

This shouldn't be controversial (or so I thought) as the author of dhcpcd says in man dhcpcd.conf:

For IPv4, you should use the inform ipaddress option instead of setting a static address.

The OP didn't provide any feedback; I don't know if he resolved his issue or not. But I ran across this thread, and wanted to ask a question, hopefully to get some feedback.

In the first line of this quote, it seems that you are using the static ip option, and so my question is this: Instead of static ip, have you tried either the request or inform options? If so, did that have any effect on the docker issues?
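To make the question concrete, here is a rough sketch of what an inform-based stanza in /etc/dhcpcd.conf could look like, written as a small shell helper that prints it. The helper name inform_stanza, the interface wlan0, and the address 192.168.1.50/24 are placeholders chosen for illustration. Whether this has any effect on the Docker problem is exactly what is being asked, so it is untested in that respect.

```shell
#!/bin/sh
# Sketch only: inform_stanza is a hypothetical helper that prints a
# dhcpcd.conf stanza using "inform" in place of "static ip_address".
# $1 = interface name, $2 = address with CIDR suffix (both placeholders).
inform_stanza() {
    iface="$1"
    addr="$2"
    printf 'interface %s\ninform %s\n' "$iface" "$addr"
}

# Example call: inform_stanza wlan0 192.168.1.50/24
```

The two printed lines would replace the existing static ip_address entry for that interface; everything else in the file stays as it is.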

ykun91 commented 2 years ago

I encountered the same problem and, after a week of searching, finally found the answer here. It seems that this issue still has not been fixed.

I have two Raspberry Pi 4s with Raspberry Pi OS (64-bit) installed and Docker running on both, and both lost their Ethernet connection after 2~3 days of uptime. I have to reboot them manually every time to recover.

pi@rasp-2:log $ uname -a
Linux rasp-2 5.15.32-v8+ #1538 SMP PREEMPT Thu Mar 31 19:40:39 BST 2022 aarch64 GNU/Linux

Jul 23 15:16:28 rasp-2 dhcpcd[721]: veth4bd92c9: soliciting an IPv6 router
Jul 23 15:16:28 rasp-2 dhcpcd[721]: veth73f0901: waiting for carrier
Jul 23 15:16:28 rasp-2 dhcpcd[721]: vethb3ac390: soliciting a DHCP lease
Jul 23 15:16:28 rasp-2 dhcpcd[721]: veth91f9038: soliciting a DHCP lease
Jul 23 15:16:28 rasp-2 dhcpcd[721]: veth576f779: waiting for carrier
Jul 23 15:16:28 rasp-2 dhcpcd[721]: route socket overflowed - learning interface state
Jul 23 15:16:28 rasp-2 dhcpcd[721]: veth7777561: carrier acquired

eric-pierce commented 2 years ago

I'd like to note that this is still an active issue, and the fix mentioned in https://github.com/raspberrypi/linux/issues/4092/#issuecomment-774512217 does resolve it. I wish it hadn't taken me weeks to find this thread, but very happy I did. I'm curious if this also persists on alternate distros like DietPi

bfren commented 2 years ago

Same here - thanks @FP2K-Minske - doesn't even feel very hacky - simply telling dhcpcd not to do something that it probably sensibly tries to do by default.

denwald commented 1 year ago

Been experiencing a similar problem too. Headless Raspberry Pi 4 with 10 Docker containers (and corresponding veth* interfaces). Was regularly losing connectivity on eth0 after a couple of weeks of uptime. The only suspicious thing I could find in the logs is the "route socket overflowed" message from dhcpcd mentioned above.

I will try the "denyinterfaces" option for dhcpcd 🤞.
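Since several people here spent weeks before spotting that log line, a quick way to check for it might look like the sketch below. The helper name count_overflows is made up for illustration; it simply counts lines containing the message text seen in the logs posted in this thread.

```shell
#!/bin/sh
# Sketch only: count_overflows is a hypothetical helper. It reads log text
# on stdin and prints how many lines contain the overflow message.
count_overflows() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is ignored here.
    grep -c 'route socket overflowed' || true
}

# Example call, assuming dhcpcd logs to the systemd journal:
#   journalctl -u dhcpcd --no-pager | count_overflows
```

A non-zero count suggests the machine is hitting the same condition described in this issue.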

ferrarimarco commented 1 year ago

Maybe updating dhcpcd to a version >= 9.2.0 could also help. There are a few interesting notes in the changelog of that version that seem related...

The latest dhcpcd version available on Raspberry Pi OS is 8.1.2.
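A small sketch for checking whether an installed dhcpcd is at least 9.2.0, assuming the first line of dhcpcd --version looks like the "dhcpcd 8.1.2" output quoted earlier in this thread. The helper names are hypothetical, and the comparison relies on sort -V as provided by GNU coreutils.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names for illustration.

# dhcpcd_version reads "dhcpcd --version" style output on stdin and
# prints the version number from the first line.
dhcpcd_version() {
    awk 'NR == 1 && $1 == "dhcpcd" { print $2 }'
}

# version_ge A B succeeds when version A is greater than or equal to B.
# Uses version sort (sort -V), so 10.0.1 compares as newer than 9.2.0.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example call:
#   v="$(dhcpcd --version | dhcpcd_version)"
#   version_ge "$v" 9.2.0 && echo "dhcpcd $v is 9.2.0 or newer"
```

With the 8.1.2 version reported in this thread, the check fails, which matches the suggestion that an update would be needed.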

dulitz commented 1 year ago

Just wanted to register another vote to bring dhcpcd up to a version more recent than 2019, as there have been a lot of improvements since then.

bfren commented 1 year ago

I ended up using raspi-config to switch to NetworkManager on all my Pis instead of dhcpcd. Not had a problem since.

doodlebro commented 1 year ago

This bug is especially strange since I don't use IPv6 anywhere, but it seems like 6+ independent containers create the conditions for overflow.

areksobiczewski commented 8 months ago

The issue still persists. I noticed it after enabling IPv6 on my network and running about 10 Docker containers. Initially I thought there was a DHCP issue on my router (DHCP lease time too short), but it's as described in this thread: at some point dhcpcd gives up and stops renewing DHCP leases, regardless of how the DHCP server is configured. It took me several hours to debug dhcpcd and find this thread. It's a very confusing kind of error!

Meanwhile, I'm using what @bfren proposed above: NetworkManager instead of dhcpcd.

Either dhcpcd should be updated or NetworkManager should be the default in the OS. Otherwise a user is going to face strange networking issues that are hard to troubleshoot whenever they want to do more serious work with the Pi and networking :-(

seamusdemora commented 8 months ago

@areksobiczewski , et al

Please note that issues with dhcpcd are likely impacted by the fact that the RPi powers-zat-bee decided some time ago to stick with an old, no-longer-maintained-upstream version of dhcpcd. That left all bug-fixes for dhcpcd as the responsibility of someone in the RPi organization - or maybe a volunteer?? At any rate - in my experience, no one seemed to give a rat's-a$$ if it was maintained or not.

That's not meant as criticism, but only as a plain statement of fact.

bfren commented 8 months ago

Indeed. The decision has been made, for whatever reasons (however good - presumably there are consequences to using later versions of dhcpcd?). What I don't understand is why NetworkManager isn't simply made the default - are there consequences to using it that I'm not aware of?

If not, given doing that would easily fix the strange and definitely hard to troubleshoot issues caused by having relatively few Docker containers, I don't see why it hasn't been done.

ferrarimarco commented 8 months ago

It seems that the release notes for the latest Raspberry Pi OS version (based on Bookworm) contain this line:

  * NetworkManager used instead of dhcpcd as networking interface; various changes made to networking plugin to support this

bfren commented 8 months ago

@ferrarimarco that is curious. When I used the Bookworm image to set up a new Pi 5, the default was still dhcpcd and I had to change it using raspi-config.
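For anyone unsure which of the two is actually running on their install, a sketch along these lines may help. The function names are hypothetical, and the unit names NetworkManager and dhcpcd are assumptions based on the dhcpcd.service unit shown at the top of this issue and the usual Debian naming.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; unit names are assumed.

# is_active succeeds when the given systemd unit is currently active.
is_active() {
    systemctl is-active --quiet "$1"
}

# active_network_service prints each of the two candidate units that
# is currently active, one per line (prints nothing if neither is).
active_network_service() {
    for unit in NetworkManager dhcpcd; do
        if is_active "$unit"; then
            printf '%s\n' "$unit"
        fi
    done
}

# Example call: active_network_service
```

If the output is dhcpcd on a Bookworm image, that would match the behaviour described above.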

JS-E commented 4 months ago

Fastest workaround:

sudo nano /etc/dhcpcd.conf

Insert the following line at the end:

denyinterfaces veth*

It excludes the virtual container interfaces from dhcpcd.

Really appreciate this. I've been tearing my hair out trying to figure out why this wasn't working, and this fixed it. Thanks again! 👍