rancher / os

Tiny Linux distro that runs the entire OS as Docker containers
https://rancher.com/docs/os/v1.x/en/
Apache License 2.0

RancherOS 1.5.0 drops WiFi connection after a while on RPI3B using builtin WiFi #2668

Closed: mortenlj closed this issue 5 years ago

mortenlj commented 5 years ago

RancherOS Version: (ros os version) v1.5.0

Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.) Raspberry Pi 3 Model B

I have created a custom image based on the 1.5.0 image, adding firmware as described in #1149. I add all the files under /lib/firmware/brcm, even though some of them might not be needed. I make a few other modifications, most notably adding a cloud-config.yaml and changing config.txt to set enable_uart=0.

This seems to work fine: the Pi boots, connects to the WiFi after a minute or two, and is available for SSH. After a while (as little as 5 minutes, as much as 12 hours) the WiFi connection drops and does not return.

I have tried HypriotOS, which does not suffer similar problems, so it seems to be a problem with RancherOS (or my customizations), but I have not been able to find out exactly what is happening. I don't have an extra monitor, so my only access is via SSH, which naturally drops when the connection drops. After a reboot, I can't find any logs that tell me what happened (but I may just be looking in the wrong place).

Primarily, I need help to find out how to debug this problem. Once it is identified, finding solutions will be the next step.

mortenlj commented 5 years ago

After more digging, I finally found a hint in the logs:

Feb 23 19:53:29 rpi3b01 dhcpcd[1138]: wlan0: carrier lost
Feb 23 19:53:30 rpi3b01 dhcpcd[1138]: wlan0: deleting default route via 192.168.1.1
Feb 23 19:53:30 rpi3b01 dhcpcd[1138]: wlan0: deleting route to 192.168.1.0/24
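
A filter along these lines can pull such events out of a saved log. It is only a sketch: the function name `carrier_lost_events` is hypothetical, and since this thread does not say where the log lives on a RancherOS install, the log is read from standard input.

```shell
# Hypothetical helper: print dhcpcd "carrier lost" lines from a log on stdin.
# "|| true" keeps the exit status at 0 when no line matches.
carrier_lost_events() {
    grep -E 'dhcpcd\[[0-9]+\]: [^:]+: carrier lost' || true
}

# Usage (the log path is an assumption; substitute your own):
#   carrier_lost_events < /path/to/saved.log
```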

More googling leads me to this thread in the Raspberry Pi forums: https://www.raspberrypi.org/forums/viewtopic.php?t=196793

According to this thread, the power_save feature is enabled on the onboard WiFi, which can cause the WiFi to be powered down. The "carrier lost" message typically shows up in this case.

My problem now is that the commands given to check the power-save state and change it are not available when I log onto my RPI, so I can't verify if this is the problem. Another worry is that I can't find any trace of any changes to the power-save state in the latest HypriotOS image, but I don't see this problem when running Hypriot.

Anyone have any ideas on how to proceed?

mortenlj commented 5 years ago

After reading the above posts more carefully, I found that the command to check the power-save state is iwconfig, not iw. Power management does appear to be enabled, even though the log states several times that it is disabled.
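
The check described above can be scripted. The sketch below assumes the `Power Management:on` / `Power Management:off` field that `iwconfig` prints for a wireless interface; the function name `wifi_power_save_state` is hypothetical.

```shell
# Hypothetical helper: reads `iwconfig <iface>` output on stdin and prints
# "on", "off", or "unknown" based on the "Power Management:" field.
wifi_power_save_state() {
    state=$(sed -n \
        -e 's/.*Power Management:on.*/on/p' \
        -e 's/.*Power Management:off.*/off/p' | head -n 1)
    echo "${state:-unknown}"
}

# Only query the interface when iwconfig is actually installed.
if command -v iwconfig >/dev/null 2>&1; then
    iwconfig wlan0 2>/dev/null | wifi_power_save_state
fi
```

Parsing from standard input keeps the helper usable on saved `iwconfig` output as well as on a live interface.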

Now I just need to find out how to disable it on every boot...

niusmallnan commented 5 years ago

Try iwconfig wlan0 power off via ros running-commands, https://rancher.com/docs/os/v1.x/en/installation/configuration/running-commands/
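
As a rough sketch, the suggestion could be expressed as a cloud-config fragment like the one written below. It assumes the `runcmd` key described on the linked running-commands page and that `iwconfig` is available where `runcmd` executes; the file name `wifi-power-off.yaml` is hypothetical, and the fragment would need to be merged into any existing cloud-config rather than replace it.

```shell
# Write a cloud-config fragment (hypothetical file name) that turns off
# WiFi power management on every boot via runcmd.
cat > wifi-power-off.yaml <<'EOF'
#cloud-config
runcmd:
- iwconfig wlan0 power off
EOF

# Assumption: on a running system the fragment can be merged with
#   sudo ros config merge -i wifi-power-off.yaml
```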

mortenlj commented 5 years ago

> Try iwconfig wlan0 power off via ros running-commands, https://rancher.com/docs/os/v1.x/en/installation/configuration/running-commands/

Thanks for the tip. I have tried this now, and it is looking good. It has been running without problems for close to 2 hours. I have had a couple of "runs" that lasted longer than that, but most of the time the connection dropped within the first 20 minutes or so.

I will let it run for a couple of days, and if the problem is gone I will close the issue.

mortenlj commented 5 years ago

This seems to solve the problem with the power management of the wifi, but since you have added the documentation label I guess you want to keep the issue open until it's documented, so I won't close it.

kingsd041 commented 5 years ago

After executing iwconfig wlan0 power off in RancherOS, it has been running for 16 hours without problems. We have added this step to the latest documentation.

@mortenlj Thank you for your feedback