raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

The Pi automatically restarts but fails, and all the associated network devices go down #5261

Open zhchaozhang opened 1 year ago

zhchaozhang commented 1 year ago

Describe the bug

The Pi automatically restarts but the restart fails, and all the associated network devices go down. It can be used again only after being powered off and on.

Steps to reproduce the behaviour

This occurred only occasionally in earlier tests, but we recently found a case that triggers it reliably: during stress tests, the BLE host received ADV packets from all surrounding Bluetooth devices and then only transferred data over a Bluetooth connection.

Device(s)

Raspberry Pi 3 Mod. B+

System

pi@raspberrypi:~ $ cat /etc/rpi-issue
Raspberry Pi reference 2020-02-13
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 5f884374b6ac6e155330c58caa1fb7249b8badf1, stage4

pi@raspberrypi:~ $ vcgencmd version
Apr 30 2021 13:47:07
Copyright (c) 2012 Broadcom
version d7f29d96450abfc77cd6cf011af1faf1e03e5e56 (clean) (release) (start)

pi@raspberrypi:~ $ uname -a
Linux raspberrypi 5.10.17-v7+ #1414 SMP Fri Apr 30 13:18:35 BST 2021 armv7l GNU/Linux

Logs

log.zip

Additional context

No response

pelwell commented 1 year ago

The kernel log includes the following line:

[    9.660014] SS1BTUSBM: loading out-of-tree module taints kernel.

From this I deduce that you are using an external USB Bluetooth dongle, and that you have a driver you either compiled yourself or got from a third party. If the crash happens under heavy Bluetooth load there are a number of possible causes:

  1. a bug in the Bluetooth dongle driver
  2. a bug in the Bluetooth stack
  3. a bug in the USB driver
  4. some other kernel bug
  5. a sudden under-voltage condition caused by a power supply that can't cope under the load.

The kernel is sufficiently mature (even in 5.10) that 2 and 3 are not so likely to give hard crashes, but it's not impossible. My instinct is that 1 or 5 are much more likely, and we can't help you with 1.

Do you have a good power supply (and cable from the supply to the Pi)?

While not a fix, you may be able to reduce the impact of the failure by enabling the watchdog, which should cause the Pi to reboot instead of lock up. Just add dtparam=watchdog=on to /boot/config.txt and reboot.

zhchaozhang commented 1 year ago

Are there any logs where we can find the reason for the restart, or a way to capture more logs from before the restart to help us analyze it?

pelwell commented 1 year ago

The best way for you to record kernel errors is going to be with a serial cable attached to pins 6 (GND), 8 (TXD from the Pi) and 10 (RXD to the Pi) on the 40-pin header. Configure your terminal emulator (Putty, Minicom etc.) on your PC or second Pi to 115200 baud, put enable_uart=1 in /boot/config.txt, add ignore_loglevel to the end of /boot/cmdline.txt, and reboot - you should see the usual kernel messages, and eventually a login prompt. If there is any kernel message output it will appear in that serial console.
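The two file edits described above can be sketched as small text helpers. This is a minimal illustration, not an official tool: the function names and the sample file contents are hypothetical, and only `enable_uart=1` and `ignore_loglevel` come from the instructions above. It relies on /boot/cmdline.txt being a single line, so the flag is appended to that line rather than added on a new one.

```python
def ensure_config_line(config_text: str, setting: str) -> str:
    """Return config_text with `setting` on its own line, appended if missing."""
    lines = config_text.splitlines()
    if any(line.strip() == setting for line in lines):
        return config_text
    lines.append(setting)
    return "\n".join(lines) + "\n"


def append_cmdline_flag(cmdline_text: str, flag: str) -> str:
    """Return the single-line kernel command line with `flag` appended if absent."""
    args = cmdline_text.split()
    if flag not in args:
        args.append(flag)
    return " ".join(args) + "\n"


# Hypothetical sample contents, for illustration only.
sample_config = "dtparam=audio=on\n"
sample_cmdline = "console=serial0,115200 root=/dev/mmcblk0p2 rootwait\n"

new_config = ensure_config_line(sample_config, "enable_uart=1")
new_cmdline = append_cmdline_flag(sample_cmdline, "ignore_loglevel")
print(new_config, end="")
print(new_cmdline, end="")
```

Note that config.txt can contain conditional section headers such as `[pi4]`, and a setting placed below one applies only to matching models, so where the line sits in the file can matter.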

zhchaozhang commented 1 year ago

Hi

We have configured cmdline.txt and config.txt, set the baud rate to 115200 and rebooted the Pi, but we cannot see the usual kernel messages. Could you please help us check whether this configuration is correct?

zhchaozhang commented 1 year ago

Hi

I have changed the location of “enable_uart=1” and can now get the log over the serial port. I will run the same test to reproduce the previous problem and capture the log through the serial port.

Thanks Zhongchao Zhang

zhchaozhang commented 1 year ago

Hi Zhongchao

I think we also need to check the SS1 driver log. Please enable this macro in SS1BTUSB.c @.***

BR Zhuxian

zhchaozhang commented 1 year ago

Hi Zhuxian

I will enable it and try to reproduce the issue with Weimeng.

Thanks Zhongchao Zhang

zhchaozhang commented 1 year ago

Hi

We successfully captured the kernel log over the serial port and found this message:

lan78xx 1-1.1.1:1.0 eth0: Failed to read stat ret = -110

Could you kindly help to check the kernel log?

Thanks Zhongchao Zhang

kernel.log system crash print when reboot

zhchaozhang commented 1 year ago

Hi

We hit similar problems twice, but with different logs. The first time, all the networks went down and the serial console printed the following:

[20221208_17:31:54:752][68109.029232] lan78xx 1-1.1.1:1.0 eth0: Failed to read stat ret = -110
[20221208_17:31:55:813][68109.749205] mmc0: timeout waiting for hardware interrupt.

The ARM crashed, and we had to change the SD card to reboot.

The second time, all the networks went down and the serial console printed the following:

[13_00:31:20:430][396067.526026] cpu cpu0: dev_pm_opp_set_rate: failed to find current OPP for freq 4294967186 (-34)
[13_00:31:21:504][396068.589101] raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-arm frequency: -110
[13_00:31:22:539][396069.619118] hwmon hwmon1: Failed to get throttled (-110)
[13_00:31:30:524][396077.299167] mmc0: timeout waiting for hardware interrupt.

This time the ARM did not crash, and we could reboot it with a power cycle.

Could you kindly help to check the kernel log, and compare and analyze the two failures?

Thanks Zhongchao Zhang
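For readers decoding the numbers in these logs: the negative return values are Linux errno codes, where 110 is ETIMEDOUT and 34 is ERANGE. The sketch below decodes them from a small hard-coded table, and also shows that the odd frequency 4294967186 equals -110 when read as a signed 32-bit value. That the frequency is really a timeout code reinterpreted as unsigned is an inference from the arithmetic, not something confirmed in this thread, and the helper names are hypothetical.

```python
# Linux errno values relevant to the log lines (from the generic errno headers).
LINUX_ERRNO = {34: "ERANGE", 110: "ETIMEDOUT"}


def describe_ret(ret: int) -> str:
    """Map a negative kernel return code such as -110 to its errno name."""
    return LINUX_ERRNO.get(-ret, "unknown")


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


print(describe_ret(-110))       # ETIMEDOUT
print(describe_ret(-34))        # ERANGE
print(as_signed32(4294967186))  # -110
```

Read this way, three of the four messages in the second log report a timeout, which is at least consistent with the firmware not answering requests at that point.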

pelwell commented 1 year ago

The log shows that your out-of-tree Bluetooth driver is causing lots of warnings, which makes me suspicious that it's not well written.

Is the USB Bluetooth dongle being removed and reinserted? If not, that suggests there may be a USB power problem.

A power issue is also suggested by the fact that these two messages are adjacent and close together in time:

[20221208_17:31:54:752][68109.029232] lan78xx 1-1.1.1:1.0 eth0: Failed to read stat ret = -110
[20221208_17:31:55:813][68109.749205] mmc0: timeout waiting for hardware interrupt.
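To put a number on "close together in time", the gap can be read from the kernel timestamps (seconds since boot) in the second bracket of each line. A minimal sketch, assuming the log format shown above; the helper name and regular expression are illustrative:

```python
import re

# Matches the kernel timestamp, e.g. "[68109.029232]" or "[    9.660014]".
KERNEL_TS = re.compile(r"\[\s*(\d+\.\d+)\]")


def kernel_seconds(line: str) -> float:
    """Return the kernel timestamp (seconds since boot) found in a log line."""
    match = KERNEL_TS.search(line)
    if match is None:
        raise ValueError("no kernel timestamp in line: " + line)
    return float(match.group(1))


eth_line = "[20221208_17:31:54:752][68109.029232] lan78xx 1-1.1.1:1.0 eth0: Failed to read stat ret = -110"
mmc_line = "[20221208_17:31:55:813][68109.749205] mmc0: timeout waiting for hardware interrupt."

gap = kernel_seconds(mmc_line) - kernel_seconds(eth_line)
print(f"{gap:.2f} s between the two failures")  # 0.72 s between the two failures
```

The USB Ethernet adapter and the SD card controller are separate peripherals, so two timeouts about 0.72 s apart point toward a shared cause such as power.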

Which brings us back to a question in my first reply:

Do you have a good power supply (and cable from the supply to the Pi)?

zhchaozhang commented 1 year ago

Picture1 Picture2 Picture3 Picture4

We have used these two power supplies, and the automatic reboot occurred with both of them. Do you have any recommended power supplies?

pelwell commented 1 year ago

I can't tell whether the other supply is a proper power supply or charger (which we don't recommend), but the Raspberry Pi power supply is good. However, even with the best power supply, the Pi can't power more than a few low-power USB devices (mouse & keyboard, say). We recommend the use of powered hubs for all but the smallest setups.

zhchaozhang commented 1 year ago

watchdog

As you suggested, we configured the watchdog to reduce the impact of the failure, but we find that the watchdog doesn't work. Could you kindly check whether the config is right, or tell us where we can check whether the watchdog was enabled successfully?

zhchaozhang commented 1 year ago

Hi

If there really is a problem with the USB power supply, why would the Pi restart or even crash? Shouldn't a lack of power only cause problems for the BT device?