What do you have plugged in to the USB ports, or anywhere else?
This appears to be memory corruption, which invariably has one of two causes - a bad driver, or a low voltage supply. Answering James's question will help to narrow down the former. For the latter, while (or after) running a typical workload, run vcgencmd get_throttled and report the output.
Thanks for your answers. The USB dongles are:
Bus 001 Device 004: ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)
Bus 001 Device 005: ID 148f:5572 Ralink Technology, Corp. RT5572 Wireless Adapter
Bus 001 Device 007: ID 0451:16ae Texas Instruments, Inc. (This is a CC2531 ZigBee USB dongle)
Integrated Bluetooth and WiFi are disabled (config.txt / overlays)
vcgencmd get_throttled result: throttled=0x80000
No voltage warnings in syslog.
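For reference, the reported value can be decoded using the bit layout documented for vcgencmd get_throttled. Below is a minimal Python sketch. The flag table is transcribed from the Raspberry Pi firmware documentation, and the helper name decode_throttled is illustrative, not part of any tool.

```python
from typing import List

# Bit meanings documented for `vcgencmd get_throttled`.
# Bits 0-3 describe the current state; bits 16-19 are sticky
# ("has occurred since boot").
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> List[str]:
    """Parse a line like 'throttled=0x80000' and list the flags that are set."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in sorted(THROTTLE_FLAGS.items()) if value & (1 << bit)]


print(decode_throttled("throttled=0x80000"))
# → ['soft temperature limit has occurred']
```

For 0x80000 only bit 19 is set, so the soft temperature limit was reached at some point since boot. Neither under-voltage bit (0 or 16) is set, which is consistent with the clean syslog.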
That's no guarantee that voltage isn't an issue if the load changes rapidly - this many dongles should be on a powered hub - but let's assume for now that it isn't. Is it feasible to remove the devices one at a time and test the stability of the remaining system, or does that reduce the functionality so much as to completely break your use cases?
Hi pelwell,
In the meantime I have tried a powered hub, but with no result. The issue/exception does not appear to happen with the CC2531 dongle disconnected, even when the other devices are connected directly to the Pi.
If the CC2531 is connected but no software is polling its data, the system also runs very stably. So I am fairly sure the CC2531 is causing the trouble, which for me is an issue because the unit should monitor ZigBee devices (for example Philips Hue).
For monitoring ZigBee I use the "KillerBee" Python framework, which in turn uses "PyUSB" to poll the CC2531. PyUSB is a wrapper around libusb, so I suspect the kernel crash may be triggered somewhere in this chain.
That's a useful result. We can't spend much time looking into an individual driver that (so far) only affects one person (that we know of), but if you can narrow down the issue any further we can see if there are any changes to the driver in newer kernels that might help.
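One way to narrow it down is to take KillerBee out of the chain and poll the CC2531 through PyUSB alone. If the oops still appears, the trigger is in the libusb/kernel USB path rather than in the framework. This is a minimal sketch under stated assumptions. The vendor/product IDs come from the lsusb output above, but the bulk IN endpoint (0x83) and read size (64) are hypothetical and depend on the firmware flashed on the stick. It also does not send the sniffer start/channel commands that KillerBee issues, so the stick may return only timeouts. It exercises the repeated-read path, not real ZigBee capture.

```python
import errno

# IDs taken from the lsusb output in this thread (0451:16ae = CC2531).
VENDOR_ID = 0x0451
PRODUCT_ID = 0x16AE
# Hypothetical values: the bulk IN endpoint and read size depend on the
# firmware flashed on the stick; check `lsusb -v` before relying on them.
DATA_ENDPOINT = 0x83
READ_SIZE = 64


def poll(read, iterations, timeout_ms=100):
    """Call read(timeout_ms) repeatedly and count non-empty reads and timeouts.

    `read` must return a bytes-like object, or raise TimeoutError when
    nothing arrived in time; any other exception propagates.
    """
    packets = timeouts = 0
    for _ in range(iterations):
        try:
            data = read(timeout_ms)
        except TimeoutError:
            timeouts += 1
            continue
        if len(data) > 0:
            packets += 1
    return packets, timeouts


def make_pyusb_reader():
    """Open the CC2531 with PyUSB and return a read(timeout_ms) callable."""
    import usb.core  # imported here so poll() can be tested without PyUSB

    dev = usb.core.find(idVendor=VENDOR_ID, idProduct=PRODUCT_ID)
    if dev is None:
        raise RuntimeError("CC2531 (0451:16ae) not found")
    dev.set_configuration()

    def read(timeout_ms):
        try:
            return dev.read(DATA_ENDPOINT, READ_SIZE, timeout=timeout_ms)
        except usb.core.USBError as exc:
            # PyUSB's libusb1 backend reports a transfer timeout as ETIMEDOUT
            if exc.errno == errno.ETIMEDOUT:
                raise TimeoutError("no data within timeout") from exc
            raise

    return read
```

On the Pi, running poll(make_pyusb_reader(), 10000) in a loop and printing the counts gives a long-running test that can be left going for the same few hours it normally takes the crash to appear. The read function is passed in so that the loop itself can be checked without hardware.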
@mstekker Did you have any further results or find any solutions?
Hi @pelwell, I was wondering if you could tell me why a low-voltage supply can cause this problem. I also see it during power on/off testing.
Below a certain voltage, the SoC's internal logic starts to fail because operations no longer complete before the next clock tick.
Hi JamesH65,
After testing with several AC adapters, several Raspberry Pis, several active USB hubs, several CC2531 sticks and the newest Raspbian kernel updates, the problem still exists as of today. An Intel NUC running Debian works fine, so I am stuck: I don't know enough about the kernel to dig further.
Best regards,
Mart
Hi, after updating to the newest kernel version on the same hardware, the log has stayed free of errors and Raspbian has been working fine for several days now. I do not know what changed, but as of today the problem is solved with the latest kernel / firmware.
I suspect that a change in dynamic power management in the firmware has fixed it for you.
Thanks for the information!
For me it was because I had the wrong image, to be honest. My bad.
Hi,
I am getting an "Unable to handle kernel paging request" error every few hours. The system reboots after the message but stays frozen; only a power off / on brings it back up.
Thanks for any clues.
Regards,
Mart