raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

Issue with Kworker and load average growing #3995

Open tekman54190 opened 3 years ago

tekman54190 commented 3 years ago

Hello team,

I'm struggling with a recurrent issue on my Raspberry Pi 4. The system is up to date and runs a MySQL database, Apache, and Jeedom for home automation. The RPi boots from an SSD and has three USB devices attached:
-> One Aeotec Z-Wave dongle
-> One Zigbee ConBee II dongle
-> One Green Cell UPS

At least once a day I see messages in the dmesg output about a firmware transaction timeout. I updated the kernel to the latest version, but the issue persists. I'm 99% sure that whenever this message appears, the system's load average climbs sharply, which adds latency to the home automation system.
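To check the suspected link between the dmesg messages and the load average, one option is to log both together at regular intervals. The sketch below is only an illustration: the helper names (`load1`, `count_fw_messages`, `sample`) are made up for this example, and the `firmware transaction` search string is an assumption about the message wording, so adjust it to match the attached log.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from any existing tool.

# Print the 1-minute load average (first field of /proc/loadavg).
# An alternative file can be passed as $1.
load1() {
    awk '{ print $1 }' "${1:-/proc/loadavg}"
}

# Count log lines on stdin that mention a firmware transaction.
# The pattern is an assumption about the message text.
count_fw_messages() {
    grep -ci 'firmware transaction' || true
}

# Emit one timestamped sample; run it from cron or a loop and compare over time.
sample() {
    printf '%s load1=%s fw_msgs=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(load1)" "$(dmesg | count_fw_messages)"
}
```

Calling `sample` once a minute and appending the output to a file should show whether the message count and the load average rise at the same moment. Depending on the system configuration, `dmesg` may need to be run as root.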

The dmesg messages are attached: dmesg.log

I have no clue how to troubleshoot the source of the issue, whether it's linked to a driver, faulty hardware, or something else. Can someone help me find the root cause of these messages?

Thanks for your great work.

pelwell commented 3 years ago

How are you powering this Pi 4 and its USB devices?

tekman54190 commented 3 years ago

Hi Phil,
-> The Aeotec and Zigbee dongles are connected to a USB 2.0 hub
-> The UPS is only connected so that NUT can read its state
-> The SSD is self-powered by the docking station's power supply.

The RPi 4 is powered by a 5V 3A power supply and a quality USB cable. I have not seen any "under-voltage detected" messages like I did on the RPi 3 with cheap accessories :)

pelwell commented 3 years ago

I'm going to assume the USB 2 hub is powered. It sounds like an unofficial power supply (which isn't a problem in itself, it just means it's an unknown quantity).

When the system gets into this state, try the following commands:

$ vcgencmd commands

This should return a list of supported commands, which tells us that one of the interfaces to the firmware is alive.

$ /opt/vc/bin/vcmailbox 0x38041 8 8 130 1

This will turn off the red PWR led - turn it back on with the same command again but change the final 1 to a 0. Assuming that works, we know that both of the interfaces to the firmware are still working. That tells us that the timeout errors in the log are caused by some transient problem rather than a complete firmware crash.
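Since these checks are meant to be run while the system is misbehaving, it may help to wrap them in a time limit so that a hung command does not block the terminal. The following is a minimal sketch assuming the coreutils `timeout` utility is installed; `probe` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and report the outcome.
# Usage: probe SECONDS LABEL COMMAND [ARGS...]
probe() {
    limit=$1; label=$2; shift 2
    if timeout "$limit" "$@" >/dev/null 2>&1; then
        echo "$label: ok"
    else
        status=$?
        # coreutils timeout exits with 124 when the time limit was hit.
        if [ "$status" -eq 124 ]; then
            echo "$label: timed out after ${limit}s"
        else
            echo "$label: failed (exit $status)"
        fi
    fi
}
```

For example, `probe 5 vcgencmd vcgencmd commands` would report whether that interface answered within five seconds. One caveat: if the command is blocked inside the kernel in uninterruptible sleep, `timeout` may not be able to stop it, so a wrapper that never returns is itself a useful data point.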

You might get some useful information from sudo vcdbg log msg and sudo vcdbg log assert (it's awkward capturing the output from these commands because the log appears on stderr, not stdout; I use sudo vcdbg log msg >& vcdbg.txt, etc.).
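The `>&` redirection is a bash shorthand; the portable spelling is `> file 2>&1`, which works in any POSIX shell. A small sketch of a wrapper that saves both streams to one file (`capture_all` is a hypothetical name used only for this example):

```shell
#!/bin/sh
# Hypothetical helper: run a command and save stdout and stderr to one file.
# Usage: capture_all OUTFILE COMMAND [ARGS...]
capture_all() {
    out=$1; shift
    # Redirect stdout to the file first, then point stderr at the same place.
    "$@" >"$out" 2>&1
}
```

With this in place, `capture_all vcdbg-msg.txt sudo vcdbg log msg` would collect the log regardless of which stream it is written to.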

tekman54190 commented 3 years ago

OK, I'll do this, but for your information I suspected that monitoring might be an issue. With Zabbix I was sending vcgencmd commands for voltage, core frequency, and temperature. When the system is in this state, vcgencmd hangs and never returns.

root@assel-rpi-domotique:~# vcgencmd commands
^C^C^C^C^C

and:

root@assel-rpi-domotique:~# vcdbg log msg
Unable to determine the value of LOG_START
Unable to read logging_header from 0x00000000
root@assel-rpi-domotique:~#
root@assel-rpi-domotique:~#
root@assel-rpi-domotique:~# vcdbg log assert
Unable to determine the value of LOG_START
Unable to read logging_header from 0x00000000
root@assel-rpi-domotique:~#
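A command that ignores Ctrl-C is often blocked in uninterruptible sleep (state D), and Linux counts tasks in that state toward the load average. If the hung vcgencmd calls from monitoring are piling up in state D (an assumption; the output above does not show it), that alone could explain the growing load. A rough way to check, where `filter_dstate` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: keep only the tasks in state D (uninterruptible sleep)
# from `ps -eo state,pid,comm` output read on stdin. NR > 1 skips the header.
filter_dstate() {
    awk 'NR > 1 && $1 == "D"'
}
```

Running `ps -eo state,pid,comm | filter_dstate` while the load is high would list any such tasks; a growing number of vcgencmd or kworker entries there would support this explanation.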

If I redirect stderr, the file contains the same output as above.

And yes, it's not an official power supply, as you mentioned. I can try changing it and the power cables if you think that might be a solution.

Keep me updated.

tekman54190 commented 3 years ago

I just changed the power supply and the cable; we will see if it's better.

pelwell commented 3 years ago

> With Zabbix I was sending vcgencmd commands for voltage, core frequency, and temperature. When the system is in this state, vcgencmd hangs and never returns.

That does suggest that the firmware is crashing. I can't guarantee that it's a power issue, but if you are able to try a different supply and cable (bad cables can be a real problem) it would be useful.

tekman54190 commented 3 years ago

Sure, I just changed the power supply and swapped the cable for an end-to-end USB-C one instead of USB to USB-C, for better power stability. I'll come back to you in 2 or 3 days; normally this occurs at least once a day ;)

tekman54190 commented 3 years ago

The change of power supply and cable did not help...

Had the issue 1 hour after the boot.

The log is attached: logs2.log

yoyojacky commented 3 years ago

Did you update your firmware with the following commands?

sudo apt update 
sudo apt full-upgrade
sudo rpi-update 

And is it up to date? I ran into a USB boot issue because a Logitech 2.4G wireless device was not compatible with my USB boot SSD.

tekman54190 commented 3 years ago

Yes, I tried this in the past without any luck. I reverted to the latest stable firmware. Yesterday I set gpu_mem to 32M instead of 16M, as I had read in a separate article, and since 3 PM yesterday the issue has not come back... For your information, I had already disabled BT and Wi-Fi in the config file :)
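For anyone wanting to try the same change, here is a minimal sketch of setting the value from a script. It assumes the firmware configuration lives in /boot/config.txt (the path can differ between OS releases), and `set_gpu_mem` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: set gpu_mem in a config.txt-style file.
# Replaces an existing gpu_mem= line, or appends one if none is present.
# Usage: set_gpu_mem FILE MEGABYTES
set_gpu_mem() {
    file=$1; value=$2
    if grep -q '^gpu_mem=' "$file"; then
        # Rewrite via a temporary file rather than relying on sed -i.
        sed "s/^gpu_mem=.*/gpu_mem=$value/" "$file" >"$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf 'gpu_mem=%s\n' "$value" >>"$file"
    fi
}
```

Something like `sudo sh -c '. ./set_gpu_mem.sh; set_gpu_mem /boot/config.txt 32'` followed by a reboot would apply it (the script file name is also just an example), and `vcgencmd get_mem gpu` should then report the new split. Note that a line appended at the end of the file falls under whatever conditional section, such as `[pi4]`, happens to be last, so it is worth checking the file by hand afterwards.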

tekman54190 commented 3 years ago

Just to let you know that increasing the GPU memory seems to fix the issue... I don't really understand why, but there has been no more trouble since then...