raspberrypi / bookworm-feedback


pcmanfm often freezes when HDMI monitor is switched off and back on #281

Open Swift42 opened 1 month ago

Swift42 commented 1 month ago

I already wrote it as a comment here https://github.com/raspberrypi/bookworm-feedback/issues/57 , but the issue was already closed, so I'm opening a new issue.

I have a similar (or the same?) problem with pcmanfm version 1.3.2-1+rpt20 on my Raspberry Pi 5 running Bookworm. My Pi runs 24/7 and I switch off the HDMI monitor overnight. Nearly every morning, when I switch the monitor back on, I have the same problem: the desktop no longer shows the background image but is suddenly black. pcmanfm is also at 100% CPU (but has a normal memory consumption of 1.2%). The only way to get everything running again is `sudo killall -9 pcmanfm`. After issuing the command, the background image is instantly shown again and pcmanfm is usable again. Is there something I can do to fix this problem?

popcornmix commented 1 month ago

Is leaving it overnight necessary to show the issue, or can you demonstrate the issue just by switching the monitor off then on?

Do you get the issue on a clean install of RPiOS (e.g. on a separate SD card for testing)? That will be useful to confirm whether the issue depends on some custom settings or installed packages.

Swift42 commented 1 month ago

After a full night, the chance that the problem happens is maybe ~40%. To see whether it depends on how long the monitor stays off, I switched the monitor off 20 times, but for only 1 minute each. Result: the same failure rate of ~40% (9 failures, 11 OK). So it has nothing to do with the duration, but with the system going into HDMI sleep (or coming back from it).

I have a more or less clean install (as far as I remember, I only installed one custom package, the shell editor joe).

Some observations:

  1. After a reboot, the problem is completely gone at first. I can switch the monitor off/on several times without a problem. But after some days it happens again, and then on a regular basis.
  2. When I switch on the monitor and the system is in the "pcmanfm can freeze" state, a message is instantly shown saying that my Pi is connected via Ethernet cable, along with its IP address (this message is normally displayed only immediately after a reboot). But when I reboot and then switch the monitor off/on, this message doesn't show up. Strangely, I have no problems with Ethernet: the connection always works, even when the monitor is off and even when the system is in the "pcmanfm can freeze" state.

I will do further tests to find out more.

Swift42 commented 1 month ago

As promised, I did some tests. Indeed, after a fresh reboot the problem doesn't happen: I can switch the monitor off/on dozens of times without any problem. But after 2 or 3 days it begins to happen again, and then on a regular basis: I only need to switch the monitor off/on 1 or 2 times and it happens again (black desktop background, pcmanfm at 100% CPU). Then I run `killall -9 pcmanfm`, everything works again, and right after that I can switch the monitor off/on and it happens again. So the system is now in some special state. Is there anything I can do to fix the problem (besides running the killall every morning or doing a fresh reboot)?

popcornmix commented 1 month ago

> But after 2 or 3 days, it begins to happen again and then on a regular basis:

To investigate it, we need a way of reliably reproducing it.

It's not clear if:

  1. Booting up
  2. Waiting 3 days (without using the Pi)
  3. Powering off the display

would provoke the issue (my suspicion is no), or whether there is some software you run rarely that makes the next power cycle fail (which seems more likely to me).

Swift42 commented 1 month ago

I will try it (boot, not use the Pi, power off the display) and give feedback here. It can take some time. By the way, the only software running on my Pi is Chromium. When pcmanfm is at 100% CPU, Chromium runs normally and is fully usable.

qrp73 commented 1 month ago

I've noticed that this issue also happens with browsers (like Firefox/Chromium). So it's better to close the browsers and pcmanfm before disconnecting/connecting the HDMI cable, otherwise they may crash.

Sometimes it can even crash wayfire, in which case all desktop windows are closed and all unsaved data is lost. So make sure all data is saved before connecting/disconnecting the HDMI cable.

Swift42 commented 3 weeks ago

> It's not clear if:
>
>   1. Booting up
>   2. Waiting 3 days (without using the Pi)
>   3. Powering off the display
>
> would provoke the issue (my suspicion is no), or whether there is some software you run rarely that makes the next power cycle fail (which seems more likely to me).

OK, I tried it several times without starting any program after boot/login. Sooner or later the problem WILL happen, even when no program was or is open. To be honest, I'm out of ideas and this problem is driving me crazy. Currently I use the following script to kill pcmanfm when it uses 100% CPU (run via crontab every minute), but this is of course not a good permanent solution. Hopefully you can identify the problem on your side. The odd thing is: after the original OS installation I didn't have the problem, and everything ran fine for months. But after I installed some updates, the problem was suddenly there.

Here is the script I use (yes, it needs kill -9, a normal kill won't do anything):

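```bash
#!/bin/bash
# Kill pcmanfm if it is stuck at ~100% CPU (run from crontab every minute).
PID=$(pidof -s pcmanfm) || exit 0   # nothing to do if pcmanfm isn't running
# Two samples 0.2 s apart; the last line is pcmanfm's row, column 9 is %CPU.
CPU=$(top -b -n 2 -d 0.2 -p "$PID" | tail -1 | awk '{print int($9)}')
if [[ ${CPU:-0} -gt 99 ]]; then
    echo "Killing PID $PID (CPU ${CPU}%)"
    kill -9 "$PID"   # SIGKILL: a plain kill has no effect on the hung process
fi
```
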
MarlinMr commented 1 week ago

I seem to be having the same issue with an RPi 5. Memory is consumed over approximately 8 hours. It originally looked to be connected to the new rpi-connect, which might explain why it's "suddenly happening". rpi-connect uses Wayland, as mentioned in https://github.com/raspberrypi/bookworm-feedback/issues/57.

This thread on the forum discusses it.

qrp73 commented 1 week ago

If I remember correctly, this issue is related to the PipeWire service. I tried switching to PulseAudio, and that fixed the crash on HDMI reconnect. But PulseAudio doesn't automatically select the proper sample rate for the sound card, so I switched back to PipeWire. With PipeWire it crashes on HDMI reconnect, but it plays sound at the native sample rate, without unwanted resampling artifacts.

Swift42 commented 1 week ago

> If I remember correctly, this issue is related to the PipeWire service. I tried switching to PulseAudio, and that fixed the crash on HDMI reconnect.

Very interesting. I will try to switch to pulseaudio and see if the problem happens again. It will take some days, but I will give feedback here.

popcornmix commented 1 week ago

As always, a reliable sequence of operations that provokes the bad behaviour will help a lot.

e.g.

  1. Start with RPiOS bookworm 64-bit desktop.
  2. `apt install [package]`
  3. Launch [package].
  4. Power cycle the display.
  5. Repeat N times.
  6. Note that memory usage from `free -h` has increased by X.
  7. After enough iterations, pcmanfm crashes with OOM.
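`free -h` reports system-wide memory, so a small per-process sampler may make it easier to attribute the growth to pcmanfm. A minimal sketch, assuming a single pcmanfm process that `pidof` can find; the function names `rss_kb` and `sample_pcmanfm` and the log file name are hypothetical:

```bash
#!/bin/bash
# Hypothetical helpers: sample pcmanfm's resident memory (VmRSS) so that
# growth per display power cycle can be measured alongside `free -h`.

# Print the VmRSS (in kB) of the given PID; prints nothing if it doesn't exist.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# Append one timestamped sample for pcmanfm to the log file given as $1.
# Assumes a single pcmanfm process; returns 1 if none is running.
sample_pcmanfm() {
    local pid
    pid=$(pidof -s pcmanfm 2>/dev/null) || return 1
    printf '%s pid=%s rss_kb=%s\n' "$(date '+%F %T')" "$pid" "$(rss_kb "$pid")" >> "$1"
}
```

For example, calling `sample_pcmanfm pcmanfm-mem.log` once after each display off/on cycle gives one line per cycle; a steadily rising `rss_kb` column would point at pcmanfm rather than at some other process.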

MarlinMr commented 1 week ago

Turning off the screen will cause the issue. Screens that are in "active standby" will cause HDMI to be connected and disconnected lots of times. Every time, it will eat a bit of memory.

I have a TCL 75" QLED860 4K LED TV (2022) connected. The problem starts immediately when the TV is put in standby.

[Image: memory graph]

popcornmix commented 1 week ago

Please be explicit. Are you saying that:

  1. Boot into RPiOS desktop 64-bit (default wayfire configuration) with the HDMI display on (4kp60 mode), with auto login enabled.
  2. Run nothing, just have the blank desktop displayed.
  3. Power off the display.
  4. Leave the display off for 24 hours.
  5. `free -h` will show ~6GB of additional RAM consumed?

MarlinMr commented 1 week ago

How to reproduce:

  1. Install the Raspberry Pi M.2 HAT+, and mount the RPi5 in the case with fan. Make sure the fan is angled at -15 degrees as shown in the picture.
  2. Connect an HDMI-mini to HDMI dongle, and connect HDMI to a monitor. Make sure the monitor has some form of "active standby".
  3. Insert an SD card with RPi OS installed.
  4. Install NVMe drive.
  5. Boot from the SD card, and use Pi Imager to install RPi OS on the NVMe drive.
  6. Switch to booting from the NVMe drive and remove the SD card.
  7. Boot the RPi5 from the NVMe drive.
  8. Run a custom set of scripts, databases, docker containers, or whatever else you need to have running.
  9. Turn on the monitor.
  10. Watch several streaming services on the monitor.
  11. Turn off the monitor, and go to sleep.
  12. Wake up, and the RPi5 will be consuming most, if not all, of its memory.
  13. Reboot the RPi5 by disconnecting power.

That's how I trigger it. Some of these steps might not be necessary, however. If you only want to reproduce the issue, you can:

  1. Connect (seemingly) any display with "active standby".
  2. Watch pcmanfm consume more and more memory.

You do not have to wait any amount of time. The problem does not arise once the memory is full. You do not need to turn on any screen. You do not have to care about any logins.

Swift42 commented 6 days ago

> If I remember correctly, this issue is related to the PipeWire service. I tried switching to PulseAudio, and that fixed the crash on HDMI reconnect.

> Very interesting. I will try to switch to pulseaudio and see if the problem happens again. It will take some days, but I will give feedback here.

I tried switching to PulseAudio, but sadly the problem persists. So PipeWire wasn't the culprit. I will switch back to PipeWire.

popcornmix commented 3 days ago

"active standby" for displays isn't a term that google is finding. Do you have a link that describes what "active standby" is?

tdewey-rpi commented 3 days ago

> "active standby" for displays isn't a term that google is finding. Do you have a link that describes what "active standby" is?

Typically called 'Connected Standby', 'Fast TV Start' or similar branding names.

popcornmix commented 3 days ago

> Screens that are in "active standby" will cause HDMI to be connected and disconnected lots of times. Every time, it will eat a bit of memory.

How are you determining this and do you know why?

Can you run:

```bash
while : ; do sudo cat /sys/kernel/debug/dri/?/hdmi0_regs | grep HOTPLUG; done | tee hotplug.log
```

and check what is reported when you do what provokes the memory issue (e.g. standby of TV).

The expected behaviour is HDMI_HOTPLUG = 0x00000001 until you enter standby. Then it may stay as 1, or go to 0. Then it may stay as 0, or go back to 1.

Continuously switching between 0 and 1 for a long period would be unexpected.
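The loop above prints every read of the register, so over several hours the log mostly repeats the same line. A minimal sketch that records only the transitions, each with a timestamp, could make the 0/1 switching easier to line up with pcmanfm's behaviour. The function names are hypothetical, the one-second poll interval is an arbitrary choice, and the debugfs path is taken from the command above; the glob is expanded inside `sudo sh -c` on the assumption that `/sys/kernel/debug` is readable only by root.

```bash
#!/bin/bash
# Hypothetical helpers: log HDMI hotplug transitions with timestamps.

# Read lines on stdin; print "<date> <time> <line>" only when the line
# differs from the previous one.
log_changes() {
    local prev="" line
    while IFS= read -r line; do
        if [[ $line != "$prev" ]]; then
            printf '%s %s\n' "$(date '+%F %T')" "$line"
            prev=$line
        fi
    done
}

# Poll the hotplug register once per second. The glob is expanded by the
# root shell, assuming /sys/kernel/debug is not readable by normal users.
poll_hotplug() {
    while :; do
        sudo sh -c 'grep -h HOTPLUG /sys/kernel/debug/dri/?/hdmi0_regs'
        sleep 1
    done
}
```

Usage: `poll_hotplug | log_changes | tee hotplug.log`. Each line of the log then marks one state change, so the interval between switches can be read off directly.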

MarlinMr commented 3 days ago

I mean, switching between 0 and 1 isn't unexpected, that's how it works. No idea why it's implemented like that, but it is. The result was as expected, HDMI_HOTPLUG = 0x00000001 switched between 0 and 1. And every time state changed, pcmanfm ate a bit of memory... Keep in mind that, at least on my monitor, there are several minutes between every time it switches. And it only eats a few MB every time, so people don't notice until memory is full hours later.

But the "active standby" or whatever mode a display is in isn't the issue here. It's that pcmanfm eats memory every time a display change happens. The easiest way to replicate it really is just to disconnect the cable, or turn off a connected screen. pcmanfm will eat a few MB every time.

popcornmix commented 3 days ago

> The result was as expected, HDMI_HOTPLUG = 0x00000001 switched between 0 and 1. And every time state changed, pcmanfm ate a bit of memory... Keep in mind that, at least on my monitor, there are several minutes between every time it switches.

Just to confirm hotplug was initially 1, you entered standby and it went to 0. Then, are you saying that without otherwise interacting with display, you see a 1->0 switch periodically every several minutes, and that behaviour continues over many hours?

Swift42 commented 3 days ago

Just to be clear: I have no (noticeable) memory leak after HDMI on/off, but pcmanfm eats 100% of the cpu and some desktop functions are locked (e.g. the icons on the desktop don't show up). Also I cannot start a new pcmanfm window. After a forced kill of pcmanfm, all is working normally again.

MarlinMr commented 3 days ago

> Just to confirm hotplug was initially 1, you entered standby and it went to 0. Then, are you saying that without otherwise interacting with display, you see a 1->0 switch periodically every several minutes, and that behaviour continues over many hours?

While in standby it alternated between 0 and 1, switching periodically. I can let it run overnight and produce an extensive log of that, but it seems a bit silly, because the "active standby" isn't the issue.

The issue is that pcmanfm eats memory every time HDMI changes. Here is a video of it happening: https://i.imgur.com/e5rfJ4r.mp4

> Just to be clear: I have no (noticeable) memory leak after HDMI on/off

How much memory was it using before and after? Because I only see a few MB every time, and so it doesn't become an issue until having been switched on/off thousands of times.

popcornmix commented 3 days ago

I'm trying to understand the behaviour of your display (and note, that it is not normal) so I can reproduce.

I've hacked my kernel to appear as if the hotplug signal toggles every 10 seconds. It's hard to see over a single or couple of cycles, but after a number (here I've gone through 14 off/on cycles) I can see a trend upwards, which shouldn't happen. We'll investigate this.
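A kernel change like the one described above isn't something most users can apply. As a rough userspace approximation, a minimal sketch could switch the output off and on repeatedly with `wlr-randr`. This rests on several assumptions: that `wlr-randr` is installed and the compositor supports output management, that the output is named `HDMI-A-1` (running `wlr-randr` without arguments lists the names), and, unverified, that disabling an output exercises the same pcmanfm code path as a real hotplug. The function name `cycle_output` and the `WLR_RANDR` override are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: repeatedly disable and re-enable a display output.
# Unverified assumption: this triggers the same client-side handling as a
# real HDMI hotplug. Override WLR_RANDR (e.g. WLR_RANDR=echo) for a dry run.
WLR_RANDR=${WLR_RANDR:-wlr-randr}

# cycle_output OUTPUT COUNT DELAY_SECONDS
cycle_output() {
    local output=$1 count=$2 delay=$3 i
    for ((i = 1; i <= count; i++)); do
        "$WLR_RANDR" --output "$output" --off
        sleep "$delay"
        "$WLR_RANDR" --output "$output" --on
        sleep "$delay"
    done
}
```

For example, `cycle_output HDMI-A-1 14 10` roughly mirrors 14 off/on cycles at 10-second intervals. If pcmanfm's memory does not trend upwards under this approximation, that would only suggest the software toggle differs from a real hotplug, not that the leak is absent.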

Swift42 commented 3 days ago

> Just to be clear: I have no (noticeable) memory leak after HDMI on/off

> How much memory was it using before and after? Because I only see a few MB every time, and so it doesn't become an issue until having been switched on/off thousands of times.

This may be the case, but my original post was about 100% CPU and an unresponsive pcmanfm. Nevertheless, the two problems (100% CPU and unresponsiveness vs. the memory leak) may be connected. I think the chance of this is pretty high: if the root cause of the memory leak can be found, fixing it may also fix the unresponsive pcmanfm. Hopefully the bug can be found. It definitely has to do with HDMI sleep/off/on.

As I pointed out somewhere above, when my system is in the "special state" after some days, I can just switch off my monitor and switch it back on some seconds later and boom! pcmanfm is at 100% cpu and unresponsive (in ~50% of all cases). I think the problem doesn't happen when HDMI goes to sleep and it doesn't happen when HDMI is sleeping, but when HDMI wakes up. To be sure I verified it: When my script from above finds out that pcmanfm is using 100% cpu, it kills the task, but this time I added a log entry. Then I switched off the monitor several times, waited some minutes and switched it back on. Result: No log entry after power off or while in power off. All log entries happened directly AFTER the power on. So the problem always happens when HDMI wakes up from sleep. Maybe this helps.

popcornmix commented 2 days ago

I think the memory leak doesn't occur with labwc.

```bash
sudo apt install labwc
sudo raspi-config
```

then choose "Advanced Options"/"Wayland"/"Labwc" and reboot.

Note, we will be making labwc the default (in place of wayfire) in the near future. Give it a try.
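After rebooting, it may be worth confirming which compositor the session actually runs. A minimal sketch, assuming the compositor processes are named `labwc` and `wayfire`; the function name `current_compositor` is hypothetical, and the function scans `/proc` directly rather than relying on any particular tool:

```bash
#!/bin/bash
# Hypothetical helper: print "labwc" or "wayfire" if either compositor
# process is running, otherwise "unknown".
current_compositor() {
    local comm name
    for comm in /proc/[0-9]*/comm; do
        # Processes can exit while we scan; skip any that disappeared.
        { read -r name < "$comm"; } 2>/dev/null || continue
        case $name in
            labwc|wayfire) echo "$name"; return 0 ;;
        esac
    done
    echo unknown
}
```

Running `current_compositor` in a terminal on the desktop should print `labwc` once the switch has taken effect.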

MarlinMr commented 1 day ago

Alright, I'll give that a try and report back.

Swift42 commented 1 day ago

> I think the memory leak doesn't occur with labwc.

It will take some days, but I will report back, too. Fingers crossed.

MarlinMr commented 1 day ago

> then choose "Advanced Options"/"Wayland"/"Labwc" and reboot.

Seems like this fixes the issue, yeah.

qrp73 commented 1 day ago

Wayfire is so cute that I don't want to switch to another compositor, so I hope it will be fixed for Wayfire...

popcornmix commented 22 hours ago

> Wayfire is so cute that I don't want to switch to another compositor, so I hope it will be fixed for Wayfire...

Can you explain why? labwc will become the default in the near future (it has many performance and memory advantages), and at that point wayfire will get no pi-specific features or fixes.