Open leviport opened 2 years ago
Hey, this is Greg. Thank you for creating the issue in github. I'll clarify what I can.
Steps to reproduce: Leave PopOS running for some time. I've experienced this problem becoming noticeable in different timeframes. After a restart, it usually takes at least a full day to become noticeable, although I've also experienced it becoming a problem only after two days (although rare). I'm running in nvidia-only mode when I notice this, but I don't know if it happens when it's running integrated-only.
This is, indeed, a Kudu 6. This issue has been present since I performed a fresh install of PopOS 22.04 (I originally installed Ubuntu Mate as soon as my laptop arrived and did not experience this behavior. It was only after PopOS was installed). That's been about four months now.
I didn't make the connection to the system76-power until a couple days ago. Since it was audio-related, I've tried all manner of tweaks and adjustments to pipewire, but these didn't seem to have any noticeable impact. I've even disabled the system76-scheduler to see if that helped. I just happened to notice that the system76-power was relatively busy one night when the audio was degraded, so I thought I'd kill it and it instantly improved. This behavior also occurs when using a USB headset, regular wired headphones, or the speakers. I do lock my computer often throughout the day, and systemctl status system76-power shows a DBUS Received GetGraphics() method and GetProfile() method each time this happens, if that means anything useful.
Other notes: My laptop usually stays tethered to a Dell WD19S dock with two monitors, but the video I posted to twitter was done without the dock while running on battery. Still, the computer was on for the full day today, and the stuttering only started this evening. I haven't been observing system76-power long, but the highest I've ever seen it go was about 23% CPU according to htop. If there is something I can do to get more information about this process when it happens, I'll be very happy to test.
Here's a screenshot of an strace on system76-power just after a reboot. It's using between 0% and 0.7% CPU currently.
My computer has been left on overnight, locked. Just before I went to bed, system76-power was between 0.7% and 1.3%. This morning, it's bumped to between 2.0% and 4.6%. Systemd reports that's about 12 full hours.
I'm looking for a possible cause. Any difference in strace output after that day?
I have no idea what I'm looking for, so I don't know what's different. Here's a screenshot of what it looks like today.
Guide me on how I can get you more information.
Edit: system76-power is currently between 2.6% and 5.9% as of this strace screenshot, Current uptime is about 15 hours. systemd reports 18 minutes of CPU time for system76-power.
Maybe you've noticed it outputting something more frequently than before
I think it's safe to assume that it's not the fan daemon or hotplug detection since these are only executed once per second. I don't think strace will be useful. May need to run this through a proper profiling tool.
It looks like it's doing openat(AT_FDCWD for /sys/class/hwmon/hwmon 1-5, does some lseek and reads, a couple of clock_gettime, one write, an epoll_wait(3 , then a couple more clock_gettime. This seems to look the same as when I started it, but I'm not sure anything would stand out to me unless it looked significantly different. htop is polling the strace every second, and it looks like the all the stuff I listed above also happens every second.
Yeah it might be more helpful to get output from a flamegraph using a tool like this. It's most likely some application-level code.
Is this of any use?
No, doesn't contain any function calls
Here's one I ran for a couple seconds. Does this help, or still useless
No, no functions from system76-power itself
I'm running the command as sudo ~/.cargo/bin/flamegraph --bin=system76-power daemon
Should I be doing something different?
cargo flamegraph --root daemon
from the project directory with the system76-power daemon stopped in advance.
Is this heading in the right direction? This was generated from in the .cargo/bin directory with the command: flamegraph --root -- /usr/bin/system76-power daemon
No, it requires building system76-power from source code to have all the debugging symbols and function call information.
Oooooh, okay. Do you want me to try to do that? If so, what compilation command and flags should I use?
I'm really sorry. I'm still trying to figure out what I'm doing.
See if this is any better. I edited the makefile to enable debugging (I think) and compiled it. The binary went into the debug directory, so I'm assuming that got it.
I navigated to the debug directory and ran: flamegraph --root -- ./system76-power daemon
And got this graph
This is good. Let it run for several hours and compare the results tomorrow to see if something stands out more as time goes on.
You got it. Thank you. So sorry for the confusion
Left my computer running over night. system76-power was using about 5.6% at the time I stopped it. flamegraph collected about 17GB of perf data and took about an hour-and-a-half to generate the graph. Here it is.
Seems that it's entirely hotplug detection. Unusually high CPU demand from that codepath. Will investigate optimizations.
Okay, so let me add to that point. I'm using a WD19S USB-C dock with two monitors (one is connected by USB-C to the dock, the other is by display port). Now, I don't notice any issue with this set up, and the systemd status for system76-power doesn't show a hotplug event unless I unplug the dock and plug it back in. However, the very few times I use Discord, it will occasionally and randomly mention that a new Nvidia HDMI audio source was detected, and I never use that for audio. This may be a red herring, but it's the only anomaly I can think of that might be related at all. Please let me know if there is anything else I can do to assist.
Every second the service checks for display hotplug events. I'll try to reduce the amount of reads that it's performing.
Try running the perf
branch and see if that improves CPU usage.
This was from system76-power (perf branch) running over night. By the time I stopped flamegraph, it collected about 45GB of data. Edit: the system76-power process was using just about 6% CPU at the time I stopped it
Any update on this? I'm still experiencing this on my Kudu. I have to manually stop system76-power after booting up, and then because that's not running, I have to manually change to integrated/nvidia. It's not terrible, but it's very annoying to have to keep doing this, and I'm really hoping for a fix.
Distribution (run
cat /etc/os-release
): Pop!_OS 22.04Related Application and/or Package Version (run
apt policy $PACKAGE NAME
): Presumablysystem76-power
, version unknownIssue/Bug Description: https://twitter.com/emptythevoid/status/1570150727300923393
Steps to reproduce (if you know): Not entirely clear, but presumably the machine just has to be left running with music playing/looping in the background for a while? The time frame to reproduce the bug isn't clear to me. It looks like it happens pretty quickly in that video, but prior uptime is unknown.
Expected behavior: system76-power should not hog CPU, and audio playback should not stutter
Other Notes: I assume this is on a kudu6, judging by the core count in
powertop
in the video.