Open Massimo-B opened 4 months ago
For higher loads I tried to increase the quantum. That sometimes helped in the past, at the cost of higher latency, but with easyeffects it seems to get worse after I tinker with the quantum, like
pw-metadata -n settings 0 clock.force-quantum 1024
or
pw-metadata -n settings 0 clock.force-quantum 2048
The error column in pw-top indicates xrun errors. In most cases an xrun means that the soundcard is not receiving audio buffers as fast as it would like. In our case this means that for some reason PipeWire is not able to deliver them in time. PipeWire/PulseAudio apps send buffers to the server and not directly to the soundcard. So for some reason the additional load in PipeWire's realtime thread is too much for your system to handle.
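To put the quantum values from the commands above in perspective, here is a small sketch of the arithmetic. It assumes a 48000 Hz graph rate (an assumption, the actual rate depends on the setup), and quantum_period_ms is a hypothetical helper, not a PipeWire API. Roughly speaking, all the processing for one cycle has to fit into this period, otherwise an xrun is counted.

```cpp
#include <cassert>

// Hypothetical helper (not a PipeWire API): duration of one graph cycle in
// milliseconds for a given quantum (frames per cycle) and sample rate.
double quantum_period_ms(unsigned quantum, unsigned sample_rate_hz) {
  assert(sample_rate_hz > 0);
  return 1000.0 * quantum / sample_rate_hz;
}
```

At 48000 Hz a quantum of 128 gives about 2.67 ms per cycle, 1024 about 21.3 ms and 2048 about 42.7 ms, which is why a larger quantum trades latency for a bigger time budget.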
There isn't really much that can be done from the client side. The obvious solution would be to make the plugins' code super fast. But besides the fact that most of the ones we use come from third-party projects, they are probably already as optimized as they can be.
Something to try would be changing some server configuration options, like the ALSA headroom, or changing kernel versions. This has helped some users in the past. It is also worth identifying which plugins contribute the most to the xrun errors and avoiding them on this machine.
As xrun errors depend strongly on the hardware, it is hard to suggest a direct solution. For example, on my current desktop, even with all the cores of my Ryzen 7700 under a 100% stress test, I have zero errors in pw-top while watching YouTube videos in Firefox.
Something I have been doing for years is booting with the kernel option threadirqs. Maybe this is helping somehow to play audio at higher loads. Does it make any difference?
Hi!
There isn't really much that can be done from the client side. The obvious solution would be to make the plugins' code super fast. But besides the fact that most of the ones we use come from third-party projects, they are probably already as optimized as they can be.
I've looked at the spectrum code, as that is the plugin with the most xruns and it is implemented by EasyEffects itself.
What would you think about moving most of the processing outside the realtime thread? Currently it does the DFT inside the RT thread and hands off a mono double buffer when done.
The idea would be that it would do the left+right average, put that to the end of a buffer and be done for the realtime thread. When the spectrum needs to be rendered, then this is copied and worked on from the thread doing the rendering.
Note: one issue of the current approach is that the DFT is done per process event. This might be much more frequent than needed if the PW quantum is really small. In that case, we would reduce work done in the RT thread but also overall CPU load by avoiding work.
Do you have any thoughts on that? I'll be able to work on that.
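A minimal sketch of the realtime-side half of that idea, under stated assumptions: MonoHistory and its methods are hypothetical names, not EasyEffects code, and the hand-off between the realtime and rendering threads is deliberately left out (as written, snapshot() is only safe if it does not run concurrently with push_block()). It only illustrates that the per-cycle work can shrink to an average and a store into preallocated memory, with the DFT left to whoever takes the snapshot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not EasyEffects code: keeps the most recent mono
// samples in a fixed-size circular buffer allocated once up front.
class MonoHistory {
 public:
  explicit MonoHistory(std::size_t capacity) : samples_(capacity, 0.0F) {
    assert(capacity > 0);
  }

  // Realtime side: average left and right and store. No allocation, no DFT.
  void push_block(const float* left, const float* right, std::size_t n_frames) {
    for (std::size_t i = 0; i < n_frames; ++i) {
      samples_[write_pos_] = 0.5F * (left[i] + right[i]);
      write_pos_ = (write_pos_ + 1) % samples_.size();
    }
  }

  // Rendering side: copy the history out, oldest sample first.
  std::vector<float> snapshot() const {
    std::vector<float> out;
    out.reserve(samples_.size());
    for (std::size_t i = 0; i < samples_.size(); ++i) {
      out.push_back(samples_[(write_pos_ + i) % samples_.size()]);
    }
    return out;
  }

 private:
  std::vector<float> samples_;
  std::size_t write_pos_ = 0;
};
```

With this split, a very small quantum only means more frequent cheap copies, while the expensive transform would run at most once per rendered frame.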
A tangent: how expensive are the util::idle_add() calls? There is one at the end of setup() (called when sample rate or quantum changes) and one at the end of process(). I see an allocation, then a call to g_idle_add(). That call, I do not know about.
What would you think about moving most of the processing to outside the realtime thread?
I updated our master branch, moving the fft call to the main thread. Let's see if this helps weaker processors handle the extra load. The thing is that all this load was already avoided when the window was hidden. So unless all the people having xruns keep the EE window open all the time, I do not expect much change.
A tangent: how expensive are the util::idle_add() calls? There is one at the end of setup() (called when sample rate or quantum changes) and one at the end of process(). I see an allocation, then a call to g_idle_add(). That call, I do not know about.
g_idle_add schedules the execution of a function in glib/gtk main thread.
Ah, nice! Thanks.
g_idle_add schedules the execution of a function in glib/gtk main thread.
This sounds like something that requires synchronization and might even be blocking. Nothing great in the audio processing codepath. I see it gets used by most plugins to export results.
Ah, nice! Thanks.
g_idle_add schedules the execution of a function in glib/gtk main thread.
This sounds like something that requires synchronization and might even be blocking. Nothing great in the audio processing codepath. I see it gets used by most plugins to export results.
It does not require blocking or synchronization. And we must use it because of the usual requirement from graphical toolkits about not having other threads messing with widgets. Sooner or later a move to the main thread will have to be done.
It does not require blocking or synchronization.
The underlying implementation is idle_add_full (code). I'm counting two malloc calls in idle_source_new, one in g_source_set_callback and a mutex lock in g_source_attach.
I'm wondering if putting data in the plugin and letting the main thread access it, without creating a new idle task on each process() call, wouldn't be more efficient? I'd be down to attempt a proof-of-concept if you want.
I'm wondering if putting data in the plugin and letting the main thread access it, without creating a new idle task on each process() call, wouldn't be more efficient? I'd be down to attempt a proof-of-concept if you want.
Besides the fact that you would have to rewrite considerable amounts of EE code, because what you propose is the opposite of what is done everywhere in EasyEffects, I am skeptical about the performance gains being worth such a radical change. Looking at perf top output, the calls to g_idle_add are not a bottleneck.
And like I said before, when the window is hidden (EE in background) none of this code is operational. But some machines will still have xruns. So it is unlikely that the calls to g_idle_add are the source of the problem.
I'm wondering if putting data in the plugin and letting the main thread access it, without creating a new idle task on each process() call, wouldn't be more efficient?
This would probably require putting the main thread in some kind of polling mode, possibly involving the insertion of our own mutex objects, because now we have to worry about what is being done to the data structures inside the plugins. It does not seem very appealing.
Besides the fact that you would have to rewrite considerable amounts of EE code, because what you propose is the opposite of what is done everywhere in EasyEffects, I am skeptical about the performance gains being worth such a radical change. Looking at perf top output, the calls to g_idle_add are not a bottleneck.
Ah I just looked at spectrum and noticed the above remarks. I was thinking about contributing small improvements without aiming for a major rework. Spectrum sounded especially interesting as it runs on all GUI EE instances by default.
About perf top: the goal is not really lower CPU usage, but more predictable timings in the realtime thread. Lower CPU usage is a nice side effect. That is why I am using pw-profiler, run for a few seconds, to see what the spectrum PW node timings look like.
And like I said before, when the window is hidden (EE in background) none of this code is operational. But some machines will still have xruns. So it is unlikely that the calls to g_idle_add are the source of the problem.
From experiments, the Spectrum::process callback is still being run, i.e. the PW node is in the running state. Is that not expected? I see the code that returns early on the Spectrum::power signal if the chart is not visible, but I see no code that disables the PW node when hidden.
This would probably require putting the main thread in some kind of polling mode, possibly involving the insertion of our own mutex objects, because now we have to worry about what is being done to the data structures inside the plugins. It does not seem very appealing.
I was thinking more of an atomic pointer to A/B data buffers. This means no locking, but is more complex.
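The branch itself is not reproduced in this thread, so the following is only a sketch of one well-known lock-free hand-off, not the code from the proof-of-concept. It swaps in a triple buffer instead of a strict A/B pair, because with only two buffers the writer can end up overwriting the buffer the reader is still using. TripleBuffer is a hypothetical name, and the sketch assumes exactly one writer thread (realtime) and one reader thread (GUI).

```cpp
#include <array>
#include <atomic>
#include <vector>

// Hypothetical single-producer/single-consumer triple buffer.
// The writer owns the "back" slot, the reader owns the "front" slot, and
// the third slot is exchanged atomically between them.
template <typename T>
class TripleBuffer {
 public:
  explicit TripleBuffer(const T& init) : slots_{init, init, init} {}

  // Writer side: fill back() in place, then publish() it.
  T& back() { return slots_[back_]; }
  void publish() {
    // Hand over the filled slot, marked dirty, and take the spare one.
    back_ = middle_.exchange(back_ | kDirty, std::memory_order_acq_rel) & kIndexMask;
  }

  // Reader side: returns true if a newer slot was picked up.
  bool update() {
    if ((middle_.load(std::memory_order_relaxed) & kDirty) == 0) {
      return false;  // nothing published since the last update
    }
    front_ = middle_.exchange(front_, std::memory_order_acq_rel) & kIndexMask;
    return true;
  }
  const T& front() const { return slots_[front_]; }

 private:
  static constexpr unsigned kDirty = 4;
  static constexpr unsigned kIndexMask = 3;
  std::array<T, 3> slots_;
  unsigned back_ = 0;                // touched only by the writer
  unsigned front_ = 1;               // touched only by the reader
  std::atomic<unsigned> middle_{2};  // shared slot index plus dirty bit
};
```

If T is a preallocated std::vector<float> that the writer fills in place, publish() is a single atomic exchange with no allocation and no lock. The reader always gets the most recently published slot and simply skips intermediate ones.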
I wrote a proof-of-concept as 4 commits located on this branch. Check out the commit messages for context. I haven't touched data_mutex, which might be unused, I'm not sure; it is the last big source of timing uncertainty in Spectrum::process. The before/after is something like this (pw-profiler output, 128 quantum, run for 30 seconds):
Metric          Before   After
process calls    12022   12165
avg (µs)         120.5    45.0
stdev             20.6    17.3
min                  0       0
5th perc           107      31
25th perc          112      35
median             117      40
75th perc          127      47
95th perc          152      81
max                294     165
This is a PoC because I need to (1) sleep and (2) re-read that atomic code! One small, last note: I see how my previous messages could be read as rather arrogant, coming from someone appearing out of nowhere and writing down criticism. My desire is only to contribute to a project I enjoy using. What I want to say is, thanks for your work!
Anyway, if you have feedback, or see potential roadblocks for this stuff, I'd be happy to hear it.
From experiments, the Spectrum::process callback is still being run, i.e. the PW node is in the running state. Is that not expected? I see the code that returns early on the Spectrum::power signal if the chart is not visible, but I see no code that disables the PW node when hidden.
The callback is still called because the plugin is still in the pipeline. But g_idle_add is not invoked if the window is not visible. The only thing that happens is the copy of the buffer received at the input ports to the output ports, because the spectrum plugin is put in bypass mode when the window is closed (https://github.com/wwmm/easyeffects/blob/36adff90d587f1dda12f1b943f90077dd81e97e0/src/effects_box.cpp#L498). In other words, the spectrum is in passthrough mode and none of those computations should be executed unless some weird regression happened.
One small, last note: I see how my previous messages could be read as rather arrogant, coming from someone appearing out of nowhere and writing down criticism. My desire is only to contribute to a project I enjoy using. What I want to say is, thanks for your work!
No offense taken :-). It just seems to me that it is too much effort for too little gain, which is going to become unnoticeable next to how CPU hungry some of the plugins are. Especially when we consider that unless the window is open none of the work you are doing will take effect. And when the window is visible, the load that the spectrum drawing brings will matter more to the overall CPU usage, and to possible indirect effects on the realtime thread execution, than whether or not a C++ deque is used in the spectrum calculation, for example.
It is nice to make things faster as long as the code does not become much more complicated. But it is hard to see those microseconds gained in pw-top estimations helping hardware that is struggling to handle heavy plugins like the bass enhancer, crystalizer, multiband compressor, autogain, etc.
At the same time it is obviously not going to do harm. If you are having fun go ahead :-)
The callback is still called because the plugin is still in the pipeline. But g_idle_add is not invoked if the window is not visible. The only thing that happens is the copy of the buffer received at the input ports to the output ports, because the spectrum plugin is put in bypass mode when the window is closed (https://github.com/wwmm/easyeffects/blob/36adff90d587f1dda12f1b943f90077dd81e97e0/src/effects_box.cpp#L498). In other words, the spectrum is in passthrough mode and none of those computations should be executed unless some weird regression happened.
I cannot see bypass change, so on my setup (Wayland, Sway) all the code is executed all the time. The spectrum_chart:hide signal is not emitted, nor is the src/effects_box.cpp:dispose function called.
Indeed this should be a priority, since the work on top of it is only useful when the window is visible. I'll have a deeper look.
No offense taken :-). It just seems to me that it is too much effort for too little gain, which is going to become unnoticeable next to how CPU hungry some of the plugins are.
Well, this issue shows someone with many, many xruns on spectrum, much more than on the other plugins running (output_level, rnnoise, echo_canceller, speex, filter, bass_enhancer, maximizer, crystalizer, exciter, stereo_tools, delay).
This could be explained by timing spikes. Some other plugins take more time on average, but their high percentiles are pretty close to the average. Spectrum appears to have a low average with very high upper percentiles. That's my only explanation for the behavior observed. I've tried, but I have not managed to reproduce it (even with a high loadavg, i.e. while compiling).
And when the window is visible, the load that the spectrum drawing brings will matter more to the overall CPU usage, and to possible indirect effects on the realtime thread execution, than whether or not a C++ deque is used in the spectrum calculation, for example.
The patches "precompute Hann window" and "replace input deque by a simpler vector" reduce the work required (whether it runs on the realtime or the GUI thread). The patch "move all computations and drawing to a tick callback" moves this work from the realtime thread to the GUI thread. If it does affect CPU load, it should reduce it by avoiding allocations and scheduling into the glib event loop.
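As an illustration of the first patch title only (the patch itself is not shown in this thread), a Hann window can be computed once, for example when the block size changes, and then applied with a plain multiply on every block. make_hann_window and apply_window are hypothetical names, and the symmetric form with N-1 in the denominator is an assumption; the actual patch may use a different variant.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: symmetric Hann window, computed once up front,
// w[i] = 0.5 * (1 - cos(2 * pi * i / (n - 1))).
std::vector<float> make_hann_window(std::size_t n) {
  assert(n >= 2);
  const double pi = std::acos(-1.0);
  std::vector<float> window(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double phase = 2.0 * pi * static_cast<double>(i) / static_cast<double>(n - 1);
    window[i] = static_cast<float>(0.5 * (1.0 - std::cos(phase)));
  }
  return window;
}

// Per-block work: one multiply per sample, no trigonometry.
void apply_window(const std::vector<float>& window, std::vector<float>& block) {
  assert(window.size() == block.size());
  for (std::size_t i = 0; i < block.size(); ++i) {
    block[i] *= window[i];
  }
}
```

The cosine evaluations move out of the per-block path entirely, so the saving applies regardless of which thread ends up running the transform.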
I don't see how the patches can bring more overall CPU usage? And possible indirect effects over the realtime thread? There might be implications I am not aware of?
It is nice to make things faster as long as the code does not become much more complicated. But it is hard to see those microseconds gained in pw-top estimations helping hardware that is struggling to handle heavy plugins like the bass enhancer, crystalizer, multiband compressor, autogain, etc.
Well, it is microseconds on my setup but milliseconds on other setups. I do have a top-of-the-line workstation. The issue author spends 1.4ms in the realtime thread to handle one quantum.
I want to rework the last patch to make the synchronization between the realtime and GUI threads more straightforward. I agree the patches sent so far are more complicated.
At the same time it is obviously not going to do harm. If you are having fun go ahead :-)
:-)
Thank you all for your efforts trying to reduce the load and optimize the code. After the spectrum had the most xruns I immediately disabled spectrum, as I don't need it. I still had the xruns. I usually only use easyeffects for my microphone input in order to optimize the quality for conference calls.
As you see some of the used modules belong to the cpu-expensive ones. Here is what I use:
BassEnhancer, Exciter and Crystalizer are just nice to have, not a requirement; these might be the most expensive and I could drop them. StereoTools I only need for quickly making the left channel the main mono signal, as my mic mistakenly reports being stereo while only the left channel carries the amplified signal. With that setup, the easyeffects process on an idle old i7-3770 Ivy Bridge uses only about 1% CPU, absolutely affordable. Other machines have an i7-4790 Haswell, not much better...
As xrun errors depend strongly on the hardware, it is hard to suggest a direct solution. For example, on my current desktop, even with all the cores of my Ryzen 7700 under a 100% stress test, I have zero errors in pw-top while watching YouTube videos in Firefox.
Interesting. Right now I need to stop CPU-intensive tasks and IO for conference calls.
Something I have been doing for years is booting with the kernel option threadirqs. Maybe this is helping somehow to play audio at higher loads. Does it make any difference?
Interesting, going to try that soon. Any way to enable that in the kernel conf or at runtime? Searching for that option I found CONFIG_IRQ_FORCED_THREADING, is that the same? That's enabled in my current kernel.
I'm currently using self compiled Gentoo patched kernels, currently 6.6.30-gentoo. I could switch over to a distribution kernel which is a binary universal pre-configured kernel that should be sufficient for almost any hardware and usecase. Not sure if I made any serious mistakes in my custom kernel config regarding realtime processing and audio:
# zgrep -i IRQ /proc/config.gz |grep -v "^#"
CONFIG_IRQ_WORK=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_NMI_SUPPORT=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_SOFTIRQ_ON_OWN_STACK=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
# zgrep -i PREEMPT /proc/config.gz |grep -v "^#"
CONFIG_PREEMPT_BUILD=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE=7500
# zgrep -i HZ /proc/config.gz |grep -v "^#"
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_FULL=y
CONFIG_HZ_300=y
CONFIG_HZ=300
# zgrep -i TICK /proc/config.gz |grep -v "^#"
CONFIG_TICK_ONESHOT=y
CONFIG_SCHED_HRTICK=y
I cannot see bypass change, so on my setup (Wayland, Sway) all the code is executed all the time. The spectrum_chart:hide signal is not emitted, nor is the src/effects_box.cpp:dispose function called.
Indeed this should be a priority, since the work on top of it is only useful when the window is visible. I'll have a deeper look.
Maybe you just did not test in a way that lets you see gtk's "destructor" being called. I tested it now on my computer (Arch Linux / KDE / Wayland) through a simple util::warning("test"); put after
if (bypass || !fftw_ready) {
return;
}
After recompiling EasyEffects and restarting it in service mode (easyeffects --gapplication-service), I opened another terminal and ran easyeffects. No messages were printed after I closed the window.
I don't see how the patches can bring more overall CPU usage? And possible indirect effects over the realtime thread? There might be implications I am not aware of?
No. Your patches should be fine. I did not see anything in them that would make things slower. I may not have expressed myself well. What I was trying to say is that there are other operations EasyEffects has to do that are heavy and will fight for CPU time with the realtime thread, which could potentially make all the work you are doing invisible to the user. For example, drawing the spectrum in the window is demanding on the CPU. This is a place where optimizations would definitely make a noticeable difference for many people. The problem is that the heaviest operations happen inside gtk and as a result are out of our reach.
Not sure if I made any serious mistakes in my custom kernel config regarding realtime processing and audio:
@Massimo-B so far the only difference I've noticed compared to the standard Arch Linux kernel is that it also has CONFIG_NO_HZ=y set. The rest seems to be the same.
Searching for that option I found CONFIG_IRQ_FORCED_THREADING, is that the same? That's enabled in my current kernel.
According to these sources (https://wiki.ubuntu.com/UbuntuStudio/rtirq, https://wiki.linuxaudio.org/wiki/system_configuration), it is necessary to build the kernel with this flag enabled, but that alone is not enough. It is still necessary to pass the threadirqs boot option. Based on my previous experience with something unrelated to audio, this seems to be true. Arch Linux's kernel has that flag enabled, but it makes a difference whether the boot option is specified or not.
A long time ago, when the driver for my radeon was not very stable, the desktop completely froze when the driver crashed and threadirqs was not used. When it crashed with threaded irqs enabled I could still use the desktop and close the programs before forcing a reboot. Things were super slow after the crash, but at least I could avoid losing data when forcing the reboot.
Maybe you just did not test in a way that lets you see gtk's "destructor" being called. I tested it now on my computer (Arch Linux / KDE / Wayland) through a simple util::warning("test"); put after
Ah, correct me if I'm wrong, but if the window exists but is not displayed, then bypass is not enabled. You must have EE running with no existing window ("in the taskbar") for the bypass to be active?
I thought sending the window to another workspace was enough. This is what I do usually.
I'll look at Gtk APIs to see if we can detect that no part of the window is displayed.
GdkSurface has enter-monitor and leave-monitor but I cannot find a way to grab a reference to our window surface.
I thought sending the window to another workspace was enough. This is what I do usually.
This won't cause the destructor to be called. The window has to be closed.
I'll look at Gtk APIs to see if we can detect that no part of the window is displayed.
As far as I remember there is no API for that. And it makes sense that there isn't if we consider that the virtual desktop management and the knowledge about which windows are visible or not to the user is the domain of the window manager. And as fragmented as Linux is when it comes to desktops the chances that this can be implemented in a way that works for everybody are probably very small.
Else the frame tick stops when on another desktop so if no ticks are detected after a while we can assume we are hidden and bypass the spectrum. I've done a version of that but I still have a bug to fix before sending.
I wonder how well this works across different desktops, and between wayland and xorg. Considering that the main gtk doc about gtk_widget_add_tick_callback isn't even clear about this particular behavior, there is the danger that this approach may behave unexpectedly on some desktops. It is better to separate this implementation from the others you are working on, because it will probably require more time and people for testing.
Noted, I'll make sure to submit work split across pull requests.
@Massimo-B so far the only difference I've noticed compared to the standard Arch Linux kernel is that it also has CONFIG_NO_HZ=y set. The rest seems to be the same.
From the menuconfig help:
CONFIG_NO_HZ:
  This is the old config entry that enables dynticks idle.
  We keep it around for a little while to enforce backward
  compatibility with older config files.
Instead of this I have set:
CONFIG_NO_HZ_FULL:
  Adaptively try to shutdown the tick whenever possible, even when
  the CPU is running tasks. Typically this requires running a single
  task on the CPU. Chances for running tickless are maximized when
  the task mostly runs in userspace and has few kernel activity.

  You need to fill up the nohz_full boot parameter with the
  desired range of dynticks CPUs to use it. This is implemented at
  the expense of some overhead in user <-> kernel transitions:
  syscalls, exceptions and interrupts.

  By default, without passing the nohz_full parameter, this behaves just
  like NO_HZ_IDLE.

  If you're a distro say Y.
What does the threadirqs boot option actually do? Any documentation about that?
I'm not into the linux audio topic yet, going to compare with https://wiki.linuxaudio.org/wiki/system_configuration#the_kernel
They also have some description of that threadirqs kernel option.
Of course we also have the sys-kernel/liquorix-sources in some 3rd party repos but I don't think I need that for a simple desktop station doing voice calls.
Comparing, what they recommend and what matches my config (6.6.30-gentoo):
According to the quoted help above, the difference between CONFIG_NO_HZ_IDLE and my CONFIG_NO_HZ_FULL should be zero as I don't pass the nohz_full parameter.
I understand now that CONFIG_PREEMPT_RT=y can be achieved on generic kernels by that threadirqs boot option as well. I'm still not sure what the benefits and drawbacks of threaded irqs are; I have not found any official documentation. Why isn't it enabled by default? I found a similar discussion and conclusions at
https://www.reddit.com/r/linux_gaming/comments/15buy6a/known_caveats_of_the_threadirqs_boot_parameter/
It feels like getting reduced latency by increasing CPU usage, which would not improve the situation if an excessive CPU load is the issue.
Mitigations: Ok I often thought about it... I'm not going to disable those as my machines have network and internet connections. Trying to estimate the actual risk is beyond my capabilities. If the performance impact is an issue I should better go and replace with a CPU of the next generation.
What does the threadirqs boot option actually do? Any documentation about that?
Documentation/admin-guide/kernel-parameters.txt says this: "Force threading of all interrupt handlers except those marked explicitly IRQF_NO_THREAD".
It might help latency if you have crappy drivers that have slow interrupt handlers. It will also increase CPU load by increasing overhead work to be done per IRQ. At least that is my instinct reading the above description.
This sounds pretty similar to what PREEMPT_RT attempts to do. The latter does go much, much further by turning most locking primitives into sleeping ones (it makes no sense turning IRQs into threaded ones if they still all use a raw spinlock), avoiding noirq (atomic) context whenever possible, and many other things I am not aware of.
* CONFIG_PREEMPT_RT=y # real-time kernel ⭕ Missing in my kernel
This requires external patches. It should be fully integrated into the kernel in a few releases, they have done massive work! You shouldn't need that for a desktop with voice calls setup.
I understand now that CONFIG_PREEMPT_RT=y can be achieved on generic kernels by that threadirqs boot option as well. I'm still not sure what the benefits and drawbacks of threaded irqs are; I have not found any official documentation. Why isn't it enabled by default? I found a similar discussion and conclusions at https://www.reddit.com/r/linux_gaming/comments/15buy6a/known_caveats_of_the_threadirqs_boot_parameter/ It feels like getting reduced latency by increasing CPU usage, which would not improve the situation if an excessive CPU load is the issue.
No, PREEMPT_RT is not achieved on upstream kernels with threadirqs; that would ignore most of the work done by the PREEMPT_RT people. Part of the work has already been integrated upstream though, so you already benefit from it.
It feels like getting reduced latency by increasing CPU usage, which would not improve the situation if an excessive CPU load is the issue
@Massimo-B at least on my computer the overhead is unnoticeable. And the threadirqs option does not require kernel recompilation. Just set it in your bootloader configuration and see what happens. If things get worse just remove it from the boot loader configuration and reboot.
at least on my computer the overhead is unnoticeable. And the threadirqs option does not require kernel recompilation. Just set it in your bootloader configuration and see what happens. If things get worse just remove it from the boot loader configuration and reboot.
Recompilation is not an issue, but a reboot often is, as I lose my whole session.
There is no way to enable that feature at runtime? Is there any flag to check if it is active? I searched for something like
find /proc /sys -iname "*threadirq*"
...
Where do threaded irqs help? I guess irqs are only used for hardware interaction, such as sound, but also graphics or disk IO? I often have lots of btrfs workers in the background, not much CPU usage but still a high load average. I had the feeling that even that disk IO led to errors in EasyEffects.
Besides that, I reproduced these errors on both audio outputs I have, internal Intel and external amplifier/DAC via USB 2.0. No difference. Any recommendation?
Recompilation is not an issue, but a reboot often is, as I lose my whole session.
And how are you dealing with pipewire updates? It is possible to manually restart it, but sometimes the desktop's volume managers get broken in the process and a reboot is more practical.
There is no way to enable that feature at runtime?
Not that I am aware of.
Where do threaded irqs help? I guess irqs are only used for hardware interaction, such as sound, but also graphic or disk IO?
They should help to reduce the kernel's latency: https://lwn.net/Articles/302043/. This in turn might have some effect on the xrun errors you are seeing. Only tests will tell if there will be any difference.
Besides that, I reproduced these errors on both audio outputs I have, internal Intel and external amplifier/DAC via USB 2.0. No difference. Any recommendation?
Besides trying the threadirqs option, the only other thing that comes to my mind would be to change the dynamic preempt mode: https://www.phoronix.com/news/Linux-5.12-Dynamic-Preempt. Recent kernels allow this to be changed on the fly through the debugfs interface. But I would expect this option to have an even lower chance of impacting those xruns.
As mentioned above, CONFIG_HAVE_PREEMPT_DYNAMIC is enabled here.
I booted with the new option and looked around in dmesg and syslog if there is any information about it being enabled. The only thing I find is the quoted CMDLINE:
# grep thread /var/log/everything/current
Jul 05 07:58:52 [kernel] Command line: BOOT_IMAGE=/vmlinuz-6.6.21-gentoo root=PARTUUID= ro rootflags=subvol=volumes/root rd.vconsole.font=ter-u12n rd.vconsole.keymap=de-latin1-nodeadkeys rd.locale.LANG=de_DE.UTF-8 rd.lvm=0 rd.md=0 rd.dm=0 rd.luks.uuid=67... rd.luks.allow-discards=67... root=LABEL=gentoo rootflags=subvol=volumes/root threadirqs
Jul 05 07:58:53 [kernel] Kernel command line: BOOT_IMAGE=/vmlinuz-6.6.21-gentoo root=PARTUUID= ro rootflags=subvol=volumes/root rd.vconsole.font=ter-u12n rd.vconsole.keymap=de-latin1-nodeadkeys rd.locale.LANG=de_DE.UTF-8 rd.lvm=0 rd.md=0 rd.dm=0 rd.luks.uuid=67... rd.luks.allow-discards=67... root=LABEL=gentoo rootflags=subvol=volumes/root threadirqs
Jul 05 07:58:53 [kernel] process: using mwait in idle threads
I found that with the threadirqs option enabled I have 15 processes in pgrep -alf 'irq/', one for each hardware device.
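For the earlier question about checking whether the option is active, a rough sketch: this only tells whether threadirqs was passed on the kernel command line, by reading /proc/cmdline, and says nothing about whether individual handlers were actually threaded (the irq/ kernel threads listed above are the more direct evidence). has_boot_param and threadirqs_requested are hypothetical helper names.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: true if `param` appears as a whole
// whitespace-separated token in a kernel command line string.
bool has_boot_param(const std::string& cmdline, const std::string& param) {
  std::istringstream tokens(cmdline);
  std::string token;
  while (tokens >> token) {
    if (token == param) {
      return true;
    }
  }
  return false;
}

// Reads the running kernel's command line (Linux only). Returns false if
// the file cannot be read.
bool threadirqs_requested() {
  std::ifstream file("/proc/cmdline");
  std::string line;
  std::getline(file, line);
  return has_boot_param(line, "threadirqs");
}
```

Matching whole tokens avoids false positives from longer parameters that merely contain the same substring.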
Running very smoothly now with threadirqs, even with all 8 cores pushed to 100% and a load average of about 25 and more.
In pw-top the ERR column isn't zero, but it is not obviously increasing and there are no audible drops. Currently the highest ERR count is on the rnnoise module:
S ID QUANT RATE WAIT BUSY W/Q B/Q ERR FORMAT NAME
R 133 0 0 34,9us 705,9us 0,00 0,03 11468 + ee_sie_rnnoise
I'd be curious if you could check your old setup with the master branch now that #3231 and #3242 have been applied? Both optimise the spectrum realtime code, so the ee_*_spectrum ERR column values should increase much more slowly. I see a 5x perf improvement on average (but in absolute values on my setup we only go from 183µs to 34µs), and I expect much better worst-case performance (meaning fewer xruns).
I would, but even with the old setup it was not easy to reproduce, as it happened after several days of uptime and sometimes got better after restarting PipeWire and EasyEffects, which itself did not always work, so I usually solved it with a reboot... I need to watch the issue with the new threadirqs.
On a different machine I have the drops and errors even though I'm running threadirqs. There I'm using a Bluetooth headset. In pw-top I see that the bluetooth input is using a very low quantum:
R 134 256 48000 896,3us 1,3us 0,17 0,00 196 S16LE 1 16000 bluez_input.08_C8_C2_1E_AA_4C.0
Other hardware has 1024, the browser process has 512. Should I increase the quantum? Ok, I tried. Increasing it up to 2048 stops the ERR column from increasing, but there are still drops in the playback.
Other hardware has 1024, the browser process has 512. Should I increase the quantum? Ok, I tried. Increasing it up to 2048 stops the ERR column from increasing, but there are still drops in the playback.
If increasing the quantum helped, maybe adjusting some settings like the ALSA headroom in PipeWire/WirePlumber's configuration files will "fix" the issue.
How? And why would that help?
In https://pipewire.pages.freedesktop.org/wireplumber/daemon/configuration/alsa.html I found api.alsa.headroom, which is 0 by default. Is that the setting you mean? It sounds like a workaround for bad devices, not sure if this is the case here.
How? And why would that help? In https://pipewire.pages.freedesktop.org/wireplumber/daemon/configuration/alsa.html I found api.alsa.headroom, which is 0 by default. Is that the setting you mean? It sounds like a workaround for bad devices, not sure if this is the case here.
In the past I saw users in similar situations seeing improvements after tweaking the ALSA headroom. I am not sure if this was done in PipeWire's or WirePlumber's configuration files. When this parameter was exposed some years ago it was in PipeWire's files. Maybe it is only in WirePlumber now.
As explained in the link, this parameter impacts the timing between hardware and software when it comes to audio buffer management. When they say "bad devices", keep in mind that this does not necessarily mean defective hardware. In most cases it will just be the driver not behaving as PipeWire expects it to. The extra delay may be what your system needs to handle audio under pressure.
There is no need to reboot and lose your session to test this. Manually restarting PipeWire and WirePlumber should be enough.
EasyEffects Version
7.1.3
What package are you using?
Gentoo
Distribution
Gentoo 23.0
Describe the bug
On higher system loads I have drops and errors in pw-top when running easyeffects. This also happens when the CPU is not fully used (about 50%), but with Chrome and an MS Teams session with video. The load average is about 20 with some btrfs workers in the background. The issue is usually solved by killing easyeffects. I also disabled the spectrum now, as it also caused a lot of load.
Expected Behavior
No response
Debug Log
Additional Information