Built-in music player audio stuttering on Linux

future-figs commented 4 years ago

Description

Audio intermittently "pops" during playback. I'm having audio issues in all cores as well, but since the issue seems to affect all of RetroArch, it's probably easiest to troubleshoot this way.

I've had this issue on multiple machines and multiple distros, but it's never been a problem on Windows, even on the same hardware.

I've been testing with a simple 5-minute sinewave file I made in audacity (untitled.zip). Other files work as well, though. My config file is reset to default.

Expected/Actual Behavior

The exact behavior is audio latency-dependent. In my own testing, on my laptop, I observed 3 different patterns, sharply divided:

24 ms and below: constant erratic pattern of pops
24-48 ms: one pop every 5 seconds exactly (sometimes one will be missing, but it's rare)
Above 48 ms: 1-2 pops ~5 seconds after playback starts, then smooth. However, the startup pops persist even with the max audio latency setting (512 ms!).

The expected behavior, of course, is a pure sine wave. I can play the file back without these issues using VLC without issue. fplay (ffmpeg package) works as well, so the issue definitely isn't the ffmpeg core.

Relevant Settings/Troubleshooting

Video driver: no effect when set to null. All testing done with default value (x)
Audio driver: no effect (some drivers do not work, most testing done with Pulse driver)
Synchronize audio: leave on (default), more frequent stutters when off
Vsync: behavior improves when turned off (default on); pops are gone above 24 ms, 5-second timed pops from 24 to 16 ms, and random pops starting at 8 ms
Hard GPU Sync: causes a few random pops before falling into the same behavior pattern as normal. Sometimes causes 2-3 pops instead of just one.
More? I can test other settings if requested.

This has been an issue on Linux for as long as I can remember, so I can't really bisect this one. It's hard to say if it's gotten better or worse, since it's intermittent and depends on other settings.

Possible Causes

My first thought is disk access: let's say RA fetches 5 seconds at a time, but only fetches once it's completely out of samples. In that case, if the audio buffer is shorter than the filesystem access time, and we get an underflow every 5 seconds. But that doesn't explain the relation to Vsync settings. I'm also not sure how this translates to rom files.

The resampler could be to blame, but then this should affect all platforms equally.

Also possible is some type of scheduling/priority/preemption issue. Like not being able to refill the audio buffer during Vsync blocking or something. But the math doesn't add up there either; a buffer longer than 16.7 ms should be safe no matter what. Setting the video driver to null should have fixed this as well.

Maybe there's some problematic code that was reused between the Pulse/ALSA drivers?

Normal Cores

I'm having similar problems in normal emulator cores as well, which is preventing me from using RetroArch. I suspect it's the same issue, but I think troubleshooting it via the music player can cut out a lot of variables. Normal emulator cores don't have this same degree of reproducibility (the stutters are random, not timed), nor do the stutters ever fully go away at any latency setting. The music player might not be such a high priority in itself, but a fix for this could be really valuable.

I have a separate, long-dead issue open on this already (#8235).

Environment

RetroArch: 1.8.4 9e25601
OS: Kubuntu 19.10

melroy89 commented 3 years ago

Yup, I also suffer from this issue on Linux mint. It's crazy.

I tried everything like you said. Changing to alsa. Playing with the sound latency. Different cores. It's just a big mess at Libretro with respect to sound.

Edit: could those findings maybe be relevant for this community? See: https://www.linuxquestions.org/questions/slackware-arm-108/rpi3-sarpi-retroarch-stuttering-sound-%2Afinally%2A-fixed-via-pulse-audio-tsched-setting-4175671360/

future-figs commented 3 years ago

Hey, long time no thread. I actually looked into this a while ago but never posted my findings.

I set up Retroarch to run at the native core refresh rate and disabled Pulseaudio completely. I accomplished the first part by running in KMS (kernel mode setting), i.e. by running retroarch in a separate TTY without X running. This lets me take advantage of my adaptive-sync monitor—more importantly, it means that the content isn't being sped up to match a different refresh rate. I accomplished the second using pasuspender and putting the desired ALSA device in my config file.

Then, I downloaded the Retroarch source code and added a bunch of diagnostic logging. Basically, it prints out the current time in microseconds, and the exact part of the code that's being executed. I then used Octave (Matlab clone) to show the scheduling over time to try to identify the problem. Here's an example (bsnes running Super Mario World, 64ms latency):

The blue line is the current state of the ALSA audio buffer. It's constantly decreasing as ALSA plays & consumes audio frames, except when RA pushes a batch of samples. This behavior is determined by the core; with bsnes, the samples are pushed in 800-frame batches, so we get a sawtooth pattern at exactly 60Hz in theory. (In the zoomed-out view, you can't make it out.)

The red dashed line at ~3000 frames is an EPIPE. This is the audible "pop" that occurs when ALSA runs out of audio frames to send to the speaker. This is the only type of ALSA error I ever ran into, and always correlates with the audible stutter. This sample is long and only has one xrun due to the large audio buffer (64ms), but with a smaller buffer it's common to see a few back-to-back.

Here's a detail view of the beginning of the sample:

The sawtooth is now visible, along with the shaded rectangles. The red sections correspond to time spent in the video_driver_frame() function—this is the "video" task that pushes a frame to the screen. The grey sections correspond to time spent in the retroSleep() function—this is a software delay, which I assume is supposed to keep the frames coming at the right time. You could call it the "idle" task. So, the core is only actually running in the white, unshaded areas. The X-axis is set up to show an exact 60.10 Hz refresh to match the SNES's output framerate.

During frame 21, the audio task runs twice before the core finishes the frame and the video task takes over. This results in excessive latency but doesn't cause a stutter; note that the audio task never triggers during frame 22. Later on, this same thing happens in reverse. Since the buffer is big enough to handle this variability, this behavior is tolerable as long as there's no overarching, gradual depletion or saturation of the buffer. What we see here is just a race condition.

The problems start around frame 652:

Up until this point, the video task has been taking slightly longer and longer, but it's still manageable and synchronized with the desired refresh rate. However, the video task suddenly becomes extremely slow, even taking more than a frame at 654-655f. The video task only has to do a nearest-neighbor scale and flip the buffers, so I don't know what could be causing this behavior. With an adaptive-sync setup, the program will never pend on Vsync at these refresh rates. This incident will cause the video to stutter, but the audio is not affected since we had enough frames already in the buffer to tank a 1-frame delay. However, the core must now run faster than real-time to refill the buffer to its previous level.

This happens again and again on a regular cycle, at 775f, 950f, 1175f, 1420f, 1640f, 1880f, 2110f, etc. (approx. every 220 video frames, or 3.7 seconds). Not every instance of this video snag causes a drop in the audio buffer, but every drop in the audio buffer lines up with one of these video slowdowns. If this is related to the video driver (I'm using Vulkan on Nvidia proprietary drivers), maybe it could explain why this issue is prevalent on Linux.

Finally, here's a detail view of the xrun event:

One of the periodic video slowdowns loses 800 audio frames. Then, the video task triggers barely before the audio task, losing another ~800 audio frames and running the ALSA buffer dry. This is bad luck—the race condition easily could have not happened, or even gone the other way—but this was bound to happen eventually if you look at the gradual decline of audio samples over the whole dataset.

So, here are my observations:

The audio buffer depletes slowly over time, i.e. the core is running too slowly on average. Actually, the core runs too fast by default, but the periodic video video_driver_frame() sets it back even further.
The audio buffer is consumed in a "compound sawtooth" pattern, requiring a large buffer and excessive latency. The first sawtooth comes from the core (audio frames added in 800-frame batches instantaneously). The second, larger sawtooth is due to the race condition between the audio task and the video/idle task. As a result, even when the system utilization is low, we need a huge buffer to absorb the variability in latency. With better scheduling, buffers of ~1 frame should be possible, reducing audio lag. This could be resolved by synchronizing pushing the audio buffer with pushing each frame, but this needs to be implemented inside the core, not in RA. (Flushing single audio frames would also work and is ultra-low-latency, but there's a performance hit.*)
The ALSA driver is very good and doesn't seem to be causing any of these problems (same for ALSA itself). While Pulseaudio might introduce new issues, it's not the root cause.

Future work might involve running with a null graphics driver, but I've lost the testing setup I used to take this data. I can set it up again with up-to-date RA if there's interest in actually root-causing this issue, but otherwise I'm letting it go.

*I actually tried this, by modifying bsnes's source code. It does remove the tiny excursions but does not resolve the gradual depletion issue, so the audio stutter persists.

inactive123 commented 3 years ago

What do you mean by 'built in music player audio"? The built-in audio mixer in RetroArch that you can control from the menu? The one where you can load in WAV, Ogg, FLAC files?

If so, yes, that has always had some audio pops unfortunately (unless you load in a libretro core and then just mix it in with the libretro core's regular sound). I don't know how to fix it, I've asked several people for help but they never helped, so I dunno what to do there.

However, this shouldn't have any effect on regular Libretro core sound. I have never had audio popping issues on any Linux distro I've used before in the past. I've used Arch Linux, Ubuntu, etc. Of course, your mileage might vary based on your distro or audio driver/audio card, and particular latency configuration in RetroArch.

Anyway, if you have ways of mitigating these issues and improving things, then by all means - PRs are always welcome.

inactive123 commented 3 years ago

Can you run these tests with a core other than bsnes? The issue with bsnes is that it doesn't use the modern audio batch sample callback that we try to use on nearly every other libretro core for optimization reasons, instead, due to the way bsnes is designed, it has to push back a sample one at a time. Other SNES emulators like snes9x instead can push multiple samples per function call.

future-figs commented 3 years ago

Hi twinaphex, yes I was using the built-in audio mixer (with the cool visualizer). I was trying to do a more "controlled" test without the unknowns related to actual emulation--didn't realize it had known issues with stutter already.

Actually, the version of bsnes I was playing with pushes the samples to Libretro in batches (800-frame batches @ 48kHz iirc). This might have been added relatively recently.

I took a bad nvidia-drivers update last night, but I'll try to get my test setup back up and running. Are there any particular cores or config changes you'd recommend trying out?

future-figs commented 3 years ago

So I downloaded the current version of RA and re-added my debug logging code to retroarch.c and alsa.c. To my surprise, the problem is gone! Here's a plot of bsnes, approximately the same settings as before (lost my old config...)

The slow sawtooth pattern is still present due to the way bsnes buffers/flushes its batches of audio samples. However, the "nose-dives" are gone, and the buffer oscillates about 50% without getting close to an xrun—I think this is because the video_driver_frame() task (red shading) is behaving itself. All across the dataset, the time required for the frame task is consistent and low.

So, my current theory is that the video_driver_frame() function was intermittently becoming very slow. This would slow down the entire application, causing a frame stutter and an audio dropout. Of course, the audio pop is much more noticeable than a single-frame stutter, but this issue was never really confined to just the audio system.

As for another core, here's FCEUmm…now this is what I'm talking about!

This core could run with an audio buffer around 17ms (1 video frame) since the audio batch flush is synchronized with the video frame. That's the best we can achieve without the audio_driver_sample_batch() task interrupting the video_driver_frame() task.

However, while capturing some plots for this post, the problem came back. I logged out and back in and it's fixed again, so it might take some work to pin down how to reproduce it. @danger89, do you have an Nvidia card by any chance? Maybe it has to do with graphics drivers.

Check out my fork if you want to try it on your own system: [https://github.com/Mr-Figs/RetroArch]()

clort81 commented 3 years ago

You may discover that upscaling scaling the output has significant effects on audio playback problems.

With my relatively weak gpu (mali g52, panfrost), I'm getting a lot of audio problems when I upscale opengl windows beyond ~2x with anything but bilinear / nearest-neighbor.

Retroarch audio seems very sensitive to GPU rendering every frame 'in-time', even when cpu use is low. I haven't found settings to reduce this hard dependency.

clort81 commented 3 years ago

Wrote too soon. Have a config now that's stable soundwise on khadas vim3 (like odroid N2).

melroy89 commented 3 years ago

@Mr-Figs Actually no. I have AMD videocards only.

libretro / RetroArch