mozilla / cubeb

Cross platform audio library
ISC License

WASAPI backend doesn't obey latency parameter #324

Closed. ligfx closed this issue 5 years ago.

ligfx commented 7 years ago

I believe this is a limitation of WASAPI shared mode: no matter the requested buffer size, callback events still come in at the default device period.

For example, on my computer (Windows 10, Cirrus Logic CS4208 speakers set to 24-bit 48000Hz), the default device period is 10 milliseconds / 480 frames. In test_audio, which requests a latency of 4096 frames, the first callback gets to fill the entire buffer (4096 frames), but each additional callback gets a number of frames equivalent to the rate-adjusted device period.

One strategy that seems to work is to ignore WASAPI refill events (e.g. don't call IAudioRenderClient::GetBuffer) until the desired number of frames is available to be written. Of course, if the buffer size is equal to the desired number of frames, this causes audio glitches because the audio engine runs out of data. The default buffer size seems to be twice the period plus some margin, so I tried a buffer that's a little more than twice the requested latency, and that seemed to fix the glitches.
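
For illustration, a minimal sketch of that strategy on an event-driven shared-mode stream might look like the following (names such as desired_frames are illustrative, not cubeb's actual fields, and error handling is abbreviated):

#include <audioclient.h>

// Sketch only: skip a refill event unless at least `desired_frames` of space
// is free in the WASAPI buffer, which is assumed to have been allocated a bit
// larger than twice the requested latency, as described above.
HRESULT maybe_refill(IAudioClient * client, IAudioRenderClient * render,
                     UINT32 buffer_frames, UINT32 desired_frames)
{
  UINT32 padding = 0;                 // frames still queued in the buffer
  HRESULT hr = client->GetCurrentPadding(&padding);
  if (FAILED(hr)) {
    return hr;
  }

  UINT32 available = buffer_frames - padding;
  if (available < desired_frames) {
    return S_OK;                      // ignore this event and wait for more space
  }

  BYTE * data = nullptr;
  hr = render->GetBuffer(desired_frames, &data);
  if (FAILED(hr)) {
    return hr;
  }

  // ... write `desired_frames` frames of audio into `data` here ...

  return render->ReleaseBuffer(desired_frames, 0);
}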

sidewayss commented 6 years ago

I made another change suggested by @padenot, and the results are better now. I am no longer inserting any silence at the beginning of the input buffer. Instead I am discarding the first 9 frames in the refill_callback, which gives the input stream time to stabilize. I got the occasional error when discarding only the first 4 or 5 frames, so I set it to 9 and have not seen any errors since. I'm still running in the debugger, and we can bring that number down. At a 10ms period that's a 100ms wait before the duplex stream effectively starts (the first frame of the output buffer is always goofy).
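
As a rough sketch of that idea (the struct and function names here are made up for illustration and do not match the actual cubeb_wasapi.cpp code):

// Illustrative only: drop the input side of the first few duplex callbacks so
// the capture stream has time to stabilize, emitting silence meanwhile.
struct stream_state {
  unsigned callbacks_seen = 0;
  static const unsigned DISCARD_FIRST = 9; // value arrived at empirically above
};

long refill_callback_sketch(stream_state * stm, float const * input,
                            float * output, long frames, int channels)
{
  long samples = frames * channels;
  if (stm->callbacks_seen++ < stream_state::DISCARD_FIRST) {
    for (long i = 0; i < samples; i++) {
      output[i] = 0.0f;               // keep the output side fed with silence
    }
    return frames;
  }
  for (long i = 0; i < samples; i++) {
    output[i] = input[i];             // normal duplex processing would go here
  }
  return frames;
}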

But the trade-off for a small delay in starting the stream is a clear 20ms reduction in latency at the 10ms period. I am seeing 40ms latency 75% of the time and 50ms latency the other 25%, and never as high as 60ms in more than 30 test runs.

At the minimum period, 2.67ms, I fail 75% of the time, but when I don't fail I get latency as low as 12.6ms. I am now inserting 1 frame of silence again and see no more failures, with latency mostly around 21ms and as low as 15ms. I'm still running in the debugger, so these numbers are not precise, and we still might be able to avoid the frame of silence, even at low latency.

But it's looking like the worst case is inserting 1 frame of silence only when low latency is requested, where it adds less than 3ms of latency. I'd be OK with that. I will get the logging working soon, and these results will get more precise.

sidewayss commented 6 years ago

I have a release .exe and I am using ALOGV() to log my values asynchronously. They aren't much different from my debug values, the ones I have been reporting. At low latency I'm seeing an average of 18.2ms round-trip latency. I'll run it various times over the weekend and accumulate results in a table. I have minimized the logging to one number per test, elapsed time, aka latency.

The main thing is that now others can build this version of test_tone.exe and generate their own logs on different hardware. My code is here; only two files are changed: cubeb_wasapi.cpp and test_tone.cpp. It creates a log file named w_latency_log.txt in the same folder as test_tone.exe. The exe overwrites the log file each time it runs. The exe runs for just over 60 seconds to accumulate 60 latency values. If you like, I can post an exe here or elsewhere.

This code always selects low latency if it is available. It still chooses the period based on the minimum period of the default output device. I have been using the same built-in hardware for input and output, as it is the only hardware that is currently supported by Windows for low latency.

Is there a way to stop the logging from including .cpp file line numbers? It makes it harder to parse the results for analysis.

sidewayss commented 6 years ago

Do you all build this for Windows only as x86 32bit, or do you build 64bit versions as well? I'm running a 64bit OS, but I've been building everything as x86 so far. I'm wondering if it's worth building and testing a 64bit exe.

sidewayss commented 6 years ago

test_sanity.cpp:

#define STREAM_RATE 44100
#define STREAM_LATENCY 100 * STREAM_RATE / 1000

STREAM_LATENCY = 4410. That is a 100ms period. On Windows, AFAIK, that isn't available, and it will use 10ms instead. Is 100ms available on other platforms, making this intentional, or should it be defining STREAM_LATENCY as 10 * STREAM_RATE / 1000?

padenot commented 6 years ago

This is intended. For example, we use a latency of 100ms in Firefox, regardless of the platform, for regular media playback (non-real-time). This GitHub issue is about Windows not obeying this parameter.

Also, this is the total latency, not the callback period.
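
To restate that as a small worked example (not code from the tree; ms_to_frames is just illustrative):

// Worked example: the latency argument to cubeb_stream_init() is a total
// latency expressed in frames, not a callback period.
unsigned int ms_to_frames(unsigned int ms, unsigned int rate)
{
  return ms * rate / 1000;
}
// ms_to_frames(100, 44100) == 4410 == STREAM_LATENCY above.
// In WASAPI shared mode the callbacks still arrive at the device period
// (typically 480 frames / 10 ms at 48 kHz), which is what this issue tracks.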

sidewayss commented 6 years ago

It might not be intended as the callback period, and it is reset to 10ms by Windows, but it is the callback period that is requested in IAudioClient::Initialize() in most of test_sanity.cpp. That's why I inquired. For example:

  r = cubeb_stream_init(ctx, &stream, "test", NULL, NULL, NULL, &params, STREAM_LATENCY,
                        test_data_callback, test_state_callback, &dummy);

sidewayss commented 6 years ago

Maybe I should rephrase: the way I am making Windows obey this parameter is to treat it as the callback period, i.e. the value returned by wasapi_get_min_latency() and the value set in wasapi_stream_init() by IAudioClient3::InitializeSharedAudioStream(). That is the way it was set up when I arrived. I don't understand what you mean by "total latency"; I have no way to set total latency, I can only set the callback period.
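
For context, the IAudioClient3 path being described here boils down to something like the following sketch (error handling and format negotiation omitted; the real wasapi_stream_init() does considerably more):

#include <audioclient.h>

// Sketch of the IAudioClient3 low-latency path: query the engine periods for
// the mix format, then initialize with the minimum one. Choosing the period
// this way is what "treating the latency parameter as the callback period"
// amounts to on this code path.
HRESULT init_low_latency(IAudioClient3 * client, WAVEFORMATEX * mix_format)
{
  UINT32 default_period = 0, fundamental = 0, min_period = 0, max_period = 0;
  HRESULT hr = client->GetSharedModeEnginePeriod(mix_format, &default_period,
                                                 &fundamental, &min_period,
                                                 &max_period);
  if (FAILED(hr)) {
    return hr;
  }
  // The 2.67ms minimum period discussed above corresponds to 128 frames at
  // 48 kHz; the default period is typically 480 frames (10 ms).
  return client->InitializeSharedAudioStream(AUDCLNT_STREAMFLAGS_EVENTCALLBACK,
                                             min_period, mix_format, nullptr);
}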

sidewayss commented 6 years ago

I have been testing with the few devices that I have, collecting round-trip latency data. These tests are microphone facing speaker. My webcam is delivering 80ms latency, but with too many inconsistent data points (it's not designed to be a close-range microphone?). My Bluetooth-to-USB headset mic is getting 120ms latency, with very consistent results. These devices clearly have more things increasing the latency than just a 10ms period. I have not had any success testing my iOne USB interface with loopback cables, so I tested it with a mic placed in front of a speaker (the SM57 is designed to be a close-up mic). It's averaging 70ms, which is a bit above the built-in devices when using 10ms. Some of the difference might be attributable to the speaker-to-microphone audio transfer versus a short loopback cable.

I'm collecting a bunch of data. I've designed my test_tone.exe to execute 60 round-trip measurements in 60 seconds. It logs the results asynchronously. I have a spreadsheet that accumulates the numbers and calculates a few things. Hopefully we can enlarge the range of devices and keep collecting data in an organized way. The log file results are simple to copy/paste into a spreadsheet or a relational table. Each test run is 1 column with 60 rows of data, but it could be 1 row with 60 columns if that's preferred.

I would appreciate it if anyone can provide me with a better click/pop/white/pink-noise formula. My dial-tone is flamming at this short duration. I don't think it's affecting the results in any noticeable way, but it's not generating a tone with consistent level, and it has a flam double-attack. I'd like to clean that up. I'm noticing it now that I'm listening to it over the speakers.
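
For what it's worth, a short white-noise burst with a fade-in/fade-out envelope is a common alternative to a tone for this kind of onset measurement; a minimal sketch (not tied to the existing test_tone.cpp generator) could be:

#include <cstdlib>

// Sketch: fill `frames` mono float samples with a white-noise burst, ramping
// the gain in and out over `ramp` samples to avoid a click or flam at the edges.
void white_noise_burst(float * out, int frames, int ramp, float level)
{
  for (int i = 0; i < frames; i++) {
    float n = (2.0f * rand() / (float)RAND_MAX) - 1.0f;  // uniform in [-1, 1]
    float gain = 1.0f;
    if (i < ramp) {
      gain = (float)i / ramp;                            // fade in
    } else if (i >= frames - ramp) {
      gain = (float)(frames - 1 - i) / ramp;             // fade out
    }
    out[i] = n * level * gain;
  }
}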

There are other interesting trends in the data, the most interesting being better results inside the debugger than running a release exe: lower latency and less variation in the latency values across tests. It's filling the buffer more consistently on time. Does anyone know why that might be? Is there a way we can make the release exe run faster? It should run faster and more consistently than the debugger, shouldn't it?