google / oboe

Oboe is a C++ library that makes it easy to build high-performance audio apps on Android.
Apache License 2.0

VoIP use case and Low Latency #2075

Open JimG777 opened 1 month ago

JimG777 commented 1 month ago

Android version(s): 6.0.1 - 14.0
Android device(s): Several
Oboe version: 1.9.0

Short description

I have recently used Oboe to implement an "Audio Device Module" for the WebRTC media communications stack. Generally, it works well: the sound quality and device support are really stable. The primary issue is that I get low latency streams less often than I expected. I have looked at the API, FAQ, documentation, and several issues here but still can't solve it.

With the VoiceCommunication input preset and usage, 20 out of 23 devices can't obtain a low latency input stream. And for the output, 8 devices can get a Legacy low latency stream, but none of them are MMAP. At least half of my devices are capable of MMAP.

The main question here is whether or not it is possible to get more devices using low latency streams when using the VoiceCommunication setting?

Steps to reproduce

I show my stream creation code here, but all of this can be reproduced with the OboeTester app.

For the format, sample rate, and channel count, I tell Oboe the format that WebRTC uses (I16, 48000Hz, 1 channel) and then ask Oboe to perform any of the conversions it needs. I understand that this can increase latency. But the conversions, if needed, have to happen, so I assume Oboe is efficient at doing so:

builder.setDirection(direction);

builder.setFormat(oboe::AudioFormat::I16);
builder.setSampleRate(48000);
builder.setChannelCount(1);

builder.setFormatConversionAllowed(true);
builder.setSampleRateConversionQuality(oboe::SampleRateConversionQuality::Medium);
builder.setChannelConversionAllowed(true);

I also set setFramesPerDataCallback() to 480 frames (10 ms), which is what WebRTC wants for data callbacks. Again, I know that this can increase latency, but something needs to handle the buffering, and I assume Oboe would be better at it than I would be.

builder.setFramesPerDataCallback(480);
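
For illustration, the re-chunking that setFramesPerDataCallback(480) hands over to Oboe can be pictured roughly as in the sketch below. This is not Oboe's internal implementation: `FixedBlockAdapter` and `blockDurationMs` are hypothetical names, and the code assumes mono I16 frames arriving in device bursts of arbitrary size. It also shows the arithmetic behind the 10 ms figure (480 frames at 48000 Hz).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Oboe): collects mono int16 frames that
// arrive in arbitrary burst sizes and emits fixed-size blocks.
class FixedBlockAdapter {
public:
    using BlockHandler = std::function<void(const int16_t* block, size_t frames)>;

    FixedBlockAdapter(size_t blockFrames, BlockHandler handler)
        : blockFrames_(blockFrames), handler_(std::move(handler)) {
        assert(blockFrames_ > 0);
        pending_.reserve(blockFrames_);  // no reallocation while streaming
    }

    // Feed one device burst; the handler runs once per completed block.
    void write(const int16_t* frames, size_t numFrames) {
        for (size_t i = 0; i < numFrames; ++i) {
            pending_.push_back(frames[i]);
            if (pending_.size() == blockFrames_) {
                handler_(pending_.data(), blockFrames_);
                pending_.clear();  // keeps the reserved capacity
            }
        }
    }

    // Frames currently waiting for the next block to fill up.
    size_t pendingFrames() const { return pending_.size(); }

private:
    size_t blockFrames_;
    BlockHandler handler_;
    std::vector<int16_t> pending_;
};

// Duration of one block in milliseconds.
constexpr double blockDurationMs(size_t frames, int sampleRateHz) {
    return 1000.0 * static_cast<double>(frames) / sampleRateHz;
}
```

Assuming a device burst of 192 frames, the first 480-frame block only completes during the third burst, which is where the extra buffering latency comes from.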

The sharing mode and performance mode are always set to Exclusive and LowLatency respectively, in the hopes of obtaining low latency streams:

builder.setSharingMode(oboe::SharingMode::Exclusive);
builder.setPerformanceMode(oboe::PerformanceMode::LowLatency);

I am using the data and error callbacks, and I pass them as shared pointers, as advised for stability and low latency:

builder.setDataCallback(/* using shared_ptr here */);
builder.setErrorCallback(/* using shared_ptr here */);

Now it seems we get to the problematic settings:

// audio_session_id_ comes from Kotlin: AudioManager.generateAudioSessionId()
builder.setSessionId((oboe::SessionId) audio_session_id_);

if (direction == oboe::Direction::Input) {
  builder.setInputPreset(oboe::InputPreset::VoiceCommunication);
} else {
  builder.setUsage(oboe::Usage::VoiceCommunication);
  builder.setContentType(oboe::ContentType::Speech);
}

oboe::Result result = builder.openStream(/* shared_ptr to either input or output stream */);

Setting the sessionId helps for loopback testing. However, just as in https://github.com/google/oboe/issues/951, if using the AudioEffect for AEC and NS makes low latency broadly unattainable, I think we are okay using WebRTC's own algorithms for them. I therefore removed the sessionId setting and found that 3 out of 23 devices I tested can now obtain a low latency input stream. In one case, the S22 (Exynos) could use MMAP, but without the sessionId set, the input volume is very quiet.

I tried some other things:

  1. Use VoiceRecognition for the input
    • This allows us to achieve low latency on most of the devices. However, there seem to be several quirks with volume, volume control, etc. I can elaborate on this later after further testing.
  2. Use other usage settings for the output, for example Media
    • This usually gets the lowest latency, but then the volume controls for calling don't show up (even with the AudioManager mode set to MODE_IN_COMMUNICATION), which is a problem for us.
  3. Use the LatencyTuner for the output stream
    • This might help us a tiny bit, but I am not sure whether using setFramesPerDataCallback limits what it can do.
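
The general idea behind tuning an output buffer, growing it by one burst each time the underrun count rises, can be sketched as follows. This is a simplified, hypothetical model (`BufferTuner` is not an Oboe class) and it makes no claim about how oboe::LatencyTuner interacts with setFramesPerDataCallback. In a real stream the inputs would presumably come from calls such as getXRunCount() and getFramesPerBurst(), and the result would be passed to setBufferSizeInFrames().

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of underrun-driven buffer tuning.
struct BufferTuner {
    int32_t framesPerBurst;     // size of one device burst
    int32_t capacityFrames;     // upper bound for the buffer size
    int32_t bufferFrames;       // current target buffer size
    int32_t lastXRunCount = 0;  // underrun count seen on the last update

    BufferTuner(int32_t burst, int32_t capacity)
        : framesPerBurst(burst), capacityFrames(capacity), bufferFrames(burst) {
        assert(burst > 0 && capacity >= burst);
    }

    // Call periodically with the stream's cumulative underrun count.
    // Returns the buffer size to request next.
    int32_t update(int32_t xRunCount) {
        if (xRunCount > lastXRunCount) {
            // New glitches since the last call: add one burst of headroom,
            // never exceeding the buffer capacity.
            bufferFrames = std::min(bufferFrames + framesPerBurst, capacityFrames);
            lastXRunCount = xRunCount;
        }
        return bufferFrames;
    }
};
```

The design trades latency for robustness one burst at a time: the buffer starts at a single burst and only grows when the stream has actually glitched.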

I've also run into this issue (https://github.com/google/oboe/issues/1291). I suppose that setting the input preset to VoiceCommunication (usually) enables the AEC, so the RT Latency tests usually don't work in OboeTester.

philburk commented 1 month ago

@JimG777 - Yes, this is a problem. High latency makes conversation more awkward. The users will not report "high latency". They just think the other person is being rude and talking over them. So lowering latency for VoIP is a high priority.

With OboeTester, I can see that the latency increases from 20 to 170 msec when I use VoiceCommunication mode and enable an Effects Session for Input. I was able to get a LowLatency output path on my device. If you can't get LowLatency then the increase will be much higher.

We have been actively working on this problem for a while. I will update this Issue when I have progress to report.

"there seem to be several quirks with volume, volume control, etc."

See #1886