mumble-voip / mumble

Mumble is an open-source, low-latency, high quality voice chat software.
https://www.mumble.info

Mumble latency reduction #3502

Open trudnorx opened 5 years ago

trudnorx commented 5 years ago

The Opus codec has three modes, which the Opus documentation describes roughly as optimized for "low bitrate speech, high delay", "high bitrate speech / music, medium delay", and "low delay". The first favors SILK, while the latter two favor CELT.

From a quick review of the code, it seems that Mumble always uses the "high delay" mode, never uses "low delay", and only uses "medium delay" if the user manually changes "codec/opus/encoder/music" (bUseOpusMusicEncoding), which cannot be accessed from the UI. This seems pointless because, from a few quick tests, a) I did not notice any difference between high and medium delay even at extremely low bitrates, and b) any difference that exists at all, even an unnoticeable one, would probably disappear above 32 kb/s. I found some docs online ( https://wiki.hydrogenaud.io/index.php?title=Opus ) which recommend always using CELT above 32 kb/s.

Therefore, if other people can confirm a), I would recommend enabling OPUS_APPLICATION_AUDIO ("medium delay") by default in all cases when initializing the encoder; otherwise, it should be enabled by default when the bitrate is higher than 32 kb/s.

As for the low delay mode, it would save 7.5ms of latency when used. As far as I know, it is simple to enable: Mumble just needs to pass OPUS_APPLICATION_RESTRICTED_LOWDELAY when it initializes the encoder.

Another 7.5ms could be saved if the user were allowed to set frame sizes as low as 2.5ms (which is only possible with some of the modes described above). That is 15ms of savings in total, or 30ms for a round trip if both people in a conversation have it enabled.
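The savings above are simple sums. A minimal C++ sketch of that arithmetic, assuming the 7.5ms figure claimed for the low delay mode and a change from 10ms to 2.5ms frames; the function names are hypothetical, not Mumble code:

```cpp
#include <cassert>

// One-way latency saved by a new encoder setup, in milliseconds.
// modeSavingMs is passed in (7.5 in the estimate above) rather than derived,
// because it is the thread's estimate, not a measured value.
double oneWaySavingMs(double currentFrameMs, double newFrameMs, double modeSavingMs) {
    // A shorter frame means the encoder waits for less audio before emitting a packet.
    return (currentFrameMs - newFrameMs) + modeSavingMs;
}

// If both participants use the new settings, each direction saves the same amount.
double roundTripSavingMs(double oneWayMs) {
    return 2.0 * oneWayMs;
}
```

For example, `oneWaySavingMs(10.0, 2.5, 7.5)` gives 15ms, and doubling it gives the 30ms round-trip figure.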

While the "low delay" and "2.5ms frame size" modes did seem to require higher bitrates to preserve quality, I think changes allowing these modes should definitely be implemented, and "low delay" should even be the default for certain bitrates: in some quick tests, I was able to achieve good quality at 64 kbit/s with low delay and a 2.5ms frame size. If a higher bitrate is needed, I propose that Mumble allow selecting up to 510 kbit/s when the "Advanced" checkbox is enabled in the options (the user clearly chose Advanced mode, so IMO they should be given full freedom).
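A sketch of how such an "Advanced" unlock could clamp the requested bitrate. The 6 to 510 kb/s range is the one Opus itself covers (RFC 6716); the 96 kb/s standard-mode cap and the function name are hypothetical placeholders, not Mumble's actual limits or API:

```cpp
#include <algorithm>
#include <cassert>

// Range covered by Opus (RFC 6716), in bits per second.
constexpr int kOpusMinBitrate = 6000;
constexpr int kOpusMaxBitrate = 510000;

// Hypothetical cap for the normal settings page; the real Mumble limit may differ.
constexpr int kStandardMaxBitrate = 96000;

// Clamp a requested bitrate to what the current UI mode allows.
int clampBitrate(int requestedBps, bool advancedMode) {
    const int upper = advancedMode ? kOpusMaxBitrate : kStandardMaxBitrate;
    return std::max(kOpusMinBitrate, std::min(requestedBps, upper));
}
```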

So think about it: 15ms of delay that is pointless for many users, given the high bitrates they currently use (and the even higher bitrates they could use if these were unlocked, which many people's Internet connections support). Mumble's central feature is "low latency communications", and with these easy fixes it can improve even further in this regard.

Ytmndissue commented 5 years ago

This is actually really interesting. At the very least, it would be nice to be able to set this Opus mode in the config file so that we could explicitly opt into it.

The same goes for the ultra-low frame sizes: if I recall correctly, Mumble only lets us go down to 10ms. Again, maybe let us set it in the config file if they do not want to expose it in the UI.

I don't use Windows 10, but your findings on WASAPI are interesting as well.

trudnorx commented 5 years ago

Thanks. By the way, using the Opus Custom codec feature would allow frame sizes below 2.5ms, but I'm not going to focus on that for now.

I'm already looking into this a little, and I might do a preliminary implementation in which low delay mode is simply enabled automatically when a certain bitrate is selected. The switch would happen at a threshold where there is no perceptible quality loss, ensuring a seamless user experience.

So I would like people to do some testing to figure out exactly which bitrate that is.

People can test this with ffmpeg by taking a 48 kHz lossless mono .wav file containing a high-quality speech recording and running:

ffmpeg -i input.wav -c:a libopus -b:a 8k -application voip test.opus

And then comparing the result to:

ffmpeg -i input.wav -c:a libopus -b:a 8k -application audio test2.opus

Do you notice any perceptible difference in sound quality between test2.opus and test.opus? If you do, try increasing the bitrate (e.g. change "8k" to "12k" in both commands) and run both commands again. Find the exact bitrate at which you no longer notice a difference, and post it here.

When done with this, compare the following commands:

Command 1: ffmpeg -i input.wav -c:a libopus -b:a 8k -application audio test.opus

Command 2: ffmpeg -i input.wav -c:a libopus -b:a 8k -application lowdelay test2.opus

Once again, repeat the test, find the bitrate at which there is no difference between the two, and post it here.

When you are done with that as well, use the following single command (not two commands anymore) and compare test.opus with input.wav instead of test2.opus:

ffmpeg -i input.wav -c:a libopus -b:a 8k -application lowdelay -frame_duration 2.5 test.opus

Repeat the test and find the bitrate at which there is no difference in quality at all between test.opus and the original input.wav file. This will help determine which bitrates are needed for low delay with a 2.5ms frame size.
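Running the sweep by hand gets tedious. A small sketch that generates the command lines for a range of bitrates, assuming ffmpeg's libopus encoder, which takes the frame length through its `-frame_duration` option in milliseconds. The helper names and the output file naming scheme are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds one ffmpeg command line for the listening test.
// An empty frameDurationMs leaves the frame length at libopus's default.
std::string ffmpegOpusCommand(int bitrateK, const std::string &application,
                              const std::string &frameDurationMs, const std::string &output) {
    std::string cmd = "ffmpeg -i input.wav -c:a libopus -b:a " + std::to_string(bitrateK) +
                      "k -application " + application;
    if (!frameDurationMs.empty()) {
        cmd += " -frame_duration " + frameDurationMs;
    }
    return cmd + " " + output;
}

// One command per bitrate from fromK to toK (inclusive), each writing its own file.
std::vector<std::string> bitrateSweep(const std::string &application, int fromK, int toK,
                                      int stepK, const std::string &frameDurationMs = "") {
    std::vector<std::string> cmds;
    if (stepK <= 0) return cmds;  // avoid an endless loop on a bad step
    for (int k = fromK; k <= toK; k += stepK) {
        cmds.push_back(ffmpegOpusCommand(k, application, frameDurationMs,
                                         application + "_" + std::to_string(k) + "k.opus"));
    }
    return cmds;
}
```

Generating the "voip" and "audio" sweeps with the same bitrate range gives pairs of files that can be compared directly.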

Ytmndissue commented 5 years ago

For me, the difference becomes minor at around 24k. The "audio" version has a slightly more muted sound (like some sort of filter) toward the middle of my sample, as well as a slight echo at the beginning. At 24k, though, this is only just becoming negligible, and the VOIP mode still seems better.

It isn't until 56k that I hear no perceptible difference at all, so 24k works, but 56k seems to be the magic number for me.

The third one (the 2.5ms frame size version) sounds very similar to the lowdelay option. Adjusting the frame size doesn't seem to change the quality much at all: 24k with this command sounds like 24k with the second command.

As with the first two tests, using 56k with the 2.5ms frame size yields a sample that is hard to tell apart from the original .wav or the two other .opus files. So again, 56k seems to be the "magic number."

These values might of course depend on the pitch of the speaker's voice; I used a female voice sample recorded on a Sennheiser headset at 48 kHz and the standard 16-bit depth.

trudnorx commented 5 years ago

Your post is a little confusing. You're saying there's a minor difference between "voip" and "audio" at 24k, and "no perceptible difference at all" between "audio" and "lowdelay" at 56k? In that case, at which point is there no difference at all between "voip" and "audio"? And at which point is there no difference at all between "lowdelay" and the original sample? I'm talking about the standard frame size here, not 2.5ms.

However, I gather that with a 2.5ms frame size, 56k gives you perfect quality with no differences, right? Did you also have "lowdelay" on while using the 2.5ms frame size?

Ytmndissue commented 5 years ago

I'll be a bit more explicit for you:

Use 56k.

The difference is minor at 24k between the first two modes, VOIP and AUDIO as passed to FFMPEG.

The difference is gone at 56k.

The third command (I'm assuming) is supposed to represent how 2.5ms frames sound? It sounds identical to the second command at every bitrate I tried, so 56k is again the bitrate to aim for.

trudnorx commented 5 years ago

But what I'm trying to do is a three-way switch, i.e.:

int error = 0; // receives the libopus error code
if (iAudioQuality >= 64000) { // >= 64 kb/s bitrate
    opusState = oCodec->opus_encoder_create(SAMPLE_RATE, 1, OPUS_APPLICATION_RESTRICTED_LOWDELAY, &error);
    qWarning("AudioInput: Opus encoder set for low delay");
} else if (iAudioQuality >= 32000) { // >= 32 kb/s bitrate
    opusState = oCodec->opus_encoder_create(SAMPLE_RATE, 1, OPUS_APPLICATION_AUDIO, &error);
    qWarning("AudioInput: Opus encoder set for high quality speech");
} else { // below 32 kb/s
    opusState = oCodec->opus_encoder_create(SAMPLE_RATE, 1, OPUS_APPLICATION_VOIP, &error);
    qWarning("AudioInput: Opus encoder set for low quality speech");
}
if (error != OPUS_OK || !opusState) {
    qWarning("AudioInput: opus_encoder_create failed (error %d)", error);
}

In other words: by default it uses VOIP; if threshold A is reached, it switches to Audio; and if threshold B is reached, it switches to Low Delay. So I need two thresholds, not just one. Perhaps you're saying that 56k should be the threshold for switching directly from VOIP to Low Delay? But in that case we should not have VOIP at all, and should just always use Audio for the low bitrates, since in my testing VOIP and Audio performed the same for me, even at low bitrates. Is this what you experienced?
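The selection rule can be separated from encoder creation so that it can be tested on its own. A sketch under these assumptions: `OpusApp` is a hypothetical stand-in for libopus's `OPUS_APPLICATION_*` constants (so no libopus linking is needed), and the thresholds are parameters because the thread has not settled on values yet:

```cpp
#include <cassert>

// Hypothetical stand-in for OPUS_APPLICATION_VOIP / _AUDIO / _RESTRICTED_LOWDELAY.
enum class OpusApp { Voip, Audio, RestrictedLowDelay };

// Pick the encoder application from the configured bitrate (bits per second).
// The low delay threshold is checked first, so equal thresholds collapse the
// three-way switch into a two-way one (VOIP below, low delay at or above).
OpusApp chooseApplication(int bitrateBps, int audioThresholdBps, int lowDelayThresholdBps) {
    if (bitrateBps >= lowDelayThresholdBps) return OpusApp::RestrictedLowDelay;
    if (bitrateBps >= audioThresholdBps) return OpusApp::Audio;
    return OpusApp::Voip;
}
```

With `chooseApplication(rate, 32000, 64000)` this reproduces the three-way switch above. Passing 56000 for both thresholds gives the single-threshold VOIP/low delay variant, and passing 0 as the Audio threshold gives "always Audio below the low delay threshold".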

Anyway, I hope we also get input from a few more people so we can determine a worst-case threshold that will allow a seamless experience for everyone.

Ytmndissue commented 5 years ago

Yeah, that's pretty much what I'm saying: there is no difference between audio and lowdelay mode for me, and my experience is the same with both, so you might only need one threshold.

There is a small difference between voip and audio mode in the lower bitrates, but that levels off at around 56k.

Ytmndissue commented 5 years ago

To be honest, I wouldn't even switch automatically based on bitrate; the quality differences are minor enough that I don't think switching is worth it. Just initialize it with OPUS_APPLICATION_RESTRICTED_LOWDELAY and call it a day.

trudnorx commented 5 years ago

I'm now working on adding options for low frame sizes.

So, actually, there's currently no way to configure the frame size at all. The "audio per packet" option in the Mumble settings does not configure the frame size itself (but then why does it only allow 10ms, 20ms, 40ms and 60ms, skipping e.g. 30ms, which is not a valid Opus frame size?). The frame size is always set to 10ms, no matter what.

How should this be resolved? Do we need two sliders or should the existing slider be changed to configure both things?

Ytmndissue commented 5 years ago

Weird, I'd say two sliders as that is less ambiguous.
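With two sliders, the frame size and the audio per packet have to agree. A hypothetical sketch of that check, assuming the frame size is what gets passed to opus_encode() and "audio per packet" is a whole number of frames; Mumble's actual packet assembly may work differently. Durations are in tenths of a millisecond so that 2.5ms stays an integer:

```cpp
#include <array>
#include <cassert>

constexpr int kSampleRate = 48000;

// Single-frame sizes Opus accepts at 48 kHz, in samples: 2.5, 5, 10, 20, 40
// and 60 ms. (Recent libopus releases also accept 80, 100 and 120 ms; they are
// left out of this sketch.)
constexpr std::array<int, 6> kOpusFrameSamples = {120, 240, 480, 960, 1920, 2880};

// Tenths of a millisecond -> samples at 48 kHz (25 -> 120, 100 -> 480).
int samplesFor(int tenthsMs) { return kSampleRate * tenthsMs / 10000; }

bool isValidOpusFrame(int tenthsMs) {
    const int samples = samplesFor(tenthsMs);
    for (int s : kOpusFrameSamples) {
        if (s == samples) return true;
    }
    return false;
}

// Frames per network packet for a given "audio per packet" and frame size,
// or 0 if the combination cannot be used.
int framesPerPacket(int packetTenthsMs, int frameTenthsMs) {
    if (!isValidOpusFrame(frameTenthsMs) || packetTenthsMs <= 0) return 0;
    if (packetTenthsMs % frameTenthsMs != 0) return 0;
    return packetTenthsMs / frameTenthsMs;
}
```

A 10ms packet of 2.5ms frames, for example, holds 4 frames, while a 30ms packet cannot be built from 20ms frames.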

trudnorx commented 5 years ago

I uploaded some preliminary code that attempts to implement all of this, if people want to mess with it: https://github.com/trudnorx/mumble/tree/branch-lowlatency

It might be broken; I haven't really had a chance to test it. It would be nice if someone could take a look at it or test it, figure out whether it's broken and why, and fix it up.

Ytmndissue commented 5 years ago

I don't have a compile environment setup, but I'll test it when I have a future opportunity.

trudnorx commented 5 years ago

So apparently the jitter buffer code needs to be changed, and maybe other things too.

trudnorx commented 5 years ago

Good news: low delay Opus mode is working with the aforementioned code, but with buggy audio (you can still make out words), which proves the concept works; it just needs fixing now. Low frame size works too, also with buggy audio.

Ytmndissue commented 5 years ago

Any luck messing around with this?

trudnorx commented 5 years ago

No, I decided to take an indefinite break from this work when it got too complex. I was hoping that someone with more experience with the Mumble codebase could help. The problem is that while it successfully makes use of the low delay / low frame size modes (apart from one bug where you can't select certain combinations), the rest of the code isn't prepared to handle them, which results in buggy audio. I think it has something to do with buffering, because it gets progressively worse the longer the program has been running. Also, comments in the code indicate that the jitter buffer isn't prepared to handle such settings.

trudnorx commented 4 years ago

This issue is 2/3 implemented now. The only remaining part is allowing frame sizes below 10ms, down to 2.5ms.

trudnorx commented 4 years ago

@davidebeatrici @Kissaki @mkrautz do you know anything about how Mumble audio code handles frame size? Would it readily support <10ms frame sizes?

Kissaki commented 4 years ago

I am not familiar with that

trudnorx commented 4 years ago

@Krzmbrzl Do you have any knowledge about how to implement allowing <10ms Opus frame size down to 2.5ms?

Krzmbrzl commented 4 years ago

Nah sorry. I don't know much about the audio code and given the nightmarish state of that part of the code I don't really want to start digging into it :see_no_evil:

I think @davidebeatrici would probably be your best chance at this...

trudnorx commented 4 years ago

Ways to save latency

  1. Allow choosing an Opus frame size below 10ms, for example 2.5ms. Currently there's no option for this at all. This would cut up to 15ms from the round trip.
  2. Allow a jitter buffer below 10ms.
  3. Allow audio per packet below 10ms.
  4. Output delay <10ms #3503
  5. "Zero latency preprocessor" #3508 (maybe?)

Things which have been implemented already related to this issue:

  1. Intelligent switching between the Opus CELT and LOWDELAY modes. Up to 15ms saved from the round trip!

trudnorx commented 4 years ago

@Krzmbrzl I think there's no hope. I've been trying to ask every dev on the project for over a year, and they're either inactive or don't know about the sound code... I want to focus on getting the lower Opus frame size to work, since it would result in some pretty considerable latency improvements...

As far as I understand it, Mumble initially creates an Opus encoder with opus_encoder_create() (which takes the sample rate, channel count and application mode), and then every time it feeds the encoder a chunk of data with opus_encode(), it specifies the frame size there... but changing that parameter alone wouldn't be good enough, because the rest of the code, like the arrays that hold a certain amount of sound data, also needs to be compatible with that frame size.

I tried to change it myself, but had problems like buggy audio, likely because I didn't change the rest of the code correctly; it seemed complicated and tangled... or maybe I tested it wrong. I think the comments said something about the way the jitter buffer is coded not being able to handle things like a lower frame size, so maybe that would have to be changed.
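One way to approach the tangled buffer sizes is to derive every per-frame size from a single description of the frame. A hypothetical sketch (`FrameConfig` and the helpers are not Mumble code), which also shows why a jitter buffer whose target is counted in frames has to be recomputed when the frame shrinks:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single source of truth for the frame size, instead of each
// piece of code assuming 10 ms (480 samples at 48 kHz).
struct FrameConfig {
    int sampleRate;    // e.g. 48000
    int frameSamples;  // samples per channel per frame, e.g. 120 for 2.5 ms
    int channels;      // 1 for mono voice

    int pcmLength() const { return frameSamples * channels; }
    double frameMs() const { return 1000.0 * frameSamples / sampleRate; }
};

// A zero-filled PCM buffer holding exactly one frame.
std::vector<int16_t> makeFrameBuffer(const FrameConfig &cfg) {
    return std::vector<int16_t>(static_cast<std::size_t>(cfg.pcmLength()));
}

// The same amount of buffering is a different number of frames at different
// frame sizes: 10 ms is 1 frame of 10 ms but 4 frames of 2.5 ms.
int jitterTargetFrames(int targetMs, const FrameConfig &cfg) {
    const int targetSamples = targetMs * cfg.sampleRate / 1000;
    return (targetSamples + cfg.frameSamples - 1) / cfg.frameSamples;  // round up
}
```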

felix91gr commented 4 years ago

I've been trying to ask every dev on the project for over a year and they're either inactive, or don't know about the sound code... I want to focus on getting the lower Opus frame size to work it would result in some pretty considerable latency improvements...

This is so sad :cry: Mumble's latency is pretty great. Lowering it further would probably open it up to things not seen before, like remote music/chorus group practice.

streaps commented 4 years ago

The Mumble desktop client would need some other changes to be suitable for music/chorus group practice. There is some hard coded audio processing that you don't want for music. There is also already some other software for low latency audio streaming, e.g.

https://github.com/elieserdejesus/JamTaba
https://github.com/corrados/jamulus

iainhallam commented 3 years ago

The Mumble desktop client would need some other changes to be suitable for music/chorus group practice. There is some hard coded audio processing that you don't want for music. There is also already some other software for low latency audio streaming...

Neither of those has an Android or iPhone app; in fact, there's an interesting discussion on SourceForge about using Mumble to connect to Jamulus sessions. Is there any relatively easy way to turn off the processing to reduce latency further?

davidebeatrici commented 3 years ago

#3323

Chris2000SP commented 8 months ago

If 2.5ms were possible, then 2.5ms of audio processing + 2 x 2ms of fiber latency to and from the server + 2.5ms of audio processing sums to 9ms. That could be a killer feature for musicians. I would love that.
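The estimate above is a sum of stages. A minimal sketch that keeps the stages named, so that other contributions (such as the audio backend's latency) can be added; the stage values are the rough figures from the comment, not measurements, and the names are illustrative:

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// One stage of a one-way (mouth-to-ear) latency estimate, in milliseconds.
struct Stage {
    std::string name;
    double ms;
};

// Sum the stages. This only adds the figures given; jitter buffering, the
// audio backend and OS scheduling would have to be added as extra stages.
double totalLatencyMs(const std::vector<Stage> &stages) {
    return std::accumulate(stages.begin(), stages.end(), 0.0,
                           [](double acc, const Stage &s) { return acc + s.ms; });
}
```

With the four stages from the comment (2.5 + 2 + 2 + 2.5) this gives 9ms; adding, say, a hypothetical 5ms audio backend stage raises it to 14ms.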

felix91gr commented 8 months ago

You might want even less than that, depending on the use case. The highest usable latency seems to be ~42ms, but depending on the context, you might need to keep it below ~1.4ms.

I wonder at what scale this would be achievable. 1ms is getting close to the physical limit of how fast a signal can be sent through the internet.

Source: https://web.archive.org/web/20090219061343/www.lsbaudio.com/publications/AES_Latency.pdf
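To put the physical limit in numbers, a sketch assuming the common rule of thumb that light in optical fiber covers about 200 km per millisecond, and ignoring routing, switching and queuing delays, so real reachable distances would be shorter. The 42ms budget and the 5ms of processing (two 2.5ms frames) come from the comments above; the function names are illustrative:

```cpp
#include <algorithm>
#include <cassert>

// Rule of thumb: light in fiber travels roughly 200 km per millisecond
// (about two thirds of its speed in vacuum).
constexpr double kFiberKmPerMs = 200.0;

// Budget left for the network once fixed processing delay is subtracted.
double networkBudgetMs(double totalBudgetMs, double processingMs) {
    return std::max(0.0, totalBudgetMs - processingMs);
}

// Farthest fiber path (sender to receiver, via the server) that fits the budget.
double maxFiberDistanceKm(double budgetMs) {
    return budgetMs * kFiberKmPerMs;
}
```

Under these assumptions, a 42ms budget leaves 37ms, or roughly 7400 km of fiber, while a 1.4ms budget is already used up by the two 2.5ms frames before any network distance is covered.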

davidebeatrici commented 8 months ago

Keep in mind that there's also the audio backend's latency.

Chris2000SP commented 8 months ago

If you host the Mumble server at home, you could get it lower if everyone joining the server has FTTH. EDIT: I mean, it would be best if the drummer hosted the server.