meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0
8.17k stars 2.47k forks source link

Add optional RNNoise support to AudioBridge #3185

Closed lminiero closed 5 months ago

lminiero commented 1 year ago

This PR is an attempt to revive an effort initially contributed by @mirkobrankovic in #2260, by adding an optional and configurable support for RNNoise to the AudioBridge plugin: the idea is to basically add a mechanism to perform noise reduction in audio rooms with the help of the RNNoise library.

This patch is is quite different from Mirko's original contribution, though:

Apart from that, I added ways to selectively enable/disable the feature. First of all, you can configure a room to enable denoising by default, by setting the denoise property to true: in that case, any participant that joins the room resunts in a denoiser instance created for them, unless they explicitly provide a denoise: false property when joining. A participant that joins a room where denoising is not enabled by default, can instead create a denoiser by joining with a denoise: true property. Denoising can also be enabled/disabled dynamically: participants can do that for themselves via a configure request, while room owners can use the synchronous denoise_enable and denoise_disable requests instead, by specifying the room and the specific participant to impact.

Coming to the actual implementation, it only partly works, due to some constraints in the RNNoise library that I haven't overcome entirely. Specifically, the rnnoise_process_frame function expects a buffer of exactly 480 samples to denoise: it can't be more and it can't be less. Depending on the sampling rate in use in the room, a single audio packet of 20ms received via WebRTC can contain a different number of samples: when using 16000 as a sampling rate, for instance, it will be 320 samples; 480 for 24000; 960 for 48000. This means that, at the moment, denoising works fine when you use a sampling rate of 24000 or 48000 (since can do one or two rounds of denoising with samples that are multiples of 480), while you get audio artifacts when using lower sampling rates instead. At the moment, I'm also getting artifacts when using stereo rooms (spatial audio): I do know that Opus uses interleaving when stereo is used, but even taking that into account (and so performing multiple rounds of denoising on different subsets of 480 samples) the artifacts are still there.

For this reasons, I decided to submit this as a draft pull request: in fact, it kinda works already, but it also definitely needs improvements, so I'm hoping that some fresh pairs of eyes looking at the code and/or people testing and providing feedback may help address what's currently not 100% working instead.

Tagging as multistream since I implemented this in 1.x, but this will be trivial to port to 0.x as well should this be merged eventually.

lminiero commented 1 year ago

I spent some time investigating what was wrong, and managed to fix the broken audio and artifacts in all combinations of sampling rate, mono and stereo alike. It seems to be working fine to me, now (even disabling/enabling denoising on the fly, where you can actually hear it at work), so I marked the PR as ready for review. That said, I'll obviously wait for more feedback before (if) we merge this, also taking into account the jitter buffer PR would need to be merged first (which means I'll have to rebase this accordingly).

IlgamGabdullin commented 1 year ago

Hi! Is there a chance that this feature will be in VideoRoom Plugin?

lminiero commented 1 year ago

Hi! Is there a chance that this feature will be in VideoRoom Plugin?

No, because the VideoRoom plugin only relays packets, it doesn't decode them as the AudioBridge does.

Odinvt commented 1 year ago

Hi! Thanks for this PR!

We were using a frontend version of the same RNNoise with the default model via WASM and it doesn't affect the original voice quality much. (https://github.com/jitsi/rnnoise-wasm)

But, when testing this PR it results in a very "strangled" voice quality compared to the WASM one.

Would you say that this is related to the fact that it's used on the original lossless microphone samples before encoding frontend side, or rather something related to the usage of RNNoise in this PR ?

PS: I could provide 3 simultaneous audio recordings of the original, denoised-frontend, denoised-janus sound in a noisy environment if it helps

Thanks again !

lminiero commented 1 year ago

Would you say that this is related to the fact that it's used on the original lossless microphone samples before encoding frontend side, or rather something related to the usage of RNNoise in this PR ?

I think it depends on the sampling rate you're using in the AudioBridge. If you use, e.g., 16000, resampling will be performed, and I think it may be that the way we do it causes what you hear. If in a 48000 room the audio is good, and comparable in your tests, that's likely it.

Odinvt commented 1 year ago

Indeed i retested with 24000/48000 room sampling rate and it works perfectly fine. I'll be moving this to production tonight for some 500 daily users (some OPUS/48000 some PCMA/8000 (SIP & PSTN bridged from the SIP plugin to the AudioBridge plugin)) I'll let you guys know how it works out for quality & CPU usage. Thanks again great work 🙏

Odinvt commented 1 year ago

We've seen no significant CPU usage increase with this enabled in production (2 to 4 percentage points increase in average CPU usage across multiple 32 CPU VMs). Although, it does seem to add some "barely noticeable" cumulative delay in audio contributions with denoise enabled (just under the 200ms mark) for hour long conferences.

atoppi commented 1 year ago

@Odinvt are you sure the observed delay is due to the denoiser? We recently merged the jitter buffer in the master branch (now merged in this PR) and that might also explain a bit of latency. To double-check this, use the Admin API for an handle with the delay and read the values of buffer-in and queue-in.

Odinvt commented 1 year ago

@atoppi Indeed I was on v1.1.4 stable before. I only switched to v1.2.0 because this merge was rebased on top of speexdsp's jitter buffer. So i never got to actually test v1.2.0 separately. I will pay attention to the mentioned values for both denoise enabled and disabled, and get back to you asap. Thanks !

spscream commented 11 months ago

Hi, we are testing your PR. Denoise is working when initially joined with denoise: true. But changes on the fly with new configure request aren't working:

{"body":{"denoise":false,"request":"configure"},"handle_id":8708560238099099,"janus":"message","session_id":6601772272643216,"transaction":"QUrR+WT5jq4"}
{"body":{"denoise":true,"request":"configure"},"handle_id":8708560238099099,"janus":"message","session_id":6601772272643216,"transaction":"+6ZA40wJa3g"}

requests abobe doesn't change anything on the fly. Is anything wrong with them? I also tried with ice restart, sending new jsep and denoise flag, but it does no effect.

brave44 commented 11 months ago

@lminiero any plans to merge this?

lminiero commented 11 months ago

requests abobe doesn't change anything on the fly

@spscream Does it work when using the synchronous denoise_enable and denoise_disable requests instead?

@brave44 we do plan to merge this, but not sure yet, as we're still waiting for more feedback; besides, we may have to improve how it works when doing 8000/16000 sampling rates.

brave44 commented 11 months ago

@brave44 we do plan to merge this, but not sure yet, as we're still waiting for more feedback; besides, we may have to improve how it works when doing 8000/16000 sampling rates.

Thanks, yeah, for us it works fine with 4800 sampling rate.

atoppi commented 10 months ago

@Odinvt @spscream @brave44 the algorithm has been totally refactored, could you please test the last revision?

Odinvt commented 10 months ago

@Odinvt are you sure the observed delay is due to the denoiser? We recently merged the jitter buffer in the master branch (now merged in this PR) and that might also explain a bit of latency. To double-check this, use the Admin API for an handle with the delay and read the values of buffer-in and queue-in.

I apologize for the late reply. The audio delay was due to our "tree" scaling's forwards going through a VXLAN over VPN which encryption was not hardware accelerated. We've been running this branch with the jitter buffer rebase since September with no issues with 24000/48000 sampling rates in production after successful stress tests.

requests abobe doesn't change anything on the fly

@spscream Does it work when using the synchronous denoise_enable and denoise_disable requests instead?

@lminiero the synchronous requests work fine we've been using them in production also.

@Odinvt @spscream @brave44 the algorithm has been totally refactored, could you please test the last revision?

@atoppi Will do. Thanks !

spscream commented 10 months ago

@atoppi thanks, we will check it

spscream commented 10 months ago

We are having issues with ab under production load now. Cpu usage isn't on top(~30%) but we have a robo voice on every of ab participants, despite denoise enabled or not. When we disable use of ab in conference and switch to audio using videoroom everything looks fine.

lminiero commented 10 months ago

@spscream if you can do a git bisect, that would help narrowing down the commit that causes the problem for you. On a regular basis we rebase PRs on changes that occur in other branches, and we've made a few t the AudioBridge.

spscream commented 10 months ago

@lminiero we can try, but this is not going to be a very fast, we have troubles on morning peaks and we disable ab when we have troubles.

Our branch (https://github.com/spscream/janus-gateway/tree/feature/express-specific-fields-rnnoise2) is based on master with some our custom logic + I merged rnnoise branch to it.

I could try this on master branch this week and figure out if it happens on master.

lminiero commented 10 months ago

Can't help on forks, sorry. If you can find if any commit we made is causing the problem, please do let us know.

spscream commented 10 months ago

@lminiero sure, we will do our best. But fork isn't related to ab, only vr logic affected(some custom fields added to participants state).

spscream commented 10 months ago

@lminiero we checked on master + our commit today and we have no troubles on it. We will check the latest changes on current branch afternoon.

spscream commented 10 months ago

@lminiero moved to latest changes from this branch, no problems now for us. We will monitor it until tomorrow, looks like everything is fine.

spscream commented 10 months ago

We have some troubles with audiobridge. If publisher have bad network connection we hear crack sounds and it appears on every call with such clients. After change https://github.com/meetecho/janus-gateway/pull/3185/commits/ec52fbbcf3ccfcf43392743e071215ea26eb175d is a bit better, but doesn't remove issue completly. How can we debug it and help to address it?

btw on vr publisher from the same users we don't have these cracks

spscream commented 10 months ago

I tested it using clamsy utility https://jagt.github.io/clumsy/ - cracks starts after adding drops over 5%

lminiero commented 10 months ago

Unless these reports are related to the RNNoise effort, this is the wrong place to talk about it.

spscream commented 10 months ago

Unless these reports are related to the RNNoise effort, this is the wrong place to talk about it.

I created #3297 issue to adress it.

mail2mhossain commented 7 months ago

We are currently testing your PR. As part of our testing process, we're setting up an audio bridge room with the RNNoise feature activated, by setting the "denoise" option to true.

In certain circumstances, we've noticed an echo effect. This typically happens when using the laptop's built-in speaker and microphone. Interestingly, this issue does not occur when headphones are utilized.

atoppi commented 7 months ago

@mail2mhossain that sounds more like an issue with the client environment.

This typically happens when using the laptop's built-in speaker and microphone. Interestingly, this issue does not occur when headphones are utilized

Do your clients have echo cancellation mechanisms? Typically browsers do and it should be enabled by default if the device supports it. A quick one-liner test in a browser is the following:

(await navigator.mediaDevices.getUserMedia({ audio: true})).getAudioTracks()[0].getSettings().echoCancellation
lminiero commented 7 months ago

More importantly, is this indeed a problem with the RNNoise integration? Meaning, does it happen when RRNoise is enabled in the AudioBridge, but doesn't when the AudioBridge doesn't? If not, it's irrelevant to this PR, and as Alessandro said a client side problem (just wear a headset).

mail2mhossain commented 7 months ago

It appears that the issue we're encountering, specifically the echo experienced when using the laptop's built-in speaker and microphone, may indeed be related to the absence of RNNoise integration. We are considering integrating RNNoise as a potential solution to address this challenge.

The issue occurs both when RRNoise is activated in the AudioBridge and also when it's not integrated with the AudioBridge.

Our application utilizes a desktop client that incorporates the Sipsorcery WebRTC library, and for audio functionalities, we're leveraging the SIPSorceryMedia.SDL2 library. Based on your feedback, it appears that echo cancellation features are expected to be handled by the client library, in this case, SIPSorceryMedia.SDL2. Consequently, I'm understanding that RNNoise might not support this specific type of noise cancellation, correct?

mail2mhossain commented 7 months ago

Could RNNoise effectively resolve the echo issue we're facing with the laptop's built-in speaker and microphone? Or, would it be more advisable to incorporate a noise reduction library on the client side?

atoppi commented 7 months ago

Could RNNoise effectively resolve the echo issue we're facing with the laptop's built-in speaker and microphone? Or, would it be more advisable to incorporate a noise reduction library on the client side?

Noise suppression and echo suppression are very different processes.

Noise suppression subtracts from a generic stream what is being recognized as background noise and could be performed in both media server (audiobridge with RNNoise in this case) or in the endpoint. On the other hand echo suppression is a process happening on an endpoint that subtracts from the microphone stream the echo that is bouncing back from the loudpseakers.

lminiero commented 5 months ago

Merging.