intel / openvino-plugins-ai-audacity

A set of AI-enabled effects, generators, and analyzers for Audacity®.
GNU General Public License v3.0

noise suppression reduces audio sample rate / quality even on unselected audio regions #69

Closed Leggyweggy closed 5 months ago

Leggyweggy commented 7 months ago

I noticed that selecting a portion of a track and applying noise suppression to it seems to change the sample rate or overall quality of the whole track, including the unselected portions. I'm guessing this has something to do with the audio format the model was built for, so the input has to be converted to a specific sample rate and bit rate, similar to how Whisper works.

RyanMetcalfeInt8 commented 7 months ago

Hi @Leggyweggy,

Thanks for reporting this. I didn't realize that this was happening, but reviewing my implementation you're exactly right. We have a bunch of updates to noise suppression for the next release coming very soon, so let me squeeze a fix in for this.

Edit: And just to give more detail -- the original noise suppression model that we released back in December requires 16 kHz input, so that is why we downsample. The new models we are integrating for the next release (deepfilternet2, deepfilternet3) operate on 48 kHz.

Cheers! Ryan
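To illustrate the behavior described above, here is a minimal sketch of confining the downsample / process / upsample round trip to the selected region, so that unselected samples are returned untouched. It is not the plugin's actual implementation: the function names, the identity "model", and the plain linear-interpolation resampler (which has no anti-aliasing filter) are all assumptions made for illustration.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (illustrative only, no anti-aliasing)."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    if n_out == 1 or len(samples) == 1:
        return [samples[0]] * n_out
    step = (len(samples) - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def fit_length(samples, length):
    """Trim or pad (repeating the last value) so the result has exactly `length` samples."""
    if len(samples) >= length:
        return samples[:length]
    pad_value = samples[-1] if samples else 0.0
    return samples + [pad_value] * (length - len(samples))


def process_selection(track, start, end, track_rate, model_rate=16000, model=None):
    """Run a model that needs `model_rate` input on track[start:end] only."""
    if model is None:
        model = lambda x: x  # hypothetical stand-in for the noise suppression model
    region = track[start:end]
    low = resample_linear(region, track_rate, model_rate)     # e.g. 48 kHz -> 16 kHz
    cleaned = model(low)
    restored = resample_linear(cleaned, model_rate, track_rate)
    restored = fit_length(restored, len(region))              # guard against rounding drift
    # Samples outside the selection are copied through unchanged.
    return track[:start] + restored + track[end:]
```

With this shape, only the selected region ever passes through the lower sample rate; the rest of the track keeps its original samples and rate.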

Elshara commented 6 months ago

I also noticed that noise suppression has far less dB range than Audacity's noise reduction feature. And for some reason, it distorts audio with what sounds like random bit depth changes, as if it were a 3-bit audio file at lower dB levels. Steps to reproduce are as follows.

First, import an audio track into Audacity. Make sure the track has not already been noise suppressed.

Second, lower the volume of the track using the Amplify effect to something greater than -15 dB. The reason will be obvious in a minute.

Third, apply OpenVINO noise suppression to the track and wait until it is done processing. Notice there are no options to customize dB range, peak values, or bit depth to match the types and levels of noise you want to block out.

Fourth, play the track. You will hear the suppression take effect immediately. As you listen, you notice not only that the sample rate is forced to 16 kHz but also, as mentioned here, that the bit depth varies.

It blocks out certain types of noise by reducing their dB level. However, if you apply a compressor to the track, it is clear that this blocked-out noise is still present in the audio. Audacity's existing noise reduction effect, by contrast, can remove just the unwanted noise, because it filters out anything at precise dB ranges, although its robotic artifacts leave a lot to be desired. What OpenVINO noise suppression is really good for is blocking out analog and background noise: everything from dogs barking, to dish washing, to tape hiss, to crowds cheering, to electrical humming.

Would it be possible in future releases of this module to customize which sounds to keep and which to remove? Right now, it only works to isolate vocals. I would love to see this module be able to isolate anything it can detect in the audio clip itself, if at all possible. Perhaps it could even fill in replacement sounds to mitigate distorted audio, or offer presets and some kind of graphical user interface, depending on the audio loaded into Audacity within a given project for AI-based analysis.

An example of this feature: suppose you had two audio samples of the same person talking. In one, the person has a head cold but is using a very decent microphone. In the other, the same person has no head cold but is using a very poor quality microphone. If you loaded both tracks into Audacity, noise suppression could in theory ask you which samples to pull from, and maybe even merge the output of one file into the other, as in generative speech-to-speech but with more subtle nuance designed to fix audio sampling issues. It may even be useful to have a plugin that could use other effects within Audacity to achieve this, such as an EQ or a voice replacer.

By far, and I have been an avid Audacity user since 2009, this is the best module ever made, not only in the open source world but in the music space as a whole, going far beyond the potential of auto-tune. Thank you very much for making this possible, and for considering such ideas, if any get implemented.

RyanMetcalfeInt8 commented 6 months ago

Hi @Elshara,

Wow, thank you for the feedback! I just want to note that we have another release coming soon (and an installer this time!) with more noise suppression models (like deepfilternet2 / deepfilternet3), and these new models are a bit more configurable.

You have some really good ideas for new features -- I'll think about how we could potentially support these kinds of things... let me know if you spot a good reference for these kinds of features in other open source projects -- it's very helpful to have a reference implementation that I can port from.

Thanks! Ryan

RyanMetcalfeInt8 commented 6 months ago

Hi @Elshara,

We've posted an updated release here: https://github.com/intel/openvino-plugins-ai-audacity/releases/tag/v3.5.0-R2

The updated release should resolve this issue -- and you'll find some new options are available for noise suppression. When you get a chance, can you try it out and confirm that it resolves the specific issue that you raised here?

Thanks! Ryan

Elshara commented 5 months ago

Sure thing! It's nice to hear feedback goes somewhere. You're a fantastic developer.

Update: Installer fails to download anything with error code 12030. It could be a local issue, will try on another machine to confirm. If so then I will start a new thread about it.

Thanks for letting me know it was released! The previous one is totally unusable with Audacity 3.5.

RyanMetcalfeInt8 commented 5 months ago

Hi @Elshara,

Thanks!

> Update: Installer fails to download anything with error code 12030. It could be a local issue, will try on another machine to confirm. If so then I will start a new thread about it.

Shoot -- there have been a couple other reports of this... I haven't yet been able to understand why it's happening. Looks like you found the existing issue for it (just saw your comment pop up).

RyanMetcalfeInt8 commented 5 months ago

And just a heads up: you can manually populate the 'openvino-models' folder by downloading the required models with your browser (and just choose 'install no models' in the installer).

Using some of the links from this as a reference, for example, you'll find that the noise suppression models are stored here:

deep filter models: https://huggingface.co/Intel/deepfilternet-openvino

'legacy' denseunet model that we had support for in the last release:

https://storage.openvinotoolkit.org/repositories/open_model_zoo/2023.0/models_bin/1/noise-suppression-denseunet-ll-0001/FP16/noise-suppression-denseunet-ll-0001.bin

https://storage.openvinotoolkit.org/repositories/open_model_zoo/2023.0/models_bin/1/noise-suppression-denseunet-ll-0001/FP16/noise-suppression-denseunet-ll-0001.xml

The intent of course was to have the installer take care of downloading all that stuff on behalf of the user, but again, you can always download it yourself.
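As a rough sketch of the manual route, the two legacy denseunet files linked above could also be fetched with a short script instead of a browser. The URLs and the 'openvino-models' folder name come from this thread; where that folder lives on disk, and whether the files belong at its top level, are not stated here, so the `models_dir` argument and the flat layout are assumptions. The helper names are hypothetical.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# URLs copied from the comment above.
LEGACY_DENSEUNET_URLS = [
    "https://storage.openvinotoolkit.org/repositories/open_model_zoo/2023.0/models_bin/1/"
    "noise-suppression-denseunet-ll-0001/FP16/noise-suppression-denseunet-ll-0001.bin",
    "https://storage.openvinotoolkit.org/repositories/open_model_zoo/2023.0/models_bin/1/"
    "noise-suppression-denseunet-ll-0001/FP16/noise-suppression-denseunet-ll-0001.xml",
]


def plan_downloads(urls, models_dir):
    """Pair each URL with a target path named after the last URL path segment."""
    # Assumption: files are placed directly inside models_dir.
    return [(url, Path(models_dir) / Path(urlparse(url).path).name) for url in urls]


def download_all(urls, models_dir):
    """Download every file that is not already present in models_dir."""
    for url, target in plan_downloads(urls, models_dir):
        target.parent.mkdir(parents=True, exist_ok=True)
        if target.exists():
            continue  # skip files that were already fetched
        urllib.request.urlretrieve(url, str(target))


# Example call (performs network access; the folder path is an assumption):
# download_all(LEGACY_DENSEUNET_URLS, "openvino-models")
```

Because this goes through Python's own HTTP stack rather than the installer's downloader, it may or may not hit the same failure as the installer on a given machine.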

Elshara commented 5 months ago

Hey thank you very much for the clue on where to find these files!

I think Audacity has already released 3.5.1 as an alpha release, lol. But this makes me want to downgrade and see if this issue still exists regardless. The best, and now worst, thing that Muse ever did is include an automatic updater for Audacity.

I will check now to specifically see if this issue still persists with the latest models for Noise Suppression.

Just for reference, here are the manual links (I wish these were all zipped up under Openvino Audacity Files, BTW).

Cancel Noise:

https://huggingface.co/Intel/deepfilternet-openvino/blob/main/deepfilternet2.zip

https://huggingface.co/Intel/deepfilternet-openvino/blob/main/deepfilternet3.zip

Legacy: (Unzipped)

https://storage.openvinotoolkit.org/repositories/open_model_zoo/2023.0/models_bin/1/noise-suppression-denseunet-ll-0001/FP16/noise-suppression-denseunet-ll-0001.bin

https://storage.openvinotoolkit.org/repositories/open_model_zoo/2023.0/models_bin/1/noise-suppression-denseunet-ll-0001/FP16/noise-suppression-denseunet-ll-0001.xml

This file didn't download automatically; it took a while before it displayed in Firefox, and then I could save it manually.

Separate Musical Instruments: (Unzipped)

https://huggingface.co/Intel/demucs-openvino/blob/main/htdemucs_v4.bin

https://huggingface.co/Intel/demucs-openvino/blob/main/htdemucs_v4.xml

Note that Hugging Face doesn't show a download button for XML data. I had to open the file in Firefox and use Save As from the browser's File menu, as I was unable to download it directly otherwise.

Generate Music By Text:

https://huggingface.co/Intel/musicgen-static-openvino/blob/main/musicgen_small_enc_dec_tok_openvino_models.zip

https://huggingface.co/Intel/musicgen-static-openvino/blob/main/musicgen_small_mono_openvino_models.zip

https://huggingface.co/Intel/musicgen-static-openvino/blob/main/musicgen_small_stereo_openvino_models.zip

Transcribe Audio:

https://huggingface.co/Intel/whisper.cpp-openvino-models/blob/main/ggml-base-models.zip

https://huggingface.co/Intel/whisper.cpp-openvino-models/blob/main/ggml-small.en-tdrz-models.zip

https://huggingface.co/Intel/whisper.cpp-openvino-models/blob/main/ggml-small-models.zip

https://huggingface.co/Intel/whisper.cpp-openvino-models/blob/main/ggml-medium-models.zip

https://huggingface.co/Intel/whisper.cpp-openvino-models/blob/main/ggml-large-v1-models.zip

https://huggingface.co/Intel/whisper.cpp-openvino-models/blob/main/ggml-large-v2-models.zip

https://huggingface.co/Intel/whisper.cpp-openvino-models/blob/main/ggml-large-v3-models.zip

I believe that covers all of the options offered in the installer, though I could be mistaken. They all worked previously, obviously with varying degrees of success... legacy limits notwithstanding, of course.
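The Hugging Face links listed above are `/blob/main/` page URLs, which open a file viewer rather than returning the file itself. Hugging Face also serves the raw file when `blob` is replaced with `resolve` in the path. The sketch below does that rewrite; the helper name `to_direct_url` is made up for illustration, and non-Hugging Face URLs are passed through unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def to_direct_url(url):
    """Turn a huggingface.co '/blob/<revision>/' page URL into a '/resolve/<revision>/' file URL."""
    parts = urlsplit(url)
    if parts.netloc != "huggingface.co":
        return url  # e.g. storage.openvinotoolkit.org links are already direct
    # Path layout: /<owner>/<repo>/blob/<revision>/<file path>
    segments = parts.path.split("/")
    if len(segments) > 3 and segments[3] == "blob":
        segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


print(to_direct_url("https://huggingface.co/Intel/demucs-openvino/blob/main/htdemucs_v4.xml"))
```

This sidesteps the missing download button for the `.xml` files, since the rewritten URL can be saved directly by a browser or a download tool.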

It may be useful to add these links to the documentation as references, in case other people can't get the installer to retrieve the files. I still don't know what's up with that, though. I tracked it down to Windows programs being unable to make outbound WinHTTP requests, and I didn't want to bother playing around with registry keys to fix it. I suggest trying a different protocol such as HTTP, HTTPS, TLS, etc., to see if Windows is being stricter than it needs to be in some instances.

What's cool about these effects for Audacity is that they work on AMD CPUs without any problem. I have an old Ryzen 2920W from 2018 that runs this with no issue; their pre-Pro Threadripper series is the best I have ever used in a modern system. Generations take a while, but they do work.

Hope this helps.

Elshara commented 5 months ago

Here's an update. Long overdue I know.

I was able to manually get all the plugins to work except for Music Generation; I have no idea of the folder structure for that one.

As for noise cancellation, Deep Filter Net has some problems that Dense U Net doesn't.

Let's start with the basics.

I think Deep Filter Net is at least attempting to pull from 24 kHz instead of 16 kHz as Dense U Net does, but the result sounds like a minimal improvement here.

There is a drastic difference in quality between Deep Filter Net version 2 and Deep Filter Net version 3: version 2 does a significantly better job of regulating audio volume relative to dynamic background noise.

As for static background noise, all three models do a decent enough job of blocking it out, to around 36 to 42 dB. Deep Filter Net actually improves on the previous Dense U Net noise level here, in terms of the vocal extraction methods used on audio samples.

However, there is a big problem.

The bit depth issues that Dense U Net had are far more pronounced in Deep Filter Net 3. In fact, it is so distorted that the quality of Deep Filter Net 3 is actually worse than Dense U Net, both in audio clarity and in adding random volume fluctuations where no noise was being cancelled out of the original audio.

The frequent volume fluctuations range from 8 dB to 22 dB at very short intervals, less than 250 ms at times, peaking at over 462 ms during extended attack periods of micro compression. Note that this only happens with dynamic noise, not static noise, particularly with things like birds chirping, dogs barking, and other sudden loud noises, regardless of what the attenuation factor is set to. Most notably, applause, human coughs, and short snippets of loud audio cut-outs suffer greatly with Deep Filter Net version 3. This happens even if the box is checked to filter out dynamic compression at the expense of greater attenuation fluctuations. The result is an audio file that sounds quite choppy to listen to.

By contrast, Deep Filter Net version 2 is like night and day compared to its successor in terms of audio smoothness, quality, and presentation. Clarity is excellent, and there is a genuine attempt to extract vocals rather than filter out non-vocals.

You still have measurable dB fluctuations, but they are less frequent, even less so than with Dense U Net, and do not contain much choppy audio. However, the compression attack range is longer, averaging between 300 ms and 400 ms. The result is an audio file that you can still tell is edited or regulated, but not nearly as much.

Dense U Net, even with its clarity issues, falls in between Deep Filter Net versions 2 and 3. Its bit depth issues are still present, but only during peak moments measured at the limit of regular speaking volume for vocals. So things like cheering crowds will sometimes be treated as false positives, as if someone were shouting their head off, and get filtered out.

This is my analysis so far.
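One possible way to put numbers on the volume fluctuations described in this analysis is to compute a short-time RMS level in dB and then look for the largest level swing inside a sliding span of windows. This is only a sketch of that idea, not how the figures above were measured; the 50 ms window, the function names, and the -120 dB floor are arbitrary choices for illustration.

```python
import math


def rms_db(frame, floor_db=-120.0):
    """RMS level of a frame in dB relative to full scale (1.0), clamped at floor_db."""
    if not frame:
        return floor_db
    mean_sq = sum(x * x for x in frame) / len(frame)
    if mean_sq <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(mean_sq))


def level_envelope(samples, rate, window_ms=50):
    """Level in dB for consecutive, non-overlapping windows of window_ms milliseconds."""
    size = max(1, int(rate * window_ms / 1000))
    return [rms_db(samples[i:i + size]) for i in range(0, len(samples) - size + 1, size)]


def max_swing_db(envelope, span):
    """Largest max-minus-min level difference within any `span` consecutive windows."""
    best = 0.0
    for i in range(len(envelope)):
        chunk = envelope[i:i + span]
        best = max(best, max(chunk) - min(chunk))
    return best
```

For example, with 50 ms windows a `span` of 5 covers 250 ms, so `max_swing_db(level_envelope(samples, rate), 5)` reports the largest level change seen within any 250 ms stretch of a mono track given as floats in the range -1.0 to 1.0. Running it on the same clip processed by each model would allow a side-by-side comparison.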

RyanMetcalfeInt8 commented 5 months ago

Hi @Elshara,

Very sorry it took me so long to respond! Thank you so much for the very, very detailed analysis, and review of the current set of noise suppression features.

For deepfilternet2 & 3, it may be worthwhile for you to provide such detailed feedback to that open source project, located here: https://github.com/Rikorose/DeepFilterNet

While I found your analysis interesting, much of it went over my head 😆 -- but I'm sure the developers of those models may find it very useful!

RyanMetcalfeInt8 commented 5 months ago

Hi @Elshara,

I'm going to close this 'issue' -- as it was originally intended to track a bug. I would encourage you to copy & paste your above analysis into a new discussion here: https://github.com/intel/openvino-plugins-ai-audacity/discussions/categories/show-and-tell

Thanks again for your efforts!