intel / openvino-plugins-ai-audacity

A set of AI-enabled effects, generators, and analyzers for Audacity®.
GNU General Public License v3.0

Request: voice enhancement #13

Closed (nachazo closed 4 months ago)

nachazo commented 8 months ago

Hello! This is a totally fantastic toolbox, congrats & thanks!

The only thing I miss is a feature I use often: AI voice enhancement. Normally, I use "Enhance speech" from Adobe (podcast.adobe.com/enhance).

I don't know whether anyone is doing something similar with open AI models, but here is my request and "wish" for the next version!

Many thanks!

LWinterberg commented 8 months ago

I have a suspicion that the "enhance speech" of Adobe podcast is just an EQ + compressor slapped onto your voice. Those are things you can do in Audacity as well, either using the native plugins or one of the many VSTs. AI models would likely be way overkill for that.

nachazo commented 8 months ago

Hi! In my experience, the process also applies noise reduction and echo removal, then EQ, and it seems to apply some compression and limiting as well.

In my opinion, the most interesting part is the AI echo remover (noise reduction is already in the OpenVINO plugins).

RyanMetcalfeInt8 commented 8 months ago

Hi @nachazo,

Thanks for the feedback! I'll keep an eye out for more open source noise suppression / voice enhancement models that we could potentially add support for. I have been looking around a bit for noise suppression models that work better than the default one included (dense-unet), as I'm not too thrilled with the quality that it produces... nothing has jumped out yet but I need to keep looking. Ideally I'd like to support a set of noise suppression models that work well for various situations / environments.

Regards, Ryan

RyanMetcalfeInt8 commented 8 months ago

Just a quick update here -- I was looking into this open source project ( https://github.com/resemble-ai/resemble-enhance ). A nice writeup is here:

It provides a couple of models: one for denoising and one for enhancement. This could be something worth porting over / pulling into the set of plugins if the quality is good enough.

I was trying out the web-based demo on some noisy audio that I had lying around. I thought that the denoise output was similar to running 'noise suppression + normalize', so I wasn't too impressed with that one. I found the 'enhanced' audio sounded a bit overprocessed. All in all, pretty far away from Adobe's speech enhance output (which is what I'm looking to find).

Anyway, my conclusion from this initial evaluation of resemble-enhance is that I'll wait and see whether the project improves (it's only a few weeks old, after all).

It's possible I was expecting too much, or used samples that it wasn't trained to enhance. Let me know if anyone has good luck with the demo and thinks it would indeed be useful as a new plugin feature. Happy to take a second glance.

Also, feel free to point out open source projects that I might have missed.

Ryan

nachazo commented 7 months ago

Hi! I don't know the license or whether the code is only for NVIDIA, but https://github.com/NVIDIA/MAXINE-AFX-SDK seems to provide some of the effects from the "NVIDIA Broadcast" app (https://www.nvidia.com/en-us/geforce/broadcasting/broadcast-app/), such as Room Echo Cancellation and Background Noise Suppression. Video demo: https://www.youtube.com/watch?v=_kHFTeL1RVU See you!!

RyanMetcalfeInt8 commented 7 months ago

Hi @nachazo,

Right, it looks like the model itself is proprietary, as far as I can tell -- so unfortunately I can't do much with that.

The part of the code that is open source (MIT) seems to be the upper layer of (control) software that sort of moves audio samples in / out of the broadcast SDK.

Thanks, Ryan