Query Regarding the Impact of Varied Acoustic Environments on Model Performance

yihong1120 commented 5 months ago

Dear Resemble Enhance Team,

I hope this message finds you well. I am reaching out to inquire about the robustness of the Resemble Enhance AI models, particularly in relation to their performance across diverse acoustic environments.

Having perused your documentation and successfully utilised your tool for speech enhancement and denoising, I've observed impressive results in standard settings. However, I am curious about the model's adaptability when confronted with audio data recorded in atypical acoustic spaces, which may not be well-represented in the training datasets.

Specifically, my questions are as follows:

1. How does the model cope with audio inputs recorded in highly reverberant spaces, or those with unique echo characteristics that might diverge significantly from the RIR datasets used during training?
2. Is there a recommended approach to fine-tuning the model on a custom dataset that includes such unique acoustic characteristics, to better tailor the enhancement capabilities to specific environments?
3. Could you provide insights into the model's limitations when dealing with extreme noise conditions or non-linear distortions that are not commonly found in everyday scenarios?

Understanding these aspects is crucial for my ongoing project, which involves processing archival audio recordings that exhibit a wide range of acoustic anomalies.

I appreciate the cutting-edge work your team has accomplished with Resemble Enhance and look forward to any guidance you can provide on the aforementioned queries.

Best regards, yihong1120

4lvrz commented 4 months ago

Not a team member, but I'm concerned about the same issues.

After some testing, it seems the denoise system does not de-reverb input audio as completely as other systems do. That should be achievable by fine-tuning and modifying the denoise training stage. I noticed that the input verification samples in this stage contain reverb generated by RIRs (at least in my environment), so the reverb might be preserved during training.
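For readers unfamiliar with how RIR-based reverb ends up in training or verification samples, here is a minimal sketch: a dry signal is convolved with a room impulse response. This is not taken from the Resemble Enhance code base; `convolve` and `synthetic_rir` are hypothetical helper names, and the toy impulse response only stands in for a measured RIR.

```python
def convolve(signal, rir):
    """Direct-form linear convolution; output has len(signal) + len(rir) - 1 samples."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def synthetic_rir(length, decay):
    """Toy impulse response (assumption): direct path plus an exponentially decaying tail."""
    return [decay ** n for n in range(length)]


dry = [1.0, 0.0, 0.0, 0.0]      # a single click
rir = synthetic_rir(3, 0.5)     # [1.0, 0.5, 0.25]
wet = convolve(dry, rir)        # the click now has a decaying tail
print(wet)                      # [1.0, 0.5, 0.25, 0.0, 0.0, 0.0]
```

If the training target is the `wet` signal rather than the `dry` one, a denoiser has no reason to remove the tail, which would be consistent with the behaviour described above.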

On the other hand, the enhance system seems to de-reverb nicely, but it sometimes struggles with a poor S/N ratio, so heavy distortion, reverb, or noise results in made-up words during inference (most noticeable in foreign languages). Anyway, I have tried the pre-trained model on slightly distorted telephone audio (sampling frequency 8 kHz) and it recovers nice-sounding voices.

It would be nice to hear some news about specific fine-tuning for the enhance stage :)

GUUser91 commented 3 months ago

A workaround that lessens the reverb for me with denoise is to overlay the audio with loud background music in Kdenlive and then denoise the Kdenlive output file.
Original unaltered audio: https://vocaroo.com/1kvyT6Wh8A2v
Denoised audio that had loud background music: https://vocaroo.com/1i7q7woh25jt
The background music I used, in case anyone is wondering: https://soundcloud.com/udi-harpaz-composer/spidy-meets-his-girl?in=udi-harpaz-composer%2Fsets%2Fspiderman-by-udi-harpaz
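The overlay step of this workaround can also be scripted instead of done in a video editor. The following is only a rough sketch under stated assumptions: samples are plain Python floats in the range -1.0 to 1.0, both tracks share one sample rate, and the names `overlay`, `rms`, and `speech_to_music_db` are hypothetical. Denoising the mixed file is still done with the tool afterwards.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def overlay(speech, music, music_gain_db=0.0):
    """Mix music onto speech; the music loops if it is shorter than the speech."""
    if not speech or not music:
        raise ValueError("both tracks must contain samples")
    gain = 10 ** (music_gain_db / 20)           # dB to linear amplitude
    mixed = [s + gain * music[i % len(music)] for i, s in enumerate(speech)]
    peak = max(abs(x) for x in mixed)
    if peak > 1.0:                              # rescale only if the mix would clip
        mixed = [x / peak for x in mixed]
    return mixed


def speech_to_music_db(speech, music, music_gain_db=0.0):
    """Level of the speech relative to the (gained) music, in dB."""
    gain = 10 ** (music_gain_db / 20)
    return 20 * math.log10(rms(speech) / (gain * rms(music)))


speech = [0.5, -0.5, 0.5, -0.5]
music = [0.25, -0.25]
print(overlay(speech, music))                   # [0.75, -0.75, 0.75, -0.75]
print(round(speech_to_music_db(speech, music), 2))
```

Raising `music_gain_db` makes the music louder relative to the speech; how loud it needs to be for the denoiser to also suppress the reverb is something to find by listening.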

Edit: A second way, if you have the cash, is to use iZotope RX 10 and its dialogue de-reverb tool. Here are samples.
An unaltered noisy reverb audio: https://vocaroo.com/1bbTHGMH6y62
Same audio after denoising via Resemble Enhance: https://vocaroo.com/188lnq8oSTpy
Same denoised audio after using dialogue de-reverb via iZotope RX 10: https://vocaroo.com/15OY5Zupxtj7

Other options are dxRevive Pro, AI-coustics, Acon Digital DeVerberate 3, and Waves Clarity Vx DeReverb Pro.

Second Edit: I tried the Acon Digital DeVerberate 3 plug-in via Audacity, and it works amazingly well in conjunction with Resemble Enhance. I prefer the Gradio app over the command-line version because, for some reason, the output files from the command-line version produce horrible results in GPT-SoVITS. Here's an example.
An unaltered noisy reverb audio: https://vocaroo.com/1kR6uZnS0FF6
Same audio after denoising via Resemble Enhance: https://vocaroo.com/11ltmuXBvFrK
Same denoised audio after using the Acon Digital DeVerberate 3 plug-in via Audacity: https://vocaroo.com/1fltVaIObMC4
Same file after using iZotope RX 10 Voice Denoise: https://vocaroo.com/13cFs2Uq0S2V

resemble-ai / resemble-enhance, issue #14