Open yihong1120 opened 5 months ago
Not a team member, but I'm concerned about the same issues here.
After some testing, it seems the denoise system does not de-reverb input audio as completely as other systems do. It should be possible to improve this by fine-tuning and modifying the denoise training stage. I noticed that the input validation samples in this stage contain reverb generated by RIRs (at least in my environment), so the reverb might be preserved during training.
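For anyone unfamiliar with RIR augmentation: reverb is typically simulated by convolving a dry recording with a room impulse response, so if those reverberant samples end up as training targets rather than only as inputs, the reverb survives training. A rough pure-Python sketch of the mechanics (synthetic tone and synthetic RIR purely for illustration, not the project's actual training code):

```python
import math
import random

random.seed(0)

# Dry "speech": a short synthetic tone (stand-in for a real clip),
# samples in [-1, 1] at a toy sample rate.
sr = 4000
dry = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]

# Synthetic RIR: exponentially decaying noise, a crude model of a room's echo tail.
rir_len = sr // 10  # 100 ms tail
rir = [random.gauss(0, 1) * math.exp(-6 * n / rir_len) for n in range(rir_len)]
scale = sum(abs(h) for h in rir)
rir = [h / scale for h in rir]  # normalize so the wet signal stays bounded

# "Wet" (reverberant) signal: convolution of the dry signal with the RIR.
wet = [0.0] * (len(dry) + len(rir) - 1)
for i, x in enumerate(dry):
    for j, h in enumerate(rir):
        wet[i + j] += x * h
```

If an augmentation pipeline produces `wet` but the loss is computed against `wet` instead of `dry`, the model never learns to remove the tail.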
On the other hand, the enhance system seems to de-reverb nicely, but it sometimes struggles with a bad S/N ratio, so heavy distortion, reverb, or noise can turn into made-up words during inference (most noticeable in foreign languages). Anyway, I have tried the pre-trained model on slightly distorted telephone audio (8 kHz sample rate) and it recovers nice-sounding voices.
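Since "bad S/N ratio" keeps coming up, here is a minimal sketch of one way to quantify it when an aligned clean reference is available. The `snr_db` helper is hypothetical, not part of resemble-enhance; it just estimates the noise as the residual between the noisy and clean signals:

```python
import math

def snr_db(clean, noisy):
    """Signal-to-noise ratio in dB for aligned sample lists.
    The noise is estimated as the residual noisy - clean."""
    sig_power = sum(c * c for c in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    if noise_power == 0:
        return float("inf")
    return 10 * math.log10(sig_power / noise_power)

# Toy example: a 220 Hz "voice" tone plus a low-level 1 kHz interferer.
sr = 8000
clean = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
noisy = [c + 0.01 * math.sin(2 * math.pi * 1000 * n / sr)
         for n, c in enumerate(clean)]
print(round(snr_db(clean, noisy), 1))  # 40.0 dB for this toy example
```

A number like this makes it easier to report exactly how degraded an input was when the enhance stage starts hallucinating words.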
It would be nice to hear some news about specific fine-tuning for the enhance stage :)
A workaround for me to lessen the reverb effect with denoise is to overlay the audio with loud background music via kdenlive, then denoise the kdenlive audio output file.

- Original unaltered audio: https://vocaroo.com/1kvyT6Wh8A2v
- Denoised audio that had loud background music: https://vocaroo.com/1i7q7woh25jt
- The background music I used, in case anyone is wondering: https://soundcloud.com/udi-harpaz-composer/spidy-meets-his-girl?in=udi-harpaz-composer%2Fsets%2Fspiderman-by-udi-harpaz
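For anyone who wants to reproduce the overlay trick without kdenlive, any tool (or a few lines of code) that mixes the music into the voice track works. A minimal sketch, assuming both tracks are float samples in [-1, 1] at the same sample rate; `mix_with_music` is a hypothetical helper, not an existing API:

```python
def mix_with_music(voice, music, music_gain=0.8):
    """Overlay a music track onto a voice track.
    The music is looped (or truncated) to the voice length,
    and the sum is peak-normalized to avoid clipping."""
    mixed = []
    for i, v in enumerate(voice):
        m = music[i % len(music)]  # loop the music if it is shorter
        mixed.append(v + music_gain * m)
    peak = max(abs(s) for s in mixed)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]  # scale back into [-1, 1]
    return mixed

# Tiny illustrative call with made-up samples.
out = mix_with_music([0.5, -0.5, 0.25, -0.25], [0.9, -0.9])
print(max(abs(s) for s in out) <= 1.0)  # True
```

The mixed file then goes into the denoiser; the idea is that the loud, broadband music masks the reverb tail so the denoiser strips both together.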
Edit: A second option, if you have the cash, is to use iZotope RX 10's dialogue de-reverb tool. Here are samples:

- Unaltered noisy, reverberant audio: https://vocaroo.com/1bbTHGMH6y62
- Same audio after denoising via resemble-enhance: https://vocaroo.com/188lnq8oSTpy
- Same denoised audio after dialogue de-reverb via iZotope RX 10: https://vocaroo.com/15OY5Zupxtj7
Other options are dxRevive Pro, AI-coustics, Acon Digital DeVerberate 3, and Waves Clarity Vx DeReverb Pro.
Second Edit: I tried the Acon Digital DeVerberate 3 plug-in via Audacity, and it works amazingly well in conjunction with resemble-enhance, preferably via the Gradio app rather than the command-line version, because for some reason the output files from the command-line version produce horrible results in GPT-SoVITS. Here's an example:

- Unaltered noisy, reverberant audio: https://vocaroo.com/1kR6uZnS0FF6
- Same audio after denoising via resemble-enhance: https://vocaroo.com/11ltmuXBvFrK
- Same denoised audio after the Acon Digital DeVerberate 3 plug-in via Audacity: https://vocaroo.com/1fltVaIObMC4
- Same file after iZotope RX 10 Voice De-noise: https://vocaroo.com/13cFs2Uq0S2V
Dear Resemble Enhance Team,
I hope this message finds you well. I am reaching out to inquire about the robustness of the Resemble Enhance AI models, particularly in relation to their performance across diverse acoustic environments.
Having perused your documentation and successfully utilised your tool for speech enhancement and denoising, I've observed impressive results in standard settings. However, I am curious about the model's adaptability when confronted with audio data recorded in atypical acoustic spaces, which may not be well-represented in the training datasets.
Specifically, my questions are as follows:
Understanding these aspects is crucial for my ongoing project, which involves processing archival audio recordings that exhibit a wide range of acoustic anomalies.
I appreciate the cutting-edge work your team has accomplished with Resemble Enhance and look forward to any guidance you can provide on the aforementioned queries.
Best regards, yihong1120