How to further improve the effect of TSE?

mcw519 / PureSound

Make the sound you hear pure and clean by deep learning.


How to further improve the effect of TSE? #3

Closed: KollyYang closed this issue 6 months ago

KollyYang commented 1 year ago

Hi, thank you very much for open-sourcing this project. It is very helpful to me: I only started working with TSE recently, and there are currently relatively few TSE projects or demos on GitHub.

I first read the source code carefully, then ran tse/demo_app with skim_causal_460_wNoise_IS_tsdr.ckpt as a test. The results were not very satisfactory.

I also used the same model for offline testing. Comparing on the synthetic data in https://github.com/eeskimez/pse-samples, I found that its output was worse than that of pdcattunet.
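To put a number on "worse than", one common option is scale-invariant SDR (SI-SDR) measured against a clean reference. The sketch below is only an illustration and is not part of PureSound: the function name `si_sdr` and the `eps` parameter are made up here, and it assumes the reference and the model output are time-aligned, equal-length mono signals already loaded as sequences of floats.

```python
import math
from typing import Sequence


def si_sdr(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (higher is better).

    Hypothetical helper for illustration; assumes both signals are
    time-aligned and have the same number of samples.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    n = len(reference)

    # Remove the mean so a DC offset does not influence the score.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(r * e for r, e in zip(ref, est))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Everything in the estimate that is not the scaled target counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Running this on each model's output for the same mixture gives one score per model, which makes the comparison less dependent on listening impressions. Because the metric is scale-invariant, a quieter but otherwise identical output is not penalized.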

Given your current TSE results, is there any way to improve the quality further? Do you have any suggestions? Thanks!

mcw519 commented 1 year ago

Hi, can you share the test data and the corresponding enrollment speech with me? Let me check the results first. Thanks.

KollyYang commented 1 year ago

Hi, are these synthesized audio files okay? Thanks!

mcw519 commented 1 year ago

Hi Kolly,

The synthesized data is okay, but it looks like there is no enrollment speech that can be used to enroll the speaker for the TSE task. I think the folder contains:

xxx_no_processing.wav -> input audio
xxx_ref.wav -> ground truth

and the other files are the outputs of different models.
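As a rough sketch of that layout, the files can be grouped per utterance by suffix. The two suffixes come from the names above. Everything else is an assumption made for illustration: the function name `group_samples`, and the idea that each model output shares the utterance prefix followed by an underscore (for example a hypothetical xxx_modelA.wav).

```python
from pathlib import Path

# Suffixes taken from the file names described in this thread.
INPUT_SUFFIX = "_no_processing"
REFERENCE_SUFFIX = "_ref"


def group_samples(filenames):
    """Group file names by utterance prefix into input, reference and model outputs.

    Assumption (not confirmed by the dataset): model outputs are named
    "<prefix>_<something>.wav" with the same prefix as the input file.
    """
    groups = {}
    others = []
    for name in sorted(filenames):
        stem = Path(name).stem
        if stem.endswith(INPUT_SUFFIX):
            key, role = stem[: -len(INPUT_SUFFIX)], "input"
        elif stem.endswith(REFERENCE_SUFFIX):
            key, role = stem[: -len(REFERENCE_SUFFIX)], "reference"
        else:
            others.append(name)
            continue
        entry = groups.setdefault(key, {"input": None, "reference": None, "outputs": []})
        entry[role] = name

    # Second pass: attach the remaining files to the longest matching prefix.
    # Files that match no known prefix are ignored.
    for name in others:
        stem = Path(name).stem
        matches = [key for key in groups if stem.startswith(key + "_")]
        if matches:
            groups[max(matches, key=len)]["outputs"].append(name)
    return groups
```

Note that none of the resulting groups contains an enrollment utterance, which is the missing piece for running TSE on this data.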

Thanks.

KollyYang commented 1 year ago

Hi Wu, can xxx_ref.wav be used as enrollment speech? Also, here is some real data.

linh_ref_long.wav and binh_ref_long.wav are the enrollment wavs.
binh_linh_newspaper_music_noise.wav and binh_noise.wav are the mixture wavs.

In my tests the results were worse than voicefilter's, although voicefilter is of course a larger model and needs more computation. Thanks!

mcw519 commented 1 year ago

Thanks for providing the real cases. I ran inference through the evaluation mode in main.py; the results are in proc.zip. I think the pre-trained model is not robust to music noise because its noise corpus is WHAM.

KollyYang commented 1 year ago

Hi Wu, thanks. Here are the voicefilter results, voice-filter_proc.zip. They sound clearer and more natural.