Closed by KollyYang 6 months ago
Hi, can you share the test data and the corresponding enrollment speech with me? Let me check the results first. Thanks.
Hi Kolly,
The synthesized data is okay, but it looks like there is no enrollment speech that can be used to enroll the speaker for the TSE task. I think the folder includes: xxx_no_processing.wav -> input audio, xxx_ref.wav -> ground truth, and the others are the outputs of different models.
Thanks.
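For anyone scripting over that folder, here is a minimal sketch of grouping the files by the naming convention described above. The function name `group_samples` and the example file names are hypothetical, and it assumes every utterance has an `xxx_no_processing.wav` file and that no utterance id is a prefix of another.

```python
def group_samples(filenames):
    """Group wav file names per utterance using the suffix convention.

    xxx_no_processing.wav -> input audio
    xxx_ref.wav           -> ground truth
    any other xxx_*.wav   -> output of some model
    """
    input_suffix = "_no_processing.wav"
    ref_suffix = "_ref.wav"
    names = sorted(n for n in filenames if n.endswith(".wav"))

    # Utterance ids are recovered from the input files only.
    utt_ids = [n[: -len(input_suffix)] for n in names if n.endswith(input_suffix)]

    groups = {}
    for utt in utt_ids:
        group = {"input": utt + input_suffix, "ground_truth": None, "outputs": []}
        for n in names:
            if not n.startswith(utt + "_") or n == group["input"]:
                continue
            if n == utt + ref_suffix:
                group["ground_truth"] = n
            else:
                # Assumption: everything else with this prefix is a model output.
                group["outputs"].append(n)
        groups[utt] = group
    return groups


if __name__ == "__main__":
    example = ["s1_no_processing.wav", "s1_ref.wav", "s1_modelA.wav"]
    print(group_samples(example))
```

Note that this only sorts out input, ground truth, and model outputs; it does not find enrollment speech, which is the missing piece for TSE.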
Hi Wu, can xxx_ref.wav be used as enrollment speech? Also, here is some real data.
linh_ref_long.wav and binh_ref_long.wav are the enrollment wavs.
binh_linh_newspaper_music_noise.wav and binh_noise.wav are the mixture wavs.
In my tests the results were worse than voicefilter's. Of course, voicefilter is a larger model and needs more computation. Thanks!
Thanks for providing the real cases. I ran inference through the evaluation mode in main.py and got the results here: proc.zip. I think the pre-trained model is not robust to music noise because the noise corpus is WHAM.
Hi Wu, thanks. Here are the voicefilter results: voice-filter_proc.zip. They sound clearer and more natural.
Hi, thank you very much for open-sourcing this project. It is very helpful to me: I only recently started working with TSE, and there are currently relatively few TSE projects or demos on GitHub.
I first read the source code carefully and then ran tse/demo_app with skim_causal_460_wNoise_IS_tsdr.ckpt as a test; the results were not very satisfactory.
I also used the same model for offline testing. Comparing on the synthetic data in https://github.com/eeskimez/pse-samples, I found that the results were worse than those of pdcattunet.
Based on your current TSE results, is there any way to improve the performance? Do you have any suggestions? Thanks!
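To put a number on comparisons like the ones in this thread, one option is to score each model's output against the ground-truth signal with scale-invariant SDR (SI-SDR). The thread does not say which metric, if any, was used, so this is only a sketch under that assumption. It takes two equal-length lists of samples; loading and time-aligning the wav files is left out, and the function name `si_sdr` is illustrative.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean from both signals.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything not explained by the scaled reference counts as distortion.
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(est, target))
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    ref = [1.0, -1.0, 1.0, -1.0]
    est = [1.1, -0.9, 0.9, -1.1]
    print(round(si_sdr(est, ref), 3))
```

A higher value means the output is closer to the ground truth up to a gain factor, so the same mixture processed by two models can be compared directly.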