maum-ai / voicefilter

Unofficial PyTorch implementation of Google AI's VoiceFilter system
http://swpark.me/voicefilter

Cannot reproduce reported SDR & retrain the speaker embedding #30

Open nnbtam99 opened 2 years ago

nnbtam99 commented 2 years ago

Hello, I have two questions about the implementation.

  1. I cannot reproduce the results reported in the README. I trained for over 400k steps on the LibriSpeech 360h + 100h clean datasets, using the embedder provided in this repo. However, I can only obtain a maximum SDR of about 5.5.

To prepare the LibriSpeech 360h + 100h data, I generate the mixed audio for the 360h and 100h subsets separately, then combine the two outputs into a single folder (a rough sketch of the merging step is after the list). Is this the right way to use more data to train the VoiceFilter module?

  2. I got worse results when retraining the speaker embedding. I retrained the embedder using the following repo: Speaker verification on 3 datasets: Librispeech, VoxCeleb1, VoxCeleb2.

Theoretically, I expected the VoiceFilter module to benefit from an embedder trained on more data, but the results got even worse. Can you share how you trained this embedder?
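
For reference, here is roughly how I merged the two generated sets (the folder names below are placeholders, not my exact paths):

```python
import shutil
from pathlib import Path

# Placeholder paths: one generator output folder per LibriSpeech subset.
SOURCES = [Path("generated/train-clean-360"), Path("generated/train-clean-100")]
MERGED = Path("generated/train-combined")
MERGED.mkdir(parents=True, exist_ok=True)

for src in SOURCES:
    for f in src.iterdir():
        if not f.is_file():
            continue
        # Prefix each file with its subset name so the two runs cannot collide
        # while keeping the mixed/target/dvec files of a sample paired together.
        shutil.copy2(f, MERGED / f"{src.name}-{f.name}")
```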

Thank you in advance!