microsoft / UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

Why are my reproduced WavLM results on Vox1-O 30% worse? #28

Closed AIDman closed 1 year ago

AIDman commented 2 years ago


model | EER (mine) | EER (official)
-- | -- | --
wavlm_large_nofinetune.pth | 0.965 | 0.75
wavlm_large_finetune.pth | 0.631 | 0.431

The results above are my validation results for your shared WavLM models on the original Vox1-O data, with no code changes. What might explain this gap? Could my settings be wrong? Here is more background on my setup:

1) Create a conda env as:

conda create -n UniSpeech_py3p8 python=3.8

2) Following your guidance under

pip install --require-hashes -r requirements.txt 

The following error will appear:

Collecting numpy<1.23.0,>=1.16.5
ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==. These do not:
    numpy<1.23.0,>=1.16.5 from (from scipy==1.7.1->-r requirements.txt (line 1))

Then I installed the environment manually (installed around 30~40 tools) just as

3) Here are some related details:

pip list | grep fairseq
fairseq 0.12.1 /home/user1/tools/fairseq
pip list | grep s3prl
s3prl 0.3.1
torch.version: 1.9.0+cu102
python -V: 3.8.13
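For anyone comparing environments, the same version details can be collected in one step. This is only a minimal sketch: the helper name `collect_versions` is hypothetical, and the package list simply mirrors the three libraries mentioned above.

```python
from importlib import metadata
import platform


def collect_versions(packages):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    print("python", platform.python_version())
    # Package names taken from the details above; extend as needed.
    for name, version in collect_versions(["torch", "fairseq", "s3prl"]).items():
        print(name, version or "not installed")
```

Posting this output alongside the EER numbers makes it easier to spot a version mismatch between two setups.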

Thanks for your wonderful work. I am looking forward to your help.

YuzaChongyi commented 2 years ago

In my experiment, the wavlm_large_finetune EER is 0.574.

Sanyuan-Chen commented 2 years ago

Hi @AIDman ,

As for the environment error, could you replace this line with self.feature_extract = torch.hub.load('s3prl/s3prl:e52439edaeb1a443e82960e6401ae6ab4241def6', feat_type) and try again? The fairseq library is not needed for inference with the WavLM model. Older versions of s3prl automatically skip the ImportError from fairseq, but the latest s3prl code raises it.
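The pinning idea in the suggestion above can be sketched as follows. The helper names `pinned_hub_spec` and `load_feature_extractor` are hypothetical; only the repository name, the commit hash, and the `torch.hub.load(..., feat_type)` call come from the comment itself, and the value of `feat_type` is whatever the verification script already passes.

```python
# Commit of s3prl quoted in the comment above.
S3PRL_COMMIT = "e52439edaeb1a443e82960e6401ae6ab4241def6"


def pinned_hub_spec(repo, ref):
    """Build the 'owner/name:ref' string that torch.hub.load accepts."""
    return f"{repo}:{ref}"


def load_feature_extractor(feat_type):
    """Load the upstream model from the pinned s3prl revision."""
    # Imported lazily so this module can be imported without torch installed.
    import torch

    return torch.hub.load(pinned_hub_spec("s3prl/s3prl", S3PRL_COMMIT), feat_type)
```

Pinning the revision keeps the loader independent of whatever the default branch of s3prl looks like on the day the script runs.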

As for the fine-tuning results for speaker verification, we use adaptive s-norm to normalize the trial scores and then apply the quality-aware score calibration introduced in Section V.C-3 of our WavLM paper.
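For readers unfamiliar with the first step, here is a minimal sketch of adaptive s-norm in its commonly described form. It is not the code used for the published numbers: the function names, the `top_n` value, and the choice of cohort are assumptions, and the quality-aware calibration step is not covered.

```python
from statistics import mean, pstdev


def top_n_stats(cohort_scores, top_n):
    """Mean and standard deviation of the top_n highest cohort scores."""
    top = sorted(cohort_scores, reverse=True)[:top_n]
    return mean(top), pstdev(top)


def adaptive_snorm(raw_score, enroll_cohort_scores, test_cohort_scores,
                   top_n, eps=1e-8):
    """Symmetric score normalization with an adaptive (top-N) cohort.

    enroll_cohort_scores: scores of the enrollment utterance against the cohort.
    test_cohort_scores:   scores of the test utterance against the cohort.
    top_n:                cohort size kept per side (an assumed hyperparameter).
    """
    mu_e, sd_e = top_n_stats(enroll_cohort_scores, top_n)
    mu_t, sd_t = top_n_stats(test_cohort_scores, top_n)
    # Average the two z-normalized views of the same trial score.
    return 0.5 * ((raw_score - mu_e) / (sd_e + eps)
                  + (raw_score - mu_t) / (sd_t + eps))
```

Scores produced without this normalization (and without the later calibration) are not directly comparable to scores produced with it, which is one possible source of a gap in EER.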

WhXmURandom commented 5 months ago

Can you provide the code for the quality-aware score calibration? Thank you!