microsoft / UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

Why are my reproduced WavLM results on Vox1-O 30% worse? #28

Closed · AIDman closed this issue 1 year ago

AIDman commented 2 years ago


| model | EER (mine) | EER (official) |
| -- | -- | -- |
| wavlm_large_nofinetune.pth | 0.965 | 0.75 |
| wavlm_large_finetune.pth | 0.631 | 0.431 |

The above results come from evaluating your shared WavLM models on the original Vox1-O trials without changing any code. What might be the reason for this gap? Wrong settings? Here is more background on my setup:

1) Create a conda env:

```sh
conda create -n UniSpeech_py3p8 python=3.8
```

2) Follow your guidance under https://github.com/microsoft/UniSpeech/tree/main/downstreams/speaker_verification:

```sh
pip install --require-hashes -r requirements.txt
```

The following error appears, because pip's `--require-hashes` mode requires every dependency, including transitive ones such as numpy pulled in by scipy, to be pinned with `==` and hashed:

```
Collecting numpy<1.23.0,>=1.16.5
ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==. These do not:
    numpy<1.23.0,>=1.16.5 from https://files.pythonhosted.org/packages/2f/14/abc14a3f3663739e5d3c8fd980201d10788d75fea5b0685734227052c4f0/numpy-1.22.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=64f56fc53a2d18b1924abd15745e30d82a5782b2cab3429aceecc6875bd5add0 (from scipy==1.7.1->-r requirements.txt (line 1))
```

I then installed the environment manually (around 30-40 packages), just as in https://github.com/microsoft/UniSpeech/issues/26.

3) Here are some related details:

```
fairseq: 0.12.1 (/home/user1/tools/fairseq)
s3prl: 0.3.1
torch.__version__: 1.9.0+cu102
python -V: 3.8.13
```

Thanks for your wonderful work, and looking forward to your help.

YuzaChongyi commented 2 years ago

In my experiment, the wavlm_large_finetune EER is 0.574.

Sanyuan-Chen commented 2 years ago

Hi @AIDman ,

As for the environment error, could you replace this line

https://github.com/microsoft/UniSpeech/blob/e3043e2021d49429a406be09b9b8432febcdec73/downstreams/speaker_verification/models/ecapa_tdnn.py#L196

with

```python
self.feature_extract = torch.hub.load('s3prl/s3prl:e52439edaeb1a443e82960e6401ae6ab4241def6', feat_type)
```

and try again? The fairseq library is not necessary for running inference with the WavLM model. Older versions of s3prl automatically skip the ImportError raised by a missing fairseq, but the latest s3prl code accidentally lets that ImportError propagate.
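To illustrate the difference, here is a minimal sketch (not the actual s3prl source) of the optional-import pattern the older releases effectively followed:

```python
# Sketch of an optional-dependency guard (illustrative, not s3prl's code).
# WavLM inference goes through torch.hub / s3prl and never touches fairseq,
# so a missing fairseq should be tolerated rather than fatal.
try:
    import fairseq  # only required for certain upstream models
    HAS_FAIRSEQ = True
except ImportError:
    HAS_FAIRSEQ = False  # safe to continue for WavLM inference
```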

As for the fine-tuning results for speaker verification, we use adaptive s-norm to normalize the trial scores and further apply quality-aware score calibration, as introduced in Section V.C-3 of our WavLM paper.
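For readers unfamiliar with adaptive s-norm, here is a generic numpy sketch of the standard AS-norm formula (an illustration, not the code behind the paper's numbers; `top_k` and the cohort construction are free choices):

```python
import numpy as np

def adaptive_snorm(score, enroll_cohort, test_cohort, top_k=300):
    """Normalize one trial score with adaptive s-norm.

    enroll_cohort / test_cohort: scores of the enrollment / test
    embedding against a cohort of impostor embeddings. Only the top_k
    most similar cohort scores contribute to the statistics.
    """
    e_top = np.sort(enroll_cohort)[-top_k:]
    t_top = np.sort(test_cohort)[-top_k:]
    return 0.5 * ((score - e_top.mean()) / e_top.std()
                  + (score - t_top.mean()) / t_top.std())
```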

WhXmURandom commented 5 months ago

> As for the fine-tuning results for speaker verification, we use adaptive s-norm to normalize the trial scores and further apply quality-aware score calibration, as introduced in Section V.C-3 of our WavLM paper.

Can you provide the code for the quality-aware score calibration? Thank you!
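No official calibration code is linked in this thread. As a starting point, here is a hedged sketch of one common form of quality-aware calibration, a logistic regression over the raw score plus quality measures such as utterance duration; the feature choices are assumptions about the general technique, not the WavLM authors' exact recipe:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_quality_aware_calibration(scores, durations, labels):
    """Fit an affine calibration from (raw score, quality measure) to a
    log-odds score, on a held-out labeled trial set.

    scores: raw trial scores; durations: per-trial quality measure
    (e.g. min duration of the two utterances); labels: 1 = target trial.
    All choices here are illustrative assumptions.
    """
    feats = np.stack([scores, np.log(durations)], axis=1)
    return LogisticRegression().fit(feats, labels)

def apply_calibration(model, scores, durations):
    feats = np.stack([scores, np.log(durations)], axis=1)
    # decision_function returns the calibrated linear score (log-odds).
    return model.decision_function(feats)
```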