wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Apache License 2.0

SNR augmentations setup #263

Closed vanIvan closed 7 months ago

vanIvan commented 7 months ago

Hello!

Based on the following line in examples/voxceleb/v2/local/prepare_data.sh:

find ${rawdata_dir}/musan -name "*.wav" | awk -F"/" '{print $(NF-2)"/"$(NF-1)"/"$NF,$0}' >${data}/musan/wav.scp

It seems that the keys generated in ${data}/musan/wav.scp are wrong: they lack the first subdirectory, which defines the type of noise, one of ['noise','speech','music']:

jamendo/music-jamendo-0020/00135.wav /media/DATA/augmentations/musan_split/music/jamendo/music-jamendo-0020/00135.wav
jamendo/music-jamendo-0020/00114.wav /media/DATA/augmentations/musan_split/music/jamendo/music-jamendo-0020/00114.wav
...

And should be like this:

music/jamendo/music-jamendo-0020/00135.wav /media/DATA/augmentations/musan_split/music/jamendo/music-jamendo-0020/00135.wav
music/jamendo/music-jamendo-0020/00114.wav /media/DATA/augmentations/musan_split/music/jamendo/music-jamendo-0020/00114.wav
...
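To make the difference concrete, here is a minimal Python sketch of what the awk expression does: it joins the last three path components into the key. The helper name `musan_key` and the `/data/musan/...` path are hypothetical and used only for illustration; the `musan_split` path is the one quoted above.

```python
from pathlib import PurePosixPath


def musan_key(path: str, depth: int = 3) -> str:
    """Join the last `depth` path components, mimicking
    awk -F"/" '{print $(NF-2)"/"$(NF-1)"/"$NF}' when depth is 3."""
    parts = PurePosixPath(path).parts
    return "/".join(parts[-depth:])


# Hypothetical path with files directly under <type>/<source>/.
flat = "/data/musan/music/jamendo/music-jamendo-0020.wav"
# Path from this issue: one extra directory level per recording.
split = ("/media/DATA/augmentations/musan_split/"
         "music/jamendo/music-jamendo-0020/00135.wav")

print(musan_key(flat))            # key starts with the noise type
print(musan_key(split))           # noise type is cut off
print(musan_key(split, depth=4))  # one more level restores it
```

With the extra directory level, three components are no longer enough to reach the noise-type directory, which is why the keys start with `jamendo/` instead of `music/`.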

Because of the missing subdirectory, when the add_reverb_noise function picks the SNR range, the augmentation SNR falls back to the default range [0, 15] for all noise files.
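The fallback described here can be sketched as a prefix lookup on the key. This is not the actual add_reverb_noise code: the function name `pick_snr_range` is hypothetical, and the per-type ranges are illustrative placeholders. Only the [0, 15] default comes from this thread.

```python
# Default range mentioned in this issue.
DEFAULT_SNR_RANGE = (0, 15)

# Illustrative placeholder values, NOT confirmed by this thread.
SNR_RANGES = {
    "noise": (0, 15),
    "speech": (10, 30),
    "music": (5, 15),
}


def pick_snr_range(key: str) -> tuple:
    """Return the SNR range for a wav.scp key based on its leading
    noise-type directory, or the default when no type matches."""
    for prefix, snr_range in SNR_RANGES.items():
        if key.startswith(prefix):
            return snr_range
    return DEFAULT_SNR_RANGE


# A key that kept its noise type gets the type-specific range.
print(pick_snr_range("music/jamendo/music-jamendo-0020/00135.wav"))
# A key that lost its noise type falls through to the default.
print(pick_snr_range("jamendo/music-jamendo-0020/00135.wav"))
```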

Is this the expected behaviour? I have also noticed that my models converge to better results when all types of noise are applied with the [0, 15] SNR range.

JiJiJiang commented 7 months ago


I have checked my musan/wav.scp, and all of its keys start with one of ['noise','speech','music']. Please check whether the directory structure of the MUSAN dataset (musan.tar.gz) has been changed. You can also check the download link in examples/voxceleb/v2/local/download_data.sh.
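A quick way to run the same check on your own wav.scp is to count how many keys start with each expected noise type. This is a rough sketch; the helper name `count_key_prefixes` is hypothetical, and the sample lines are the ones quoted earlier in this issue.

```python
from collections import Counter

MUSAN_TYPES = ("noise", "speech", "music")


def count_key_prefixes(lines):
    """Count wav.scp keys by their first path component;
    anything outside MUSAN_TYPES is counted as 'other'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key = line.split(maxsplit=1)[0]   # first column is the key
        prefix = key.split("/", 1)[0]     # leading directory of the key
        counts[prefix if prefix in MUSAN_TYPES else "other"] += 1
    return counts


sample = [
    "jamendo/music-jamendo-0020/00135.wav /media/DATA/augmentations/"
    "musan_split/music/jamendo/music-jamendo-0020/00135.wav",
    "music/jamendo/music-jamendo-0020/00114.wav /media/DATA/augmentations/"
    "musan_split/music/jamendo/music-jamendo-0020/00114.wav",
]
print(count_key_prefixes(sample))
```

To check a real file, pass an open file handle instead of the sample list, for example `count_key_prefixes(open("data/musan/wav.scp"))`, where the path is an assumption about your setup. A non-zero `other` count means some keys lack the noise-type prefix.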

vanIvan commented 7 months ago

Okay, I see, thank you for checking so quickly! It seems the problem is on my side, with my MUSAN directory layout.