microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Speech Separation with WavLM-Large #792

Open bryant0918 opened 1 year ago

bryant0918 commented 1 year ago

I am trying to use the WavLM-Large Model for the s3prl downstream task separation_stft. Is this what you used in your SUPERB evaluation of the model?

Can you share your code to actually use the model to create separate output audio files of speech and music/background noise?

Sanyuan-Chen commented 1 year ago

Hi @bryant0918

Yes, we use the s3prl repo to evaluate our WavLM models for all the SUPERB tasks. Following their official implementation, we prepare the separation data and conduct the inference as described in https://github.com/s3prl/s3prl/blob/master/s3prl/downstream/docs/superb.md#ss-source-separation. You can follow the same commands, replacing the test data with your own audio files.

bryant0918 commented 1 year ago

Thank you @Sanyuan-Chen,

Do I still need to use these scripts to prepare my data before running the evaluation? Or should I be able to just run the testing script on any .wav files with:

```
python3 run_downstream.py -m evaluate \
        -e result/downstream/ExpName/best-states-dev.ckpt \
```

This doesn't seem to be working for me. run_downstream.py expects a .scp file that must have been created by the data_prepare script. Can you share an example of what is supposed to be in the .scp file?
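My current guess is that it is a Kaldi-style script file mapping an utterance ID to a wav path, so I have been generating one for my own mixtures with something like the snippet below. This is only a guess: the "wav.scp" file name, the `<utt_id> <path>` layout, and the directory name are my assumptions, not taken from the s3prl code.

```python
# Guessed Kaldi-style wav.scp writer: one "<utt_id> <wav_path>" line per mixture.
# The file name and layout are assumptions about what data_prepare produces,
# not verified against s3prl.
from pathlib import Path

wav_dir = Path("my_test_mixtures")   # directory with my own .wav mixtures (placeholder path)
with open("wav.scp", "w") as scp:
    for wav in sorted(wav_dir.glob("*.wav")):
        scp.write(f"{wav.stem} {wav.resolve()}\n")
```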

Also, I tried to run ./generate_librimix_ss.sh storage_dir to download the data and see how it's prepared before the evaluation, so that I can replicate it with my own data, but I get this error:

```
WARNING: cannot verify us.openslr.org's certificate, issued by `/C=US/O=Let's Encrypt/CN=R3':
  Unable to locally verify the issuer's authority.
WARNING: certificate common name `danielpovey.com' doesn't match requested host name `us.openslr.org'.
HTTP request sent, awaiting response... No data received.
```

So it does not download any of the LibriMix data from us.openslr.org, but it downloads the WHAM! data from googleapis just fine.

I also don't know where danielpovey.com comes from, since it appears nowhere in the bash script.

bryant0918 commented 1 year ago

Update: To fix the second issue (the ./generate_librimix_ss.sh storage_dir download failing), I used curl instead of wget.

bryant0918 commented 1 year ago

> Hi @bryant0918
>
> Yes, we use the s3prl repo to evaluate our WavLM models for all the SUPERB tasks. Following their official implementation, we prepare the separation data and conduct the inference as described in https://github.com/s3prl/s3prl/blob/master/s3prl/downstream/docs/superb.md#ss-source-separation. You can follow the same commands, replacing the test data with your own audio files.

I've followed the official implementation here to run the evaluation on the provided dataset as well as my own, and I receive output scores for si_sdri, pesq, and stoi, which is great. However, I would really like to run inference and obtain two output audio files, separating speech from speech, or speech from background music/noise. The evaluation does not produce those, and it appears that the DownstreamExpert for separation_stft has no attribute 'inference'.
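In the meantime I'm planning to do the masking and resynthesis by hand, roughly along the lines below. This is only a sketch, not the s3prl API: it assumes the model predicts one magnitude mask per source on the mixture STFT, and predict_masks is a stand-in I would still have to wire up to the actual separation_stft model, which may work differently.

```python
# Rough sketch of mask-based STFT separation inference; NOT the s3prl API.
# predict_masks() is a placeholder and must be replaced with the trained
# separation model's forward pass; the separation_stft internals may differ.
import torch
import torchaudio

N_FFT, HOP = 512, 256
window = torch.hann_window(N_FFT)


def predict_masks(mag_spec: torch.Tensor) -> torch.Tensor:
    # Placeholder: returns two all-0.5 masks so the sketch runs end to end.
    # Replace with the real model that outputs one (freq, time) mask per source.
    return torch.stack([torch.full_like(mag_spec[0], 0.5)] * 2)


mixture, sr = torchaudio.load("mixture.wav")                  # (channels, samples)
spec = torch.stft(mixture, N_FFT, HOP, window=window, return_complex=True)

masks = predict_masks(spec.abs())                             # (num_sources, freq, time)
for i, mask in enumerate(masks):
    est_spec = spec * mask                                    # mask the complex mixture STFT
    est_wav = torch.istft(est_spec, N_FFT, HOP, window=window, length=mixture.shape[-1])
    torchaudio.save(f"source_{i}.wav", est_wav, sr)           # one file per estimated source
```

With a real mask estimator plugged in, each source_{i}.wav would be one separated stream (e.g. speech vs. background).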