soham97 / PAM

PAM is a no-reference audio quality metric for audio generation tasks
MIT License

Testing data #2

Open bigpon opened 1 month ago

bigpon commented 1 month ago

Hi @soham97, thanks for sharing this interesting work. I have a few questions about the speech testing data for reproducing your results in the paper.

  1. For NISQA, can you share the list of files you used for evaluation?
  2. For TTS, do you know where I can download the testing data you used from "Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech"?
  3. For VMC2022, can you also share the list of files you used for evaluation?
  4. For noise suppression, can you also share the list of files (from DNS2021) you used for evaluation?
soham97 commented 1 month ago

Hi @bigpon, thanks for trying out PAM! We released all the human annotations and data artifacts created for the paper here.

  1. For NISQA, we use only the simulated and live-talk corpora and filter out the files with non-speech audio added from DNS. I'll share the file list as a .csv later.
  2. For the TTS testing data, you can use the LibriTTS subset from the paper "Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech". The data can be downloaded from here. Please cite the original paper if you use the dataset.
  3. For the VMC2022 audio files, they can be downloaded from this Zenodo link. Additionally, this CSV file contains the audio file names along with their corresponding PAM score, MOS, MOSNet, and MOS-SSL (a small loading sketch follows below).
  4. The DNS2021 files are from the ICASSP 2021 Deep Noise Suppression Challenge. This dataset is not publicly available, so I recommend contacting the organizers or hagamper@microsoft.com for data release questions.
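
In case it helps with reproducing the correlations, here's a minimal sketch (not from the repo) of loading that CSV and comparing each metric column against human MOS. The filename and column names are assumptions on my part, so please check the actual CSV header first.

```python
# Minimal sketch: correlate each metric against human MOS from the released CSV.
# The filename and column names below are assumptions; adjust them to match
# the actual CSV header.
import pandas as pd
from scipy.stats import pearsonr, spearmanr

df = pd.read_csv("vmc2022_pam_scores.csv")  # hypothetical filename

for metric in ["PAM", "MOSNet", "MOS-SSL"]:  # assumed column names
    lcc, _ = pearsonr(df[metric], df["MOS"])
    srcc, _ = spearmanr(df[metric], df["MOS"])
    print(f"{metric}: LCC={lcc:.3f}, SRCC={srcc:.3f}")
```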

Hope the above helps!

bigpon commented 1 month ago

Hi @soham97, thank you so much! The information is very helpful! Looking forward to the NISQA file list, thanks!