voidful / Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark
https://codecsuperb.com
183 stars 20 forks

Codec SUPERB Challenge: How to use codec_superb_data for evaluation? #32

Open McFlyy21 opened 1 month ago

McFlyy21 commented 1 month ago

Hi, I found that codec_superb_data contains many datasets but no code for data preprocessing. Does that mean I need to resynthesize each dataset separately myself, according to the two dataset classifications SPEECH and AUDIO, and run run.sh separately to evaluate the resynthesized audio from each dataset? Or should I put the resynthesized files for all datasets under the same classification (SPEECH or AUDIO) together in advance, and run run.sh once to get a single score per classification? I'm a bit confused about the evaluation rules and would appreciate an answer.

hbwu-ntu commented 1 month ago

Thank you for reaching out. The only thing you need to take care of is generating the synthesized speech and audio yourself and putting them under syn_path: https://github.com/voidful/Codec-SUPERB/blob/SLT_Challenge/run.sh#L5. The data under ref_path and syn_path must follow the same structure for run.sh to work. run.sh will then automatically evaluate different tasks on different datasets, such as using RAVDESS for emotion recognition in stage 1, LibriSpeech for ASR in stage 3, etc.
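One way to sanity-check that syn_path mirrors ref_path before launching run.sh is to diff the two trees by relative path. A minimal sketch (the helper name and the extension list are my own assumptions, not part of the repo):

```python
from pathlib import Path

def missing_files(ref_path, syn_path, exts=(".wav", ".flac")):
    """Return the set of audio files (as relative paths) that exist
    under ref_path but are absent under syn_path.  An empty set means
    syn_path mirrors the reference structure."""
    ref = {p.relative_to(ref_path)
           for p in Path(ref_path).rglob("*") if p.suffix in exts}
    syn = {p.relative_to(syn_path)
           for p in Path(syn_path).rglob("*") if p.suffix in exts}
    return ref - syn
```

Running this and confirming the result is empty before evaluation avoids run.sh failing partway through on a missing file.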

McFlyy21 commented 1 month ago

Thanks for your reply. So I need to manually split the datasets in the public test set codec_superb_data into two groups according to the SPEECH and AUDIO classification, resynthesize the speech and audio separately, and then run run.sh on each group to get the two sets of evaluations for SPEECH and AUDIO respectively, right?

hbwu-ntu commented 1 month ago

Basically yes. One more thing: the evaluation data is small, so it doesn't take long to re-synthesize it. You could give ChatGPT the ref_path folder structure and have it write a script that runs your codec for re-synthesis and saves the outputs in the same folder structure as ref_path.
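Such a script essentially walks ref_path and writes each codec output to the same relative location under syn_path. A minimal sketch, with a placeholder codec function standing in for an actual encode/decode round trip (the function names and extension list are illustrative, not from the repo):

```python
import shutil
from pathlib import Path

def resynthesize_tree(ref_path, syn_path, codec_fn, exts=(".wav", ".flac")):
    """Run codec_fn on every audio file under ref_path and write the
    result to the same relative path under syn_path."""
    for src in Path(ref_path).rglob("*"):
        if src.suffix not in exts:
            continue
        dst = Path(syn_path) / src.relative_to(ref_path)
        dst.parent.mkdir(parents=True, exist_ok=True)
        codec_fn(src, dst)

def passthrough_codec(src, dst):
    """Placeholder: copies the file unchanged.  Replace with your
    codec's encode/decode round trip."""
    shutil.copyfile(src, dst)
```

Calling `resynthesize_tree(ref_path, syn_path, passthrough_codec)` produces a syn_path tree whose layout matches ref_path, which is the only structural requirement run.sh imposes.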

Slyne commented 1 month ago

> Basically yes. One more thing: the evaluation data is small, so it doesn't take long to re-synthesize it. You could give ChatGPT the ref_path folder structure and have it write a script that runs your codec for re-synthesis and saves the outputs in the same folder structure as ref_path.

Q1: For the objective test, do we only need to consider the datasets under the samples/ folder? Should we also classify these datasets into audio and speech ourselves? Could you help check whether the classification below is right?

Listing the datasets under samples/:

| dataset | type |
|---|---|
| crema_d | speech |
| esc-50 | audio |
| fluent-speech-commands | speech |
| fsd50k | audio |
| gunshot_triangulation | audio |
| libri2Mix_test | speech |
| librispeech | speech |
| quesst | speech |
| snips_test_valid_subset | speech |
| vox_lingua_top10 | speech |
| voxceleb1 | speech |

Q2: I also found that these test datasets, including those for the downstream tasks, have different sampling rates. Are we expected to train one universal codec model at a high sampling rate (so it can handle both low and high sampling rates)? Or can we use different codec models for these tasks?

hbwu-ntu commented 1 month ago

Q1: Yes.

Q2: Either is acceptable. If you use multiple codec models, please specify which codec corresponds to each sampling rate when you submit.
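If a single codec must serve datasets at several sampling rates, one common approach is to resample each input to the codec's native rate before encoding. The sketch below illustrates the idea with a naive linear-interpolation resampler (illustration only; a real submission would use a proper polyphase resampler such as the ones in torchaudio or soxr):

```python
def resample_linear(samples, sr_in, sr_out):
    """Resample a sequence of samples from sr_in to sr_out Hz by
    linear interpolation.  For illustration only: linear interpolation
    aliases badly when downsampling real audio."""
    if sr_in == sr_out:
        return list(samples)
    n_out = int(len(samples) * sr_out / sr_in)
    out = []
    for i in range(n_out):
        pos = i * sr_in / sr_out          # fractional index into input
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a + (b - a) * frac)    # interpolate between neighbors
    return out
```

For example, upsampling a 4 Hz signal to 8 Hz doubles the number of samples while preserving the waveform's shape.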