Thank you for reaching out. The only thing you need to take care of is generating the synthesized speech and audio yourself and putting them under `syn_path`: https://github.com/voidful/Codec-SUPERB/blob/SLT_Challenge/run.sh#L5. The data under `ref_path` and `syn_path` should follow the same structure to run `run.sh`. `run.sh` will automatically evaluate the different tasks on their corresponding datasets, e.g. `ravdess` for emotion recognition in stage 1, `librispeech` for ASR in stage 3, etc.
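Before running `run.sh`, it may help to verify the two folders really mirror each other. A minimal sketch, assuming both roots contain `.wav` files (the paths below are placeholders for your local layout):

```python
from pathlib import Path

ref_path = Path("codec_superb_data/ref")  # placeholder: your reference audio root
syn_path = Path("codec_superb_data/syn")  # placeholder: your re-synthesized audio root

# Collect the relative paths of all audio files under each root.
ref_files = {p.relative_to(ref_path) for p in ref_path.rglob("*.wav")}
syn_files = {p.relative_to(syn_path) for p in syn_path.rglob("*.wav")}

missing = ref_files - syn_files  # in ref_path but not yet re-synthesized
extra = syn_files - ref_files    # re-synthesized files with no reference counterpart

print(f"missing: {len(missing)}, extra: {len(extra)}")
for p in sorted(missing)[:10]:
    print("missing:", p)
```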
Thanks for your reply. So I need to manually split the multiple datasets in the public test set `codec_superb_data` you provided into two large sets, SPEECH and AUDIO, then re-synthesize the speech and audio separately, and run `run.sh` on each to get two sets of evaluation results, one for SPEECH and one for AUDIO, right?
Basically yes. One more thing: the evaluation data is small, so re-synthesizing it doesn't take long. You may use ChatGPT: give it the `ref_path` folder structure and let it write a script that runs your codec for re-synthesis and saves the outputs in the same folder structure as `ref_path`.
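Such a script boils down to a simple loop. A minimal sketch, assuming your codec exposes `encode`/`decode` round-trip methods (`MyCodec` and the paths below are placeholders, not part of the challenge code):

```python
from pathlib import Path

import soundfile as sf

from my_codec import MyCodec  # placeholder: your own codec implementation

ref_path = Path("codec_superb_data/ref")  # placeholder reference root
syn_path = Path("codec_superb_data/syn")  # outputs mirror ref_path's structure

codec = MyCodec()  # placeholder constructor

for ref_file in ref_path.rglob("*.wav"):
    wav, sr = sf.read(ref_file)

    # Placeholder round trip: encode to codec tokens, then decode back to audio.
    resynth = codec.decode(codec.encode(wav, sr), sr)

    out_file = syn_path / ref_file.relative_to(ref_path)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    sf.write(out_file, resynth, sr)
```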
Q1: For the objective test, do we only need to consider the datasets under the `samples/` folder? And should we classify these datasets into audio and speech ourselves? Could you check whether the classification below is right?
The datasets under `samples/`:
| dataset | Type |
|---|---|
| crema_d | speech |
| esc-50 | audio |
| fluent-speech-commands | speech |
| fsd50k | audio |
| gunshot_triangulation | audio |
| libri2Mix_test | speech |
| librispeech | speech |
| quesst | speech |
| snips_test_valid_subset | speech |
| vox_lingua_top10 | speech |
| voxceleb1 | speech |
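(For context, here is a minimal sketch of how I plan to split them based on the table above; it assumes each dataset sits in its own subfolder under `samples/`, and the paths reflect my local layout, not anything prescribed by the challenge:)

```python
import shutil
from pathlib import Path

# Classification taken from the table above.
DATASET_TYPE = {
    "crema_d": "speech",
    "esc-50": "audio",
    "fluent-speech-commands": "speech",
    "fsd50k": "audio",
    "gunshot_triangulation": "audio",
    "libri2Mix_test": "speech",
    "librispeech": "speech",
    "quesst": "speech",
    "snips_test_valid_subset": "speech",
    "vox_lingua_top10": "speech",
    "voxceleb1": "speech",
}

samples = Path("codec_superb_data/samples")  # assumed local layout
out_root = Path("codec_superb_data/split")

for name, kind in DATASET_TYPE.items():
    src = samples / name
    dst = out_root / kind.upper() / name  # SPEECH/<dataset> or AUDIO/<dataset>
    if src.exists():
        shutil.copytree(src, dst, dirs_exist_ok=True)
```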
Q2: I also found that these test datasets, including those for the downstream tasks, use different sampling rates. Are we expected to train one universal codec model at a high sampling rate (to handle both low and high sampling rates)? Or can we use different codec models for these tasks?
Q1: Yes.
Q2: Either is acceptable. If you have multiple codec models, please specify which codec corresponds to each sampling rate during submission.
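If you go the multi-model route, dispatching by sampling rate is straightforward. A minimal sketch, where `load_codec` and the model names and rates are assumptions for illustration only:

```python
import soundfile as sf

from my_codec import load_codec  # placeholder: your own model loader

# Hypothetical mapping from input sampling rate to the codec trained at that rate.
CODEC_BY_RATE = {
    16000: load_codec("my_codec_16k"),
    44100: load_codec("my_codec_44k"),
}

def resynthesize(path):
    wav, sr = sf.read(path)
    codec = CODEC_BY_RATE[sr]  # pick the codec matching this file's rate
    return codec.decode(codec.encode(wav, sr), sr), sr
```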
Hi, I found that `codec_superb_data` contains many datasets but does not come with data preprocessing code. Does that mean I need to re-synthesize each dataset separately myself, following the SPEECH and AUDIO classifications, and run `run.sh` once per dataset to evaluate its re-synthesized audio? Or should I first gather all the re-synthesized files of the same classification (SPEECH or AUDIO) together, and run `run.sh` once to get a single score over all datasets in that classification? I'm a bit confused about the evaluation rules and would appreciate an answer.