Thanks for submitting the results. Could you also refer to section 4.2 of the rule (https://codecsuperb.github.io/Codec-SUPERB-rule.pdf) to let us know how to do inference using your model (we will leverage your model to test on the hidden set)?
Bit rate = 8 kbps; model trained with 16 kHz samples only
Downstream task
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition.
Acc: 75.21%
Stage 2: Run speaker related evaluation.
Parsing the resyn_trial.txt for resyn wavs
Run speaker verification.
EER: 1.56%
Stage 3: Run automatic speech recognition.
WER: 3.13%
Stage 4: Run audio event classification.
ACC: 83.30%
Objective result
Log results
--------------------------------------------------
File Name: crema_d.log
Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation.
SDR: mean score is: 2.2232599443745995
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.5125315
Stage 3: Run STOI.
stoi: mean score is: 0.8384541409928323
Stage 4: Run PESQ.
pesq: mean score is: 1.5559590673446655
--------------------------------------------------
File Name: esc50.log
Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation.
SDR: mean score is: -4.602151194644759
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.3825583
--------------------------------------------------
File Name: fluent_speech_commands.log
Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation.
SDR: mean score is: 8.47528477173951
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.4804714
Stage 3: Run STOI.
stoi: mean score is: 0.9478413458556251
Stage 4: Run PESQ.
pesq: mean score is: 3.0518312084674837
--------------------------------------------------
File Name: fsd50k.log
Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation.
SDR: mean score is: -2.0076792522998725
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.457246
--------------------------------------------------
File Name: gunshot_triangulation.log
Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation.
SDR: mean score is: 6.94366284167626
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.6988914
--------------------------------------------------
File Name: libri2Mix_test.log
Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation.
SDR: mean score is: 3.6701485211578273
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5391313
Stage 3: Run STOI.
stoi: mean score is: 0.9362651811605514
Stage 4: Run PESQ.
pesq: mean score is: 2.1895537614822387
--------------------------------------------------
File Name: librispeech.log
Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation.
SDR: mean score is: 8.627505998814492
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5454265
Stage 3: Run STOI.
stoi: mean score is: 0.9568509707064634
Stage 4: Run PESQ.
pesq: mean score is: 3.316485096216202
--------------------------------------------------
File Name: quesst.log
Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation.
SDR: mean score is: 6.899273166546299
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.237886
Stage 3: Run STOI.
stoi: mean score is: 0.9110949624359219
Stage 4: Run PESQ.
pesq: mean score is: 2.5656625175476075
--------------------------------------------------
File Name: snips_test_valid_subset.log
Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation.
SDR: mean score is: 11.001265123350482
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.7819229
Stage 3: Run STOI.
stoi: mean score is: 0.9753332596498754
Stage 4: Run PESQ.
pesq: mean score is: 3.383010833263397
--------------------------------------------------
File Name: vox_lingua_top10.log
Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation.
SDR: mean score is: 6.818639103419933
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.9194256
Stage 3: Run STOI.
stoi: mean score is: 0.9198648881684639
Stage 4: Run PESQ.
pesq: mean score is: 1.922272914648056
--------------------------------------------------
File Name: voxceleb1.log
Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation.
SDR: mean score is: 7.051308404176289
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.8565342
Stage 3: Run STOI.
stoi: mean score is: 0.9340248268933423
Stage 4: Run PESQ.
pesq: mean score is: 3.0424613475799562
--------------------------------------------------
Average SDR for speech datasets: 6.845835629197429
Average Mel_Loss for speech datasets: 1.8591661750000001
Average STOI for speech datasets: 0.9274661969828843
Average PESQ for speech datasets: 2.6284045933187006
Average SDR for audio datasets: 0.11127746491054295
Average Mel_Loss for audio datasets: 2.1795652333333333
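For anyone who wants to re-check the averages above, here is a minimal sketch of how they can be reproduced from the per-dataset logs. It is not the Codec-SUPERB script itself. The function names `parse_objective_log` and `group_averages` are hypothetical, and it assumes that a dataset counts as "speech" when its log contains a STOI stage and as "audio" otherwise, which matches the logs in this thread. Each group average is an unweighted mean over the per-dataset mean scores.

```python
import re
from collections import defaultdict
from statistics import mean

# A short excerpt in the same format as the logs posted above.
SAMPLE = """\
--------------------------------------------------
File Name: esc50.log
Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation.
SDR: mean score is: -4.602151194644759
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.3825583
--------------------------------------------------
File Name: fsd50k.log
Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation.
SDR: mean score is: -2.0076792522998725
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.457246
--------------------------------------------------
File Name: librispeech.log
Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation.
SDR: mean score is: 8.627505998814492
Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5454265
Stage 3: Run STOI.
stoi: mean score is: 0.9568509707064634
Stage 4: Run PESQ.
pesq: mean score is: 3.316485096216202
"""

FILE_RE = re.compile(r"File Name:\s*(\S+)\.log$")
SCORE_RE = re.compile(r"(\w+): mean score is:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)$")


def parse_objective_log(text):
    """Return {dataset: {metric: value}} parsed from concatenated log text."""
    results = defaultdict(dict)
    dataset = None
    for raw in text.splitlines():
        line = raw.strip()
        file_match = FILE_RE.match(line)
        if file_match:
            dataset = file_match.group(1)
            continue
        score_match = SCORE_RE.match(line)
        if score_match and dataset is not None:
            # Metric names are lower-cased so "SDR" and "sdr" are the same key.
            results[dataset][score_match.group(1).lower()] = float(score_match.group(2))
    return dict(results)


def group_averages(results):
    """Unweighted mean of each metric over speech and audio datasets.

    Assumption: a dataset is treated as speech if its log reports STOI.
    """
    groups = {
        "speech": [m for m in results.values() if "stoi" in m],
        "audio": [m for m in results.values() if "stoi" not in m],
    }
    averages = {}
    for name, members in groups.items():
        metric_names = sorted({key for m in members for key in m})
        averages[name] = {
            key: mean(m[key] for m in members if key in m) for key in metric_names
        }
    return averages


if __name__ == "__main__":
    for group, metrics in group_averages(parse_objective_log(SAMPLE)).items():
        for metric, value in metrics.items():
            print(f"Average {metric} for {group} datasets: {value}")
```

Running the same functions on the full log text (all eleven datasets) should give the speech and audio averages listed above, up to floating-point rounding.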
If possible, could you also refer to section 4.2 of the rules (https://codecsuperb.github.io/Codec-SUPERB-rule.pdf) and send us a brief description of your model along with instructions for running inference with it? We will use your model to test on the hidden set.
Was uploading models. Just sent the email. Please check. Thanks!
Perfect. Thank you.
Bit rate = 8 kbps
Downstream tasks (only the 16 kHz model used)
For reference, DAC 44.1 kHz for audio_event_classification got ACC: 90.55%
Objective Results (16 kHz model for 16 kHz samples and 48 kHz model for 48 kHz samples)