huazhi1024 opened this issue 2 weeks ago (status: Open)
Thanks for submitting the results. Could you also refer to section 4.2 of the rule (https://codecsuperb.github.io/Codec-SUPERB-rule.pdf) to let us know how to do inference using your model (we will leverage your model to test on the hidden set)?
Yes, I will finish it by Monday. However, I am currently encountering some issues with uploading the model to GitHub.
Here is one suggestion: the codec model checkpoint can be uploaded to Hugging Face or Google Drive (with instructions for downloading it with gdown).
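A minimal sketch of what such a download instruction could look like, assuming the checkpoint is shared on Google Drive. The file ID `FILE_ID` and the output name `codec.ckpt` are placeholders, not values from this thread, and the helper names are hypothetical. The sketch only builds the `gdown` command line; it does not download anything.

```python
import shlex


def gdown_url(file_id: str) -> str:
    """Build the Google Drive download URL that gdown accepts for a file ID."""
    return f"https://drive.google.com/uc?id={file_id}"


def gdown_command(file_id: str, output: str) -> list[str]:
    """Return the gdown CLI invocation as an argument list.

    `-O` sets the output path. Both arguments are placeholders here.
    """
    return ["gdown", gdown_url(file_id), "-O", output]


if __name__ == "__main__":
    # FILE_ID and codec.ckpt are hypothetical; substitute the real values.
    cmd = gdown_command("FILE_ID", "codec.ckpt")
    # Print a copy-pasteable shell command for the README.
    print(shlex.join(cmd))
```

The printed line can be pasted into the usage instructions, or the list can be passed to `subprocess.run(cmd, check=True)` once `gdown` is installed (`pip install gdown`).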
Hello, I have completed the model release. The download link and usage instructions have been sent to you by email. If you have any questions, please feel free to contact me. Thank you very much.
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 75.97%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 2.57%
Stage 3: Run automatic speech recognition. WER: 3.64%
Stage 4: Run audio event classification. ACC: 71.10%
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: 4.641087071226074
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.580518
Stage 3: Run STOI. stoi: mean score is: 0.7878352309918871
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is:
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 8.476178635258437
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.75820357
Stage 3: Run STOI. stoi: mean score is: 0.923865876017417
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: 0.4370140327990164
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: 1.1408946927353245
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: 4.2329371204943
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7518392
Stage 3: Run STOI. stoi: mean score is: 0.9050986518783571
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 7.752400420683839
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.72355676
Stage 3: Run STOI. stoi: mean score is: 0.9340837095549291
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: 8.433560426646096
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8292966
Stage 3: Run STOI. stoi: mean score is: 0.8863539521867545
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 9.542030656936957
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7960729
Stage 3: Run STOI. stoi: mean score is: 0.9530965262477374
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 6.525108717315516
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7149558
Stage 3: Run STOI. stoi: mean score is: 0.8977717650359602
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: 6.794794006624397
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.58160305
Stage 3: Run STOI. stoi: mean score is: 0.8815944944789442
Average SDR for speech datasets: 7.049762131898202
Average Mel_Loss for speech datasets: 0.7170057350000001
Average STOI for speech datasets: 0.8962125257989983
Average PESQ for speech datasets: 2.432418486326933
Average SDR for audio datasets: 0.7889543627671705
Average Mel_Loss for audio datasets: 1.58137665
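For anyone cross-checking these averages, here is a rough sketch of how the per-dataset log lines could be parsed and averaged. It is not the benchmark's own script. The speech/audio split (esc50, fsd50k and gunshot_triangulation treated as audio) is an assumption inferred from the reported numbers, and a dataset whose score is blank is simply skipped.

```python
import re
from statistics import mean

# Assumed grouping, inferred from the reported averages; not stated in the thread.
AUDIO_DATASETS = {"esc50", "fsd50k", "gunshot_triangulation"}

FILE_RE = re.compile(r"^File Name:\s*(\S+)\.log\b")
SCORE_RE = re.compile(r"(\w+): mean score is:\s*(\S*)")


def parse_results(text):
    """Return {dataset: {metric: value}} from a results.txt-style log."""
    results = {}
    current = None
    for line in text.splitlines():
        header = FILE_RE.match(line.strip())
        if header:
            current = header.group(1)
            results[current] = {}
            continue
        score = SCORE_RE.search(line)
        if score and current is not None:
            metric, raw = score.group(1).lower(), score.group(2)
            try:
                results[current][metric] = float(raw)
            except ValueError:
                pass  # blank or unparsable score: leave the metric out
    return results


def group_average(results, metric, audio):
    """Mean of `metric` over the audio (audio=True) or speech datasets."""
    values = [scores[metric] for name, scores in results.items()
              if (name in AUDIO_DATASETS) == audio and metric in scores]
    return mean(values) if values else None


# A few lines copied from the log above, including the blank esc50 score.
SAMPLE = """\
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: 4.641087071226074
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is:
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: 0.4370140327990164
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: 1.1408946927353245
"""

if __name__ == "__main__":
    parsed = parse_results(SAMPLE)
    print("audio SDR average:", group_average(parsed, "sdr", audio=True))
```

Under these assumptions, averaging the two audio datasets that do have an SDR score reproduces the reported audio SDR average, which suggests the blank esc50 entry was excluded rather than counted as zero.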
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 75.49%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 1.53%
Stage 3: Run automatic speech recognition. WER: 3.19%
Stage 4: Run audio event classification. ACC: 86.55%
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: 11.9198190006243
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.46257296
Stage 3: Run STOI. stoi: mean score is: 0.9120819982808108
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: 6.241463241745936
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 13.733762141747928
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.62197036
Stage 3: Run STOI. stoi: mean score is: 0.9634806161341553
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: 6.650526363401869
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: 8.656592212439394
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: 10.175153523690033
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6136928
Stage 3: Run STOI. stoi: mean score is: 0.9652810154755299
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 12.392481496173902
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.61285776
Stage 3: Run STOI. stoi: mean score is: 0.9659764076205769
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: 14.22380206490447
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5592373
Stage 3: Run STOI. stoi: mean score is: 0.9433491989857918
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 13.537287795228872
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.66826826
Stage 3: Run STOI. stoi: mean score is: 0.9756619198675819
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 11.822250275546512
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.59757036
Stage 3: Run STOI. stoi: mean score is: 0.955504206240366
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: 12.644553808875898
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.50202703
Stage 3: Run STOI. stoi: mean score is: 0.9474133160648855
Average SDR for speech datasets: 12.55613876334899
Average Mel_Loss for speech datasets: 0.57977460375
Average STOI for speech datasets: 0.9535935848337121
Average PESQ for speech datasets: 3.527146946936846
Average SDR for audio datasets: 7.182860605862399
Average Mel_Loss for audio datasets: 0.84308288
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 75.28%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 1.49%
Stage 3: Run automatic speech recognition. WER: 3.07%
Stage 4: Run audio event classification. ACC: 88.00%
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: 12.2636534888401
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.4645789
Stage 3: Run STOI. stoi: mean score is: 0.9201668776856671
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: 6.726355181016816
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 14.124681010234116
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5964956
Stage 3: Run STOI. stoi: mean score is: 0.9658302396521976
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: 6.954926362898063
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: 8.296033518758794
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: 10.664635680971012
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5921542
Stage 3: Run STOI. stoi: mean score is: 0.9683333864449756
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 12.879912781761652
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5930597
Stage 3: Run STOI. stoi: mean score is: 0.9687304311248394
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: 14.652514660452471
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.54244286
Stage 3: Run STOI. stoi: mean score is: 0.9472981762704458
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 13.91570370530584
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6493998
Stage 3: Run STOI. stoi: mean score is: 0.97752452279595
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 12.273928539620078
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5752002
Stage 3: Run STOI. stoi: mean score is: 0.9589951570618435
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: 13.074327995615095
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.4956447
Stage 3: Run STOI. stoi: mean score is: 0.9516514417002608
Average SDR for speech datasets: 12.981169732850043
Average Mel_Loss for speech datasets: 0.563621995
Average STOI for speech datasets: 0.9573162790920224
Average PESQ for speech datasets: 3.5766812434792516
Average SDR for audio datasets: 7.32577168755789
Average Mel_Loss for audio datasets: 0.8917567200000001
Bitrates: the 16 kHz codec model runs at 2 kbps, the 44.1 kHz codec model at 6.89 kbps, and the 48 kHz codec model at 7.5 kbps.
1. Here is exps/results.txt:
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 75.97%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 2.57%
Stage 3: Run automatic speech recognition. WER: 3.67%
Stage 4: Run audio event classification. ACC: 86.80%
2. Here is src/codec_metrics/exps/results.txt:
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: 12.264864005831004
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.46461612
Stage 3: Run STOI. stoi: mean score is: 0.9201546369667847
Stage 4: Run PESQ. pesq: mean score is: 2.9032970213890077
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: 6.726699210213638
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.89280885
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 8.476522537066758
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.75807977
Stage 3: Run STOI. stoi: mean score is: 0.9238519743607232
Stage 4: Run PESQ. pesq: mean score is: 2.8522612583637237
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: 6.95385805941422
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8306656
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: 8.291245593533532
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.95218104
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: 4.233350120341239
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7518116
Stage 3: Run STOI. stoi: mean score is: 0.9050623419177468
Stage 4: Run PESQ. pesq: mean score is: 2.0071350967884065
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 7.751003745240329
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.72347593
Stage 3: Run STOI. stoi: mean score is: 0.9340773701364049
Stage 4: Run PESQ. pesq: mean score is: 2.903846046924591
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: 8.4340708735918
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8294336
Stage 3: Run STOI. stoi: mean score is: 0.8863192140533341
Stage 4: Run PESQ. pesq: mean score is: 2.6509935235977173
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 9.542545404819807
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7959907
Stage 3: Run STOI. stoi: mean score is: 0.9531058100873113
Stage 4: Run PESQ. pesq: mean score is: 2.7776152551174165
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 6.524681732109078
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.71494424
Stage 3: Run STOI. stoi: mean score is: 0.8977601804462474
Stage 4: Run PESQ. pesq: mean score is: 2.5823002088069917
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: 13.074802660696786
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.49565125
Stage 3: Run STOI. stoi: mean score is: 0.9516724002511663
Stage 4: Run PESQ. pesq: mean score is: 2.9390562558174134
Average SDR for speech datasets: 8.7877301349621
Average Mel_Loss for speech datasets: 0.69175040125
Average STOI for speech datasets: 0.9215004910274648
Average PESQ for speech datasets: 2.7020630833506587
Average SDR for audio datasets: 7.323934287720463
Average Mel_Loss for audio datasets: 0.8918851633333333