Open yyua8222 opened 2 weeks ago
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 71.39%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 3.81%
Stage 3: Run automatic speech recognition. WER: 5.55%
Stage 4: Run audio event classification. ACC: 83.60%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -8.023059848347962
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.71579695
Stage 3: Run STOI. stoi: mean score is: 0.6374974081666491
Stage 4: Run PESQ. pesq: mean score is: 1.3225452315807342
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -16.204584799806007
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6063063
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -3.678850278531351
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8800615
Stage 3: Run STOI. stoi: mean score is: 0.8390240078687938
Stage 4: Run PESQ. pesq: mean score is: 2.0443784379959107
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -15.573628896021797
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.5704794
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -10.929932869636273
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3241482
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -10.0523148424559
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8997036
Stage 3: Run STOI. stoi: mean score is: 0.8000545556153663
Stage 4: Run PESQ. pesq: mean score is: 1.4450754988193513
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -7.4687414751106225
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8380325
Stage 3: Run STOI. stoi: mean score is: 0.8672585483834184
Stage 4: Run PESQ. pesq: mean score is: 2.0104604637622834
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -9.139100164017448
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.85657454
Stage 3: Run STOI. stoi: mean score is: 0.8004369960794232
Stage 4: Run PESQ. pesq: mean score is: 1.8498523151874542
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -6.784165470251713
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9755262
Stage 3: Run STOI. stoi: mean score is: 0.8754722747405146
Stage 4: Run PESQ. pesq: mean score is: 1.8099392879009246
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -9.873407105853522
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8101869
Stage 3: Run STOI. stoi: mean score is: 0.811518312954677
Stage 4: Run PESQ. pesq: mean score is: 1.786508893966675
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -13.585821389136129
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78308046
Stage 3: Run STOI. stoi: mean score is: 0.7916500742300961
Stage 4: Run PESQ. pesq: mean score is: 1.462774316072464
Average SDR for speech datasets: -8.575682571713081
Average Mel_Loss for speech datasets: 0.8448703312499999
Average STOI for speech datasets: 0.8028640222548673
Average PESQ for speech datasets: 1.7164418056607245
Average SDR for audio datasets: -14.236048855154692
Average Mel_Loss for audio datasets: 1.5003113
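For readers checking the summary lines, the averages appear to be plain unweighted means over the per-dataset scores. The sketch below reproduces the two SDR averages from the per-dataset values in the log above. It assumes that the three datasets reporting only SDR and mel loss (esc50, fsd50k, gunshot_triangulation) form the "audio" group and the remaining eight form the "speech" group. The variable names are illustrative and not taken from the Codec SUPERB scripts.

```python
from statistics import mean

# Per-dataset SDR means copied from the log block above.
sdr = {
    "crema_d": -8.023059848347962,
    "esc50": -16.204584799806007,
    "fluent_speech_commands": -3.678850278531351,
    "fsd50k": -15.573628896021797,
    "gunshot_triangulation": -10.929932869636273,
    "libri2Mix_test": -10.0523148424559,
    "librispeech": -7.4687414751106225,
    "quesst": -9.139100164017448,
    "snips_test_valid_subset": -6.784165470251713,
    "voxceleb1": -9.873407105853522,
    "vox_lingua_top10": -13.585821389136129,
}

# Assumption: datasets without STOI/PESQ stages are the "audio" group.
AUDIO = {"esc50", "fsd50k", "gunshot_triangulation"}

# Unweighted mean over datasets (not over individual utterances).
speech_sdr = mean(v for name, v in sdr.items() if name not in AUDIO)
audio_sdr = mean(v for name, v in sdr.items() if name in AUDIO)

print(f"Average SDR for speech datasets: {speech_sdr}")
print(f"Average SDR for audio datasets: {audio_sdr}")
```

Under that grouping the two means agree with the reported averages to within floating-point rounding.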
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 71.04%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 3.64%
Stage 3: Run automatic speech recognition. WER: 5.50%
Stage 4: Run audio event classification. ACC: 83.15%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -8.288299352593407
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7141452
Stage 3: Run STOI. stoi: mean score is: 0.6402874449523498
Stage 4: Run PESQ. pesq: mean score is: 1.3165868592262269
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -16.00277567356359
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6065166
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -3.9123262783170674
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8796803
Stage 3: Run STOI. stoi: mean score is: 0.8415218683353153
Stage 4: Run PESQ. pesq: mean score is: 2.062159482240677
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -16.190419273403485
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.5684569
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -10.130163797288604
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3271292
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -9.806158885886454
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.89801973
Stage 3: Run STOI. stoi: mean score is: 0.8023610604658767
Stage 4: Run PESQ. pesq: mean score is: 1.4408800554275514
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -7.465939778921175
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83398134
Stage 3: Run STOI. stoi: mean score is: 0.8680992262187252
Stage 4: Run PESQ. pesq: mean score is: 2.0172382056713105
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -9.4248413812485
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8552469
Stage 3: Run STOI. stoi: mean score is: 0.8009020639528738
Stage 4: Run PESQ. pesq: mean score is: 1.874754753112793
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -6.770595884905695
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9717746
Stage 3: Run STOI. stoi: mean score is: 0.8772398321019043
Stage 4: Run PESQ. pesq: mean score is: 1.8369818699359894
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -9.957701026949303
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.80674434
Stage 3: Run STOI. stoi: mean score is: 0.8145108847377486
Stage 4: Run PESQ. pesq: mean score is: 1.8088320195674896
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -13.324050827908918
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78012276
Stage 3: Run STOI. stoi: mean score is: 0.7933999433518525
Stage 4: Run PESQ. pesq: mean score is: 1.4673356866836549
Average SDR for speech datasets: -8.618739177091316
Average Mel_Loss for speech datasets: 0.84246439625
Average STOI for speech datasets: 0.8047902905145807
Average PESQ for speech datasets: 1.7280961164832116
Average SDR for audio datasets: -14.107786248085226
Average Mel_Loss for audio datasets: 1.5007009
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 68.12%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 6.16%
Stage 3: Run automatic speech recognition. WER: 9.55%
Stage 4: Run audio event classification. ACC: 76.55%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -8.83968419510651
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7127933
Stage 3: Run STOI. stoi: mean score is: 0.59937756747475
Stage 4: Run PESQ. pesq: mean score is: 1.2897077596187592
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -16.699295371807537
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.629877
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -4.29523195078702
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.96175253
Stage 3: Run STOI. stoi: mean score is: 0.8026151794296594
Stage 4: Run PESQ. pesq: mean score is: 1.801913343667984
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -16.5305423514448
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6025631
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -10.579743797921056
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3357253
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -10.66528635465503
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0411216
Stage 3: Run STOI. stoi: mean score is: 0.7410676812363071
Stage 4: Run PESQ. pesq: mean score is: 1.2746098387241362
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -8.113302633684958
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9311916
Stage 3: Run STOI. stoi: mean score is: 0.8395857457648703
Stage 4: Run PESQ. pesq: mean score is: 1.7791949903964996
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -9.662703719793258
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9265094
Stage 3: Run STOI. stoi: mean score is: 0.7611285319217221
Stage 4: Run PESQ. pesq: mean score is: 1.6827945744991302
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -7.3375089676368646
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0891488
Stage 3: Run STOI. stoi: mean score is: 0.8422847505824207
Stage 4: Run PESQ. pesq: mean score is: 1.572229918241501
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -10.419868769887758
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9091263
Stage 3: Run STOI. stoi: mean score is: 0.7786395411501819
Stage 4: Run PESQ. pesq: mean score is: 1.6212511384487152
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -14.210414162406268
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83399665
Stage 3: Run STOI. stoi: mean score is: 0.7465415983803311
Stage 4: Run PESQ. pesq: mean score is: 1.4144192659854888
Average SDR for speech datasets: -9.193000094244708
Average Mel_Loss for speech datasets: 0.9257050225000001
Average STOI for speech datasets: 0.7639050744925302
Average PESQ for speech datasets: 1.5545151036977767
Average SDR for audio datasets: -14.60319384039113
Average Mel_Loss for audio datasets: 1.5227218
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 67.15%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 6.01%
Stage 3: Run automatic speech recognition. WER: 9.69%
Stage 4: Run audio event classification. ACC: 75.10%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -8.770160568168329
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7157427
Stage 3: Run STOI. stoi: mean score is: 0.5990633321199552
Stage 4: Run PESQ. pesq: mean score is: 1.2988323020935058
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -16.759995199904903
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6301608
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -4.439544938378307
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9583634
Stage 3: Run STOI. stoi: mean score is: 0.8065342834968997
Stage 4: Run PESQ. pesq: mean score is: 1.8093781626224519
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -16.590133601793124
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.600283
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -10.233150558590781
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3511304
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -10.776933275608268
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0376164
Stage 3: Run STOI. stoi: mean score is: 0.7419712845721602
Stage 4: Run PESQ. pesq: mean score is: 1.2745465958118438
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -7.896944603174362
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9255314
Stage 3: Run STOI. stoi: mean score is: 0.8416043077360352
Stage 4: Run PESQ. pesq: mean score is: 1.7907265722751617
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -9.604782428385983
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9269376
Stage 3: Run STOI. stoi: mean score is: 0.7635182742921145
Stage 4: Run PESQ. pesq: mean score is: 1.6788008534908294
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -7.306414974996127
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0834035
Stage 3: Run STOI. stoi: mean score is: 0.8455957873829031
Stage 4: Run PESQ. pesq: mean score is: 1.596457360982895
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -10.43514078996363
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.90818316
Stage 3: Run STOI. stoi: mean score is: 0.7796653041584611
Stage 4: Run PESQ. pesq: mean score is: 1.6307682001590729
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -14.158362698563757
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83324564
Stage 3: Run STOI. stoi: mean score is: 0.7478858007055951
Stage 4: Run PESQ. pesq: mean score is: 1.415548061132431
Average SDR for speech datasets: -9.173535534654846
Average Mel_Loss for speech datasets: 0.923627975
Average STOI for speech datasets: 0.7657297968080157
Average PESQ for speech datasets: 1.5618822635710237
Average SDR for audio datasets: -14.527759786762935
Average Mel_Loss for audio datasets: 1.5271913999999998
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 61.53%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 13.70%
Stage 3: Run automatic speech recognition. WER: 35.79%
Stage 4: Run audio event classification. ACC: 71.55%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -9.891073254225994
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.79265803
Stage 3: Run STOI. stoi: mean score is: 0.5382069630214918
Stage 4: Run PESQ. pesq: mean score is: 1.2317941224575042
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -17.354609349344106
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6950777
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -5.118099710803417
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1607062
Stage 3: Run STOI. stoi: mean score is: 0.7279729071609607
Stage 4: Run PESQ. pesq: mean score is: 1.470268008708954
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -17.525922260145695
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6639311
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -9.84819729776821
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4082423
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -11.828557564473659
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3157852
Stage 3: Run STOI. stoi: mean score is: 0.6398609276418542
Stage 4: Run PESQ. pesq: mean score is: 1.1277076315879822
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -9.074854594346156
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.143972
Stage 3: Run STOI. stoi: mean score is: 0.7747987615118724
Stage 4: Run PESQ. pesq: mean score is: 1.426479343175888
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -10.47760850527248
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1010196
Stage 3: Run STOI. stoi: mean score is: 0.6862266259635116
Stage 4: Run PESQ.
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -8.102805757598823
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3454256
Stage 3: Run STOI. stoi: mean score is: 0.783804268388349
Stage 4: Run PESQ. pesq: mean score is: 1.3105683100223542
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -11.169038464688196
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0954686
Stage 3: Run STOI. stoi: mean score is: 0.7094035546469811
Stage 4: Run PESQ. pesq: mean score is: 1.3510719525814057
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -15.680900866701082
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9287965
Stage 3: Run STOI. stoi: mean score is: 0.6631148883616201
Stage 4: Run PESQ. pesq: mean score is: 1.2879982483386994
Average SDR for speech datasets: -10.167867339763726
Average Mel_Loss for speech datasets: 1.1104789662499999
Average STOI for speech datasets: 0.69042361208708
Average PESQ for speech datasets: 1.3259033580124377
Average SDR for audio datasets: -14.909576302419337
Average Mel_Loss for audio datasets: 1.5890836999999998
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 59.51%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 13.39%
Stage 3: Run automatic speech recognition. WER: 34.24%
Stage 4: Run audio event classification. ACC: 70.45%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -9.52817490628773
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78014153
Stage 3: Run STOI. stoi: mean score is: 0.536566776256902
Stage 4: Run PESQ.
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -18.045539644348803
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6942394
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -4.756434837791447
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1558565
Stage 3: Run STOI. stoi: mean score is: 0.7376097582470694
Stage 4: Run PESQ. pesq: mean score is: 1.4803874719142913
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -17.28732169023466
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6601683
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -9.839931109752126
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4160614
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -11.686392159090719
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3149458
Stage 3: Run STOI. stoi: mean score is: 0.6450955925787938
Stage 4: Run PESQ. pesq: mean score is: 1.1226227939128877
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -9.023869144962699
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1400143
Stage 3: Run STOI. stoi: mean score is: 0.778975415690721
Stage 4: Run PESQ. pesq: mean score is: 1.4233695840835572
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -10.446293708828193
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0919785
Stage 3: Run STOI. stoi: mean score is: 0.6912703894668684
Stage 4: Run PESQ. pesq: mean score is: 1.4184428441524506
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -7.820809908089303
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.343809
Stage 3: Run STOI. stoi: mean score is: 0.7835718970167425
Stage 4: Run PESQ. pesq: mean score is: 1.3171902728080749
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -11.3429282056549
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0941978
Stage 3: Run STOI. stoi: mean score is: 0.71035581129116
Stage 4: Run PESQ. pesq: mean score is: 1.3429110085964202
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -15.616014513375687
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9306869
Stage 3: Run STOI. stoi: mean score is: 0.6662986378594428
Stage 4: Run PESQ. pesq: mean score is: 1.2928564262390136
Average SDR for speech datasets: -10.027614673010085
Average Mel_Loss for speech datasets: 1.1064537912499999
Average STOI for speech datasets: 0.6937180348009626
Average PESQ for speech datasets: 1.3297000639140608
Average SDR for audio datasets: -15.057597481445194
Average Mel_Loss for audio datasets: 1.590156366666667
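Since every comment uses the same log layout, the per-dataset scores can be pulled into a dictionary for side-by-side comparison. The following is a minimal sketch, assuming the line shape seen in this thread (`File Name: <dataset>.log ...` followed by `<metric>: mean score is: <value>` lines). The `parse_log` helper is hypothetical and not part of Codec SUPERB. Stage lines that carry no score, such as a bare `Stage 4: Run PESQ.`, are skipped rather than guessed.

```python
import re

# Assumed patterns, inferred from the log lines in this thread.
HEADER_RE = re.compile(r"File Name: (\S+)\.log")
SCORE_RE = re.compile(r"(\w+): mean score is: (-?\d+(?:\.\d+)?)")

def parse_log(text):
    """Return {dataset: {metric: score}} for one pasted log block."""
    results = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            # A new "File Name:" line starts the next dataset section.
            current = header.group(1)
            results[current] = {}
            continue
        score = SCORE_RE.search(line)
        if score and current is not None:
            results[current][score.group(1).lower()] = float(score.group(2))
    return results

sample = """File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -10.47760850527248
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1010196
Stage 3: Run STOI. stoi: mean score is: 0.6862266259635116
Stage 4: Run PESQ."""

parsed = parse_log(sample)
print(parsed)
```

In this sample the PESQ stage has no score, so the `quesst` entry ends up with only `sdr`, `mel_loss`, and `stoi` keys.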
Here are the results for SemantiCodec. This is a 16 kHz codec with three different bit rates:
The inference code and checkpoint model can be found here
The results of the system under six different configurations are shown below (one comment per configuration):