voidful / Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark
https://codecsuperb.com

Results for SemantiCodec #38

Open yyua8222 opened 2 weeks ago

yyua8222 commented 2 weeks ago

Here are the results for SemantiCodec. This is a 16 kHz codec evaluated at three token rates with two codebook sizes each, giving six bit rates:

  1. For token rate 100 with codebook size 16384, the bit rate is 1.35 kbps
  2. For token rate 100 with codebook size 32768, the bit rate is 1.40 kbps
  3. For token rate 50 with codebook size 16384, the bit rate is 0.68 kbps
  4. For token rate 50 with codebook size 32768, the bit rate is 0.70 kbps
  5. For token rate 25 with codebook size 16384, the bit rate is 0.34 kbps
  6. For token rate 25 with codebook size 32768, the bit rate is 0.35 kbps
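The listed bit rates are not simply token rate times log2 of the codebook size (100 tokens/s with 16384 entries would give 1.40 kbps, not 1.35). A minimal sketch that does reproduce all six figures follows. It assumes, and this is not stated anywhere in this thread, that the tokens are split evenly between a semantic stream using the listed codebook size and an acoustic stream with a fixed 8192-entry codebook. The function name and the constant are hypothetical and chosen only for illustration.

```python
import math

# Assumption (not stated in this thread): half of the tokens come from a
# semantic codebook of the listed size, the other half from an acoustic
# codebook with 8192 entries. The value 8192 is chosen because it reproduces
# the listed bit rates; it is not taken from the issue text.
ACOUSTIC_CODEBOOK_SIZE = 8192


def bitrate_kbps(token_rate: float, semantic_codebook_size: int) -> float:
    """Return the bit rate in kbps for a total token rate in tokens/s."""
    tokens_per_stream = token_rate / 2
    semantic_bits = math.log2(semantic_codebook_size)
    acoustic_bits = math.log2(ACOUSTIC_CODEBOOK_SIZE)
    return tokens_per_stream * (semantic_bits + acoustic_bits) / 1000


for rate in (100, 50, 25):
    for size in (16384, 32768):
        print(f"token rate {rate}, codebook size {size}: "
              f"{bitrate_kbps(rate, size):.2f} kbps")
```

Under this assumption the token rate 50 and 25 configurations come out at 0.675/0.700 and 0.3375/0.350 kbps, which round to the values in the list.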

The inference code and checkpoint model can be found here

The results for the six configurations are shown below (one comment per configuration):

yyua8222 commented 2 weeks ago

Results for model with 100 token rate and 16384 code book size:

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 71.39%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 3.81%

Stage 3: Run automatic speech recognition. WER: 5.55%

Stage 4: Run audio event classification. ACC: 83.60%


Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: -8.023059848347962

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.71579695

Stage 3: Run STOI. stoi: mean score is: 0.6374974081666491

Stage 4: Run PESQ. pesq: mean score is: 1.3225452315807342

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: -16.204584799806007

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6063063

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: -3.678850278531351

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8800615

Stage 3: Run STOI. stoi: mean score is: 0.8390240078687938

Stage 4: Run PESQ. pesq: mean score is: 2.0443784379959107

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: -15.573628896021797

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.5704794

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: -10.929932869636273

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3241482

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: -10.0523148424559

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8997036

Stage 3: Run STOI. stoi: mean score is: 0.8000545556153663

Stage 4: Run PESQ. pesq: mean score is: 1.4450754988193513

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: -7.4687414751106225

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8380325

Stage 3: Run STOI. stoi: mean score is: 0.8672585483834184

Stage 4: Run PESQ. pesq: mean score is: 2.0104604637622834

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: -9.139100164017448

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.85657454

Stage 3: Run STOI. stoi: mean score is: 0.8004369960794232

Stage 4: Run PESQ. pesq: mean score is: 1.8498523151874542

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: -6.784165470251713

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9755262

Stage 3: Run STOI. stoi: mean score is: 0.8754722747405146

Stage 4: Run PESQ. pesq: mean score is: 1.8099392879009246

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: -9.873407105853522

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8101869

Stage 3: Run STOI. stoi: mean score is: 0.811518312954677

Stage 4: Run PESQ. pesq: mean score is: 1.786508893966675

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: -13.585821389136129

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78308046

Stage 3: Run STOI. stoi: mean score is: 0.7916500742300961

Stage 4: Run PESQ. pesq: mean score is: 1.462774316072464

Average SDR for speech datasets: -8.575682571713081

Average Mel_Loss for speech datasets: 0.8448703312499999

Average STOI for speech datasets: 0.8028640222548673

Average PESQ for speech datasets: 1.7164418056607245

Average SDR for audio datasets: -14.236048855154692

Average Mel_Loss for audio datasets: 1.5003113
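The speech averages for this configuration appear to be unweighted means over the eight datasets that report STOI and PESQ. The audio averages cover the three that report only SDR and mel loss (esc50, fsd50k, gunshot_triangulation). That grouping is inferred from the logs, not stated by the benchmark output. A minimal sketch that recomputes the PESQ average from the per-dataset values copied from the logs of this configuration:

```python
# PESQ means for the eight speech datasets, copied from the logs of the
# 100 token rate / 16384 codebook size configuration.
pesq_by_dataset = {
    "crema_d": 1.3225452315807342,
    "fluent_speech_commands": 2.0443784379959107,
    "libri2Mix_test": 1.4450754988193513,
    "librispeech": 2.0104604637622834,
    "quesst": 1.8498523151874542,
    "snips_test_valid_subset": 1.8099392879009246,
    "voxceleb1": 1.786508893966675,
    "vox_lingua_top10": 1.462774316072464,
}

# Unweighted mean across datasets (assumed to be how the summary is built).
average_pesq = sum(pesq_by_dataset.values()) / len(pesq_by_dataset)
print(f"Average PESQ for speech datasets: {average_pesq:.6f}")
```

The same calculation with the SDR, mel loss, or STOI values gives the other reported averages.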

yyua8222 commented 2 weeks ago

Results for model with 100 token rate and 32768 code book size:

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 71.04%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 3.64%

Stage 3: Run automatic speech recognition. WER: 5.50%

Stage 4: Run audio event classification. ACC: 83.15%


Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: -8.288299352593407

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7141452

Stage 3: Run STOI. stoi: mean score is: 0.6402874449523498

Stage 4: Run PESQ. pesq: mean score is: 1.3165868592262269

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: -16.00277567356359

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6065166

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: -3.9123262783170674

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8796803

Stage 3: Run STOI. stoi: mean score is: 0.8415218683353153

Stage 4: Run PESQ. pesq: mean score is: 2.062159482240677

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: -16.190419273403485

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.5684569

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: -10.130163797288604

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3271292

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: -9.806158885886454

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.89801973

Stage 3: Run STOI. stoi: mean score is: 0.8023610604658767

Stage 4: Run PESQ. pesq: mean score is: 1.4408800554275514

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: -7.465939778921175

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83398134

Stage 3: Run STOI. stoi: mean score is: 0.8680992262187252

Stage 4: Run PESQ. pesq: mean score is: 2.0172382056713105

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: -9.4248413812485

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8552469

Stage 3: Run STOI. stoi: mean score is: 0.8009020639528738

Stage 4: Run PESQ. pesq: mean score is: 1.874754753112793

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: -6.770595884905695

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9717746

Stage 3: Run STOI. stoi: mean score is: 0.8772398321019043

Stage 4: Run PESQ. pesq: mean score is: 1.8369818699359894

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: -9.957701026949303

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.80674434

Stage 3: Run STOI. stoi: mean score is: 0.8145108847377486

Stage 4: Run PESQ. pesq: mean score is: 1.8088320195674896

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: -13.324050827908918

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78012276

Stage 3: Run STOI. stoi: mean score is: 0.7933999433518525

Stage 4: Run PESQ. pesq: mean score is: 1.4673356866836549

Average SDR for speech datasets: -8.618739177091316

Average Mel_Loss for speech datasets: 0.84246439625

Average STOI for speech datasets: 0.8047902905145807

Average PESQ for speech datasets: 1.7280961164832116

Average SDR for audio datasets: -14.107786248085226

Average Mel_Loss for audio datasets: 1.5007009

yyua8222 commented 2 weeks ago

Results for model with 50 token rate and 16384 code book size:

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 68.12%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 6.16%

Stage 3: Run automatic speech recognition. WER: 9.55%

Stage 4: Run audio event classification. ACC: 76.55%


Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: -8.83968419510651

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7127933

Stage 3: Run STOI. stoi: mean score is: 0.59937756747475

Stage 4: Run PESQ. pesq: mean score is: 1.2897077596187592

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: -16.699295371807537

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.629877

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: -4.29523195078702

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.96175253

Stage 3: Run STOI. stoi: mean score is: 0.8026151794296594

Stage 4: Run PESQ. pesq: mean score is: 1.801913343667984

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: -16.5305423514448

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6025631

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: -10.579743797921056

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3357253

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: -10.66528635465503

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0411216

Stage 3: Run STOI. stoi: mean score is: 0.7410676812363071

Stage 4: Run PESQ. pesq: mean score is: 1.2746098387241362

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: -8.113302633684958

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9311916

Stage 3: Run STOI. stoi: mean score is: 0.8395857457648703

Stage 4: Run PESQ. pesq: mean score is: 1.7791949903964996

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: -9.662703719793258

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9265094

Stage 3: Run STOI. stoi: mean score is: 0.7611285319217221

Stage 4: Run PESQ. pesq: mean score is: 1.6827945744991302

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: -7.3375089676368646

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0891488

Stage 3: Run STOI. stoi: mean score is: 0.8422847505824207

Stage 4: Run PESQ. pesq: mean score is: 1.572229918241501

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: -10.419868769887758

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9091263

Stage 3: Run STOI. stoi: mean score is: 0.7786395411501819

Stage 4: Run PESQ. pesq: mean score is: 1.6212511384487152

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: -14.210414162406268

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83399665

Stage 3: Run STOI. stoi: mean score is: 0.7465415983803311

Stage 4: Run PESQ. pesq: mean score is: 1.4144192659854888

Average SDR for speech datasets: -9.193000094244708

Average Mel_Loss for speech datasets: 0.9257050225000001

Average STOI for speech datasets: 0.7639050744925302

Average PESQ for speech datasets: 1.5545151036977767

Average SDR for audio datasets: -14.60319384039113

Average Mel_Loss for audio datasets: 1.5227218

yyua8222 commented 2 weeks ago

Results for model with 50 token rate and 32768 code book size:

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 67.15%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 6.01%

Stage 3: Run automatic speech recognition. WER: 9.69%

Stage 4: Run audio event classification. ACC: 75.10%


Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: -8.770160568168329

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7157427

Stage 3: Run STOI. stoi: mean score is: 0.5990633321199552

Stage 4: Run PESQ. pesq: mean score is: 1.2988323020935058

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: -16.759995199904903

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6301608

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: -4.439544938378307

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9583634

Stage 3: Run STOI. stoi: mean score is: 0.8065342834968997

Stage 4: Run PESQ. pesq: mean score is: 1.8093781626224519

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: -16.590133601793124

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.600283

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: -10.233150558590781

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3511304

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: -10.776933275608268

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0376164

Stage 3: Run STOI. stoi: mean score is: 0.7419712845721602

Stage 4: Run PESQ. pesq: mean score is: 1.2745465958118438

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: -7.896944603174362

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9255314

Stage 3: Run STOI. stoi: mean score is: 0.8416043077360352

Stage 4: Run PESQ. pesq: mean score is: 1.7907265722751617

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: -9.604782428385983

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9269376

Stage 3: Run STOI. stoi: mean score is: 0.7635182742921145

Stage 4: Run PESQ. pesq: mean score is: 1.6788008534908294

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: -7.306414974996127

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0834035

Stage 3: Run STOI. stoi: mean score is: 0.8455957873829031

Stage 4: Run PESQ. pesq: mean score is: 1.596457360982895

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: -10.43514078996363

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.90818316

Stage 3: Run STOI. stoi: mean score is: 0.7796653041584611

Stage 4: Run PESQ. pesq: mean score is: 1.6307682001590729

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: -14.158362698563757

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83324564

Stage 3: Run STOI. stoi: mean score is: 0.7478858007055951

Stage 4: Run PESQ. pesq: mean score is: 1.415548061132431

Average SDR for speech datasets: -9.173535534654846

Average Mel_Loss for speech datasets: 0.923627975

Average STOI for speech datasets: 0.7657297968080157

Average PESQ for speech datasets: 1.5618822635710237

Average SDR for audio datasets: -14.527759786762935

Average Mel_Loss for audio datasets: 1.5271913999999998

yyua8222 commented 2 weeks ago

Results for model with 25 token rate and 16384 code book size:

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 61.53%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 13.70%

Stage 3: Run automatic speech recognition. WER: 35.79%

Stage 4: Run audio event classification. ACC: 71.55%


Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: -9.891073254225994

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.79265803

Stage 3: Run STOI. stoi: mean score is: 0.5382069630214918

Stage 4: Run PESQ. pesq: mean score is: 1.2317941224575042

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: -17.354609349344106

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6950777

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: -5.118099710803417

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1607062

Stage 3: Run STOI. stoi: mean score is: 0.7279729071609607

Stage 4: Run PESQ. pesq: mean score is: 1.470268008708954

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: -17.525922260145695

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6639311

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: -9.84819729776821

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4082423

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: -11.828557564473659

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3157852

Stage 3: Run STOI. stoi: mean score is: 0.6398609276418542

Stage 4: Run PESQ. pesq: mean score is: 1.1277076315879822

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: -9.074854594346156

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.143972

Stage 3: Run STOI. stoi: mean score is: 0.7747987615118724

Stage 4: Run PESQ. pesq: mean score is: 1.426479343175888

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: -10.47760850527248

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1010196

Stage 3: Run STOI. stoi: mean score is: 0.6862266259635116

Stage 4: Run PESQ.

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: -8.102805757598823

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3454256

Stage 3: Run STOI. stoi: mean score is: 0.783804268388349

Stage 4: Run PESQ. pesq: mean score is: 1.3105683100223542

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: -11.169038464688196

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0954686

Stage 3: Run STOI. stoi: mean score is: 0.7094035546469811

Stage 4: Run PESQ. pesq: mean score is: 1.3510719525814057

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: -15.680900866701082

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9287965

Stage 3: Run STOI. stoi: mean score is: 0.6631148883616201

Stage 4: Run PESQ. pesq: mean score is: 1.2879982483386994

Average SDR for speech datasets: -10.167867339763726

Average Mel_Loss for speech datasets: 1.1104789662499999

Average STOI for speech datasets: 0.69042361208708

Average PESQ for speech datasets: 1.3259033580124377

Average SDR for audio datasets: -14.909576302419337

Average Mel_Loss for audio datasets: 1.5890836999999998

yyua8222 commented 2 weeks ago

Results for model with 25 token rate and 32768 code book size:

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 59.51%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 13.39%

Stage 3: Run automatic speech recognition. WER: 34.24%

Stage 4: Run audio event classification. ACC: 70.45%


Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: -9.52817490628773

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78014153

Stage 3: Run STOI. stoi: mean score is: 0.536566776256902

Stage 4: Run PESQ.

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: -18.045539644348803

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6942394

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: -4.756434837791447

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1558565

Stage 3: Run STOI. stoi: mean score is: 0.7376097582470694

Stage 4: Run PESQ. pesq: mean score is: 1.4803874719142913

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: -17.28732169023466

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6601683

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: -9.839931109752126

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4160614

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: -11.686392159090719

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3149458

Stage 3: Run STOI. stoi: mean score is: 0.6450955925787938

Stage 4: Run PESQ. pesq: mean score is: 1.1226227939128877

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: -9.023869144962699

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1400143

Stage 3: Run STOI. stoi: mean score is: 0.778975415690721

Stage 4: Run PESQ. pesq: mean score is: 1.4233695840835572

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: -10.446293708828193

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0919785

Stage 3: Run STOI. stoi: mean score is: 0.6912703894668684

Stage 4: Run PESQ. pesq: mean score is: 1.4184428441524506

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: -7.820809908089303

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.343809

Stage 3: Run STOI. stoi: mean score is: 0.7835718970167425

Stage 4: Run PESQ. pesq: mean score is: 1.3171902728080749

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: -11.3429282056549

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0941978

Stage 3: Run STOI. stoi: mean score is: 0.71035581129116

Stage 4: Run PESQ. pesq: mean score is: 1.3429110085964202

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: -15.616014513375687

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9306869

Stage 3: Run STOI. stoi: mean score is: 0.6662986378594428

Stage 4: Run PESQ. pesq: mean score is: 1.2928564262390136

Average SDR for speech datasets: -10.027614673010085

Average Mel_Loss for speech datasets: 1.1064537912499999

Average STOI for speech datasets: 0.6937180348009626

Average PESQ for speech datasets: 1.3297000639140608

Average SDR for audio datasets: -15.057597481445194

Average Mel_Loss for audio datasets: 1.590156366666667