voidful / Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark
https://codecsuperb.com

results #37

Open huazhi1024 opened 2 weeks ago

huazhi1024 commented 2 weeks ago

Bitrates: the 16 kHz codec model runs at 2 kbps, the 44.1 kHz codec model at 6.89 kbps, and the 48 kHz codec model at 7.5 kbps.

1. Here is exps/results.txt:

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 75.97%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 2.57%

Stage 3: Run automatic speech recognition. WER: 3.67%

Stage 4: Run audio event classification. ACC: 86.80%

2. Here is src/codec_metrics/exps/results.txt:

Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: 12.264864005831004

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.46461612

Stage 3: Run STOI. stoi: mean score is: 0.9201546369667847

Stage 4: Run PESQ. pesq: mean score is: 2.9032970213890077

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: 6.726699210213638

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.89280885

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: 8.476522537066758

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.75807977

Stage 3: Run STOI. stoi: mean score is: 0.9238519743607232

Stage 4: Run PESQ. pesq: mean score is: 2.8522612583637237

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: 6.95385805941422

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8306656

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: 8.291245593533532

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.95218104

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: 4.233350120341239

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7518116

Stage 3: Run STOI. stoi: mean score is: 0.9050623419177468

Stage 4: Run PESQ. pesq: mean score is: 2.0071350967884065

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: 7.751003745240329

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.72347593

Stage 3: Run STOI. stoi: mean score is: 0.9340773701364049

Stage 4: Run PESQ. pesq: mean score is: 2.903846046924591

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: 8.4340708735918

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8294336

Stage 3: Run STOI. stoi: mean score is: 0.8863192140533341

Stage 4: Run PESQ. pesq: mean score is: 2.6509935235977173

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: 9.542545404819807

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7959907

Stage 3: Run STOI. stoi: mean score is: 0.9531058100873113

Stage 4: Run PESQ. pesq: mean score is: 2.7776152551174165

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: 6.524681732109078

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.71494424

Stage 3: Run STOI. stoi: mean score is: 0.8977601804462474

Stage 4: Run PESQ. pesq: mean score is: 2.5823002088069917

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: 13.074802660696786

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.49565125

Stage 3: Run STOI. stoi: mean score is: 0.9516724002511663

Stage 4: Run PESQ. pesq: mean score is: 2.9390562558174134

Average SDR for speech datasets: 8.7877301349621

Average Mel_Loss for speech datasets: 0.69175040125

Average STOI for speech datasets: 0.9215004910274648

Average PESQ for speech datasets: 2.7020630833506587

Average SDR for audio datasets: 7.323934287720463

Average Mel_Loss for audio datasets: 0.8918851633333333
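The averages above can be reproduced from the per-dataset lines. Below is a minimal Python sketch, assuming that the datasets reporting STOI/PESQ form the "speech" group and the remaining ones form the "audio" group. That grouping is inferred from the log, not confirmed by the benchmark code. `parse_results` and `group_mean` are hypothetical helper names, and only the SDR and STOI lines are copied in to keep the excerpt short.

```python
import re
from statistics import mean

# Excerpt (SDR and STOI lines only) in the format of the results.txt posted above.
LOG = """\
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: 12.264864005831004
Stage 3: Run STOI. stoi: mean score is: 0.9201546369667847
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: 6.726699210213638
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 8.476522537066758
Stage 3: Run STOI. stoi: mean score is: 0.9238519743607232
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: 6.95385805941422
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: 8.291245593533532
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: 4.233350120341239
Stage 3: Run STOI. stoi: mean score is: 0.9050623419177468
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 7.751003745240329
Stage 3: Run STOI. stoi: mean score is: 0.9340773701364049
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: 8.4340708735918
Stage 3: Run STOI. stoi: mean score is: 0.8863192140533341
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 9.542545404819807
Stage 3: Run STOI. stoi: mean score is: 0.9531058100873113
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 6.524681732109078
Stage 3: Run STOI. stoi: mean score is: 0.8977601804462474
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: 13.074802660696786
Stage 3: Run STOI. stoi: mean score is: 0.9516724002511663
"""

FILE_RE = re.compile(r"File Name: (\S+)\.log")
SCORE_RE = re.compile(r"(\w+): mean score is:\s*(\S*)")


def parse_results(text):
    """Return {dataset: {metric: float or None}}; None marks an empty score field."""
    results, current = {}, None
    for line in text.splitlines():
        file_match = FILE_RE.search(line)
        if file_match:
            current = file_match.group(1)
            results[current] = {}
            continue
        score_match = SCORE_RE.search(line)
        if score_match and current is not None:
            raw = score_match.group(2)
            results[current][score_match.group(1).lower()] = float(raw) if raw else None
    return results


def group_mean(results, metric, speech):
    """Mean of `metric` over the speech group (datasets with STOI) or the audio group."""
    values = [
        scores[metric]
        for scores in results.values()
        if ("stoi" in scores) == speech and scores.get(metric) is not None
    ]
    return mean(values) if values else None


results = parse_results(LOG)
print("speech SDR :", group_mean(results, "sdr", speech=True))
print("audio SDR  :", group_mean(results, "sdr", speech=False))
print("speech STOI:", group_mean(results, "stoi", speech=True))
```

Under that grouping assumption, the computed means agree with the reported speech and audio averages to within floating-point rounding.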

hbwu-ntu commented 2 weeks ago

Thanks for submitting the results. Could you also refer to section 4.2 of the rules (https://codecsuperb.github.io/Codec-SUPERB-rule.pdf) and let us know how to run inference with your model? We will use your model to test on the hidden set.

huazhi1024 commented 2 weeks ago

Yes, I will finish it by Monday. However, I am currently encountering some issues with uploading the model to GitHub.

hbwu-ntu commented 2 weeks ago

Here is one suggestion: the codec model checkpoint can be uploaded to Hugging Face or Google Drive (with instructions for downloading the model with gdown).
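For the Google Drive route, a download helper could look roughly like the sketch below. It assumes the `gdown` package is installed (`pip install gdown`). The file ID, the output filename `codec.ckpt`, and the helper names `drive_url` and `download_checkpoint` are placeholders for illustration, not details from this thread.

```python
def drive_url(file_id: str) -> str:
    """Build the direct-download URL form that gdown accepts for a shared file."""
    return f"https://drive.google.com/uc?id={file_id}"


def download_checkpoint(file_id: str, output: str = "codec.ckpt") -> str:
    """Download a shared Google Drive file to `output` and return the local path."""
    import gdown  # imported lazily so the module loads even without gdown installed

    return gdown.download(drive_url(file_id), output, quiet=False)


if __name__ == "__main__":
    # "YOUR_FILE_ID" is a hypothetical placeholder; replace it with the real shared-file ID.
    print(drive_url("YOUR_FILE_ID"))
```

The file must be shared as "anyone with the link" for an unauthenticated download to work.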

huazhi1024 commented 2 weeks ago

Hello, I have completed the model release. The download link and usage instructions have been sent to you by email. If you have any questions, please feel free to contact me. Thank you very much.

huazhi1024 commented 1 week ago

Update:

16 kHz, 2 kbps codec model

(1) Downstream results:

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 75.97%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 2.57%

Stage 3: Run automatic speech recognition. WER: 3.64%

Stage 4: Run audio event classification. ACC: 71.10%

(2) Signal-level results

Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: 4.641087071226074

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.580518

Stage 3: Run STOI. stoi: mean score is: 0.7878352309918871

Stage 4: Run PESQ. pesq: mean score is: 1.7021552300453187

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is:

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.8372705

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: 8.476178635258437

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.75820357

Stage 3: Run STOI. stoi: mean score is: 0.923865876017417

Stage 4: Run PESQ. pesq: mean score is: 2.852576096057892

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: 0.4370140327990164

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.756783

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: 1.1408946927353245

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4059703

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: 4.2329371204943

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7518392

Stage 3: Run STOI. stoi: mean score is: 0.9050986518783571

Stage 4: Run PESQ. pesq: mean score is: 2.006877576112747

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: 7.752400420683839

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.72355676

Stage 3: Run STOI. stoi: mean score is: 0.9340837095549291

Stage 4: Run PESQ. pesq: mean score is: 2.9040276074409483

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: 8.433560426646096

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8292966

Stage 3: Run STOI. stoi: mean score is: 0.8863539521867545

Stage 4: Run PESQ. pesq: mean score is: 2.650856384038925

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: 9.542030656936957

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7960729

Stage 3: Run STOI. stoi: mean score is: 0.9530965262477374

Stage 4: Run PESQ. pesq: mean score is: 2.7770466423034668

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: 6.525108717315516

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7149558

Stage 3: Run STOI. stoi: mean score is: 0.8977717650359602

Stage 4: Run PESQ. pesq: mean score is: 2.582567346096039

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: 6.794794006624397

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.58160305

Stage 3: Run STOI. stoi: mean score is: 0.8815944944789442

Stage 4: Run PESQ. pesq: mean score is: 1.9832410085201264

Average SDR for speech datasets: 7.049762131898202

Average Mel_Loss for speech datasets: 0.7170057350000001

Average STOI for speech datasets: 0.8962125257989983

Average PESQ for speech datasets: 2.432418486326933

Average SDR for audio datasets: 0.7889543627671705

Average Mel_Loss for audio datasets: 1.58137665
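Note that the esc50 SDR field is empty in this log, and the reported audio averages equal the mean over fsd50k and gunshot_triangulation only, for both SDR and Mel loss. The small Python check below illustrates this. The helper name `mean_over` is hypothetical, and the thread does not say why esc50 was left out of the averages.

```python
from statistics import mean

# Audio-dataset scores copied from the 16 kHz log above; None marks the empty esc50 SDR field.
audio = {
    "esc50": {"sdr": None, "mel_loss": 1.8372705},
    "fsd50k": {"sdr": 0.4370140327990164, "mel_loss": 1.756783},
    "gunshot_triangulation": {"sdr": 1.1408946927353245, "mel_loss": 1.4059703},
}


def mean_over(scores, metric, datasets):
    """Mean of `metric` over `datasets`, skipping missing (None) values."""
    values = [scores[d][metric] for d in datasets if scores[d][metric] is not None]
    return mean(values) if values else None


without_esc50 = ["fsd50k", "gunshot_triangulation"]
sdr_avg = mean_over(audio, "sdr", without_esc50)        # compare with the reported audio SDR average
mel_avg = mean_over(audio, "mel_loss", without_esc50)   # compare with the reported audio Mel_Loss average
mel_avg_all = mean_over(audio, "mel_loss", list(audio))  # what the average would be with esc50 included

print(sdr_avg, mel_avg, mel_avg_all)
```

Including esc50 in the Mel loss mean gives a noticeably different value, so the 16 kHz audio averages are not directly comparable with those of runs where all three audio datasets were scored.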

huazhi1024 commented 1 week ago

44.1 kHz, 7 kbps codec model (6.89 kbps exactly, per the first comment)

(1) Downstream results

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 75.49%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 1.53%

Stage 3: Run automatic speech recognition. WER: 3.19%

Stage 4: Run audio event classification. ACC: 86.55%

(2) Signal-level results

Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: 11.9198190006243

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.46257296

Stage 3: Run STOI. stoi: mean score is: 0.9120819982808108

Stage 4: Run PESQ. pesq: mean score is: 2.830995168685913

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: 6.241463241745936

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83121604

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: 13.733762141747928

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.62197036

Stage 3: Run STOI. stoi: mean score is: 0.9634806161341553

Stage 4: Run PESQ. pesq: mean score is: 3.8307976722717285

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: 6.650526363401869

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7842263

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: 8.656592212439394

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9138063

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: 10.175153523690033

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6136928

Stage 3: Run STOI. stoi: mean score is: 0.9652810154755299

Stage 4: Run PESQ. pesq: mean score is: 3.5824116134643553

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: 12.392481496173902

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.61285776

Stage 3: Run STOI. stoi: mean score is: 0.9659764076205769

Stage 4: Run PESQ. pesq: mean score is: 3.854781861305237

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: 14.22380206490447

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5592373

Stage 3: Run STOI. stoi: mean score is: 0.9433491989857918

Stage 4: Run PESQ. pesq: mean score is: 3.7363220167160036

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: 13.537287795228872

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.66826826

Stage 3: Run STOI. stoi: mean score is: 0.9756619198675819

Stage 4: Run PESQ. pesq: mean score is: 3.6874674439430235

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: 11.822250275546512

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.59757036

Stage 3: Run STOI. stoi: mean score is: 0.955504206240366

Stage 4: Run PESQ. pesq: mean score is: 3.8139785027503965

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: 12.644553808875898

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.50202703

Stage 3: Run STOI. stoi: mean score is: 0.9474133160648855

Stage 4: Run PESQ. pesq: mean score is: 2.8804212963581084

Average SDR for speech datasets: 12.55613876334899

Average Mel_Loss for speech datasets: 0.57977460375

Average STOI for speech datasets: 0.9535935848337121

Average PESQ for speech datasets: 3.527146946936846

Average SDR for audio datasets: 7.182860605862399

Average Mel_Loss for audio datasets: 0.84308288

huazhi1024 commented 1 week ago

48 kHz, 7.5 kbps codec model

(1) Downstream results

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 75.28%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 1.49%

Stage 3: Run automatic speech recognition. WER: 3.07%

Stage 4: Run audio event classification. ACC: 88.00%

(2) Signal-level results

Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: 12.2636534888401

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.4645789

Stage 3: Run STOI. stoi: mean score is: 0.9201668776856671

Stage 4: Run PESQ. pesq: mean score is: 2.900687514543533

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: 6.726355181016816

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.892827

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: 14.124681010234116

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5964956

Stage 3: Run STOI. stoi: mean score is: 0.9658302396521976

Stage 4: Run PESQ. pesq: mean score is: 3.873115861415863

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: 6.954926362898063

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83061826

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: 8.296033518758794

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9518249

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: 10.664635680971012

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5921542

Stage 3: Run STOI. stoi: mean score is: 0.9683333864449756

Stage 4: Run PESQ. pesq: mean score is: 3.6724947714805602

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: 12.879912781761652

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5930597

Stage 3: Run STOI. stoi: mean score is: 0.9687304311248394

Stage 4: Run PESQ. pesq: mean score is: 3.869354705810547

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: 14.652514660452471

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.54244286

Stage 3: Run STOI. stoi: mean score is: 0.9472981762704458

Stage 4: Run PESQ. pesq: mean score is: 3.7385361623764037

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: 13.91570370530584

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6493998

Stage 3: Run STOI. stoi: mean score is: 0.97752452279595

Stage 4: Run PESQ. pesq: mean score is: 3.7307146120071413

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: 12.273928539620078

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5752002

Stage 3: Run STOI. stoi: mean score is: 0.9589951570618435

Stage 4: Run PESQ. pesq: mean score is: 3.8899018454551695

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: 13.074327995615095

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.4956447

Stage 3: Run STOI. stoi: mean score is: 0.9516514417002608

Stage 4: Run PESQ. pesq: mean score is: 2.938644474744797

Average SDR for speech datasets: 12.981169732850043

Average Mel_Loss for speech datasets: 0.563621995

Average STOI for speech datasets: 0.9573162790920224

Average PESQ for speech datasets: 3.5766812434792516

Average SDR for audio datasets: 7.32577168755789

Average Mel_Loss for audio datasets: 0.8917567200000001