speechbrain / benchmarks

This repository contains the SpeechBrain Benchmarks
Apache License 2.0

added emotion and asr #11

Closed: salah-zaiem closed this 7 months ago

mravanelli commented 9 months ago

Thank you @salah-zaiem for your valuable contribution! I did a first code inspection and I have a few comments:

  1. SpeechBrain 1.0 Compliance: Before extending the code to other tasks and models, I suggest making it compliant with SpeechBrain 1.0. A LibriSpeech CTC recipe compliant with SpeechBrain 1.0 can be found here. The conversion is relatively easy, and both @Adel-Moumen and I are available to assist if needed. The changes include:

    • speechbrain.lobes.augment.TimeDomainSpecAugment no longer exists (refer to the linked example for the new augmentation setup).
    • Consider removing the from pyctcdecode import build_ctcdecoder option, since SpeechBrain now has its own CTC beam searcher, especially if no language model is used.

    Please test the code with the latest unstable-v0.6 branch of the SpeechBrain repository (soon to be merged into dev).

  2. What is the purpose of ssl_train.py and ssl.yaml?

  3. I suggest renaming discrete_train.py to train.py.

  4. In benchmarks/DASB/LibriSpeech/hparams/encodec_12.yaml, I propose removing the manual definition of csv_folder. For consistency with the MP3S benchmark, consider storing the CSV files in !ref <output_folder>.

  5. It appears that the data preparation script is missing for both datasets.

  6. After addressing the above points and cleaning up the code, consider adding the two probing heads as implemented in MP3S.
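Regarding the augmentation change in point 1: SpeechBrain 1.0 applies time-domain augmentations such as chunk dropping through its speechbrain.augment package, but the linked recipe is the authority on the exact classes and parameters. The snippet below is only a plain-Python sketch of the chunk-dropping idea (zeroing a few random spans of the waveform) to show what such an augmentation does; the function drop_chunks and its parameters are hypothetical and are not the SpeechBrain API.

```python
import random


def drop_chunks(signal, max_chunks=2, chunk_len=4, rng=None):
    """Return a copy of `signal` with 1..max_chunks random spans set to zero.

    Hypothetical helper illustrating chunk dropping; not SpeechBrain code.
    """
    rng = rng or random.Random()
    out = list(signal)  # copy so the input signal stays untouched
    n_chunks = rng.randint(1, max_chunks)  # randint is inclusive on both ends
    for _ in range(n_chunks):
        # Choose a start index so the chunk fits; short signals start at 0.
        max_start = max(len(out) - chunk_len, 0)
        start = rng.randint(0, max_start)
        for i in range(start, min(start + chunk_len, len(out))):
            out[i] = 0.0
    return out
```

For example, drop_chunks([1.0] * 20, rng=random.Random(0)) returns 20 samples with between 4 and 8 of them zeroed (chunks may overlap). Passing a seeded random.Random keeps the augmentation reproducible, which matters when comparing runs in a benchmark.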
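Regarding the pyctcdecode dependency in point 1: without a language model, CTC beam search only needs the per-frame label probabilities, which is why a built-in beam searcher can replace build_ctcdecoder in that case. The sketch below is a minimal CTC prefix beam search in plain Python, included to show the technique; it is not SpeechBrain's beam searcher, and the function name, the list-of-lists input format, and blank index 0 are assumptions.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, blank=0, beam_size=3):
    """Decode per-frame CTC probabilities into a label sequence (no LM).

    probs: list of frames, each a list of label probabilities.
    Each beam tracks two scores for its prefix: probability of ending
    in a blank and probability of ending in a non-blank label.
    """
    beams = {(): (1.0, 0.0)}  # empty prefix, "ends in blank" with prob 1
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            last = prefix[-1] if prefix else None
            for s, p in enumerate(frame):
                if s == blank:
                    # A blank leaves the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif s == last:
                    # A repeat with no blank in between collapses...
                    next_beams[prefix][1] += p_nb * p
                    # ...while a repeat after a blank emits the label again.
                    next_beams[prefix + (s,)][1] += p_b * p
                else:
                    next_beams[prefix + (s,)][1] += (p_b + p_nb) * p
        # Prune to the `beam_size` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best_prefix = max(beams, key=lambda k: sum(beams[k]))
    return list(best_prefix)
```

With three frames favoring label 1, then blank, then label 1 again, the search returns [1, 1]: the blank separates the two emissions, whereas consecutive frames of label 1 would collapse into a single 1. Adding a language model would mean rescoring prefixes at each extension, which is where an external decoder becomes useful.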
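Regarding the missing data preparation scripts in point 5 (and the CSV location in point 4): a preparation function typically scans the audio, writes one CSV row per utterance, and places the CSV under the experiment's output folder. The sketch below assumes a flat folder of .wav files and a hypothetical transcripts dictionary; the function name prepare_csv and the columns other than ID (duration, wav, wrd) are assumptions modeled on common SpeechBrain recipes, not the actual DASB format.

```python
import csv
import os
import wave


def prepare_csv(wav_dir, output_folder, split="train", transcripts=None):
    """Write <output_folder>/<split>.csv with one row per .wav in wav_dir.

    Hypothetical helper: columns and layout are illustrative assumptions.
    """
    transcripts = transcripts or {}
    os.makedirs(output_folder, exist_ok=True)
    csv_path = os.path.join(output_folder, f"{split}.csv")
    rows = []
    for name in sorted(os.listdir(wav_dir)):
        if not name.endswith(".wav"):
            continue
        path = os.path.join(wav_dir, name)
        with wave.open(path, "rb") as w:
            duration = w.getnframes() / w.getframerate()  # seconds
        utt_id = os.path.splitext(name)[0]
        rows.append({
            "ID": utt_id,
            "duration": f"{duration:.3f}",
            "wav": path,
            "wrd": transcripts.get(utt_id, ""),  # empty if no transcript
        })
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["ID", "duration", "wav", "wrd"])
        writer.writeheader()
        writer.writerows(rows)
    return csv_path
```

Writing the CSV under the output folder, rather than a separately configured csv_folder, keeps each experiment's data manifests next to its results, which is the consistency point raised in item 4.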