Closed · @salah-zaiem closed this 1 year ago
Thank you for this PR, @salah-zaiem! You can find my comments below.

In the README, please document how to launch a training run (e.g., `python train.py hparams.yaml`). Additionally, explain how to modify the SSL model by changing `ssl_hub`, and guide users on assessing their own (local) SSL models. Finally, include instructions on running multiple experiments with `run_benchmark.sh`, and provide runnable code snippets.

Moreover, a few additional comments:
- A PR on the speechbrain `develop` branch is needed for the weighted SSL model.
- Since the data folders are set to `!PLACEHOLDER` in the YAML files, I am curious how you handle that in `run_benchmark.sh`. I would have expected users to specify the data folder for each dataset, e.g., `python train.py hparams/ssl.yaml --data_folder=$SLURM_TMPDIR/librispeech`.
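For reference, here is a minimal dry-run sketch of how `run_benchmark.sh` could build one command line per dataset, overriding both `data_folder` and `ssl_hub`. The dataset names, hparams layout, and hub id below are illustrative assumptions, not the benchmark's actual file structure:

```python
import os

# Illustrative dataset list; the benchmark's actual task list may differ.
datasets = ["librispeech", "voxceleb1", "iemocap"]
tmpdir = os.environ.get("SLURM_TMPDIR", "/tmp")
ssl_hub = "facebook/wav2vec2-base"  # any HuggingFace hub id or local path

# Build one training command per dataset, pointing each run at its own
# data folder and at the chosen SSL model.
commands = [
    f"python train.py hparams/{name}.yaml"
    f" --data_folder={tmpdir}/{name}"
    f" --ssl_hub={ssl_hub}"
    for name in datasets
]
for cmd in commands:
    print(cmd)
```

In a real script these commands would be executed rather than printed, but the per-dataset `--data_folder` override is the point being illustrated.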
I encountered the following error:

```
ValueError: 'ngram_lm_path' is a !PLACEHOLDER and must be replaced.
```

It also appears that `ngram_lm_path` is not used. If it is required in other recipes, I believe we should automatically download it from the SpeechBrain Dropbox during data preparation.
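For context, HyperPyYAML raises this error whenever a `!PLACEHOLDER` value is still unresolved at load time; passing the value as a command-line override fixes it. The `resolve` helper below is a hypothetical stdlib mimic of that behavior (not speechbrain's actual loader), and the LM path is a made-up example:

```python
PLACEHOLDER = "!PLACEHOLDER"

def resolve(hparams, overrides=None):
    """Apply overrides, then fail on any remaining placeholder value."""
    merged = {**hparams, **(overrides or {})}
    for key, value in merged.items():
        if value == PLACEHOLDER:
            raise ValueError(f"'{key}' is a !PLACEHOLDER and must be replaced.")
    return merged

# Without an override, loading fails exactly like the reported error:
try:
    resolve({"ngram_lm_path": PLACEHOLDER})
except ValueError as err:
    print(err)

# Passing --ngram_lm_path on the command line corresponds to:
hparams = resolve({"ngram_lm_path": PLACEHOLDER},
                  overrides={"ngram_lm_path": "path/to/4gram.arpa"})
```

Auto-downloading the LM during data preparation would amount to filling in this override before the YAML is loaded.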
Apart from documentation, I believe we should put effort into making the benchmark as user-friendly as possible. Our goal is for users to be able to run it without encountering too many issues. @TParcollet, any idea or comment?
Thank you @salah-zaiem for addressing my comments! I made some minor modifications, mainly to the README file. I have a couple of final suggestions:
We need to assign a meaningful name to the benchmark. This is crucial. As a suggestion, I propose the name "MPSS" (Multi-Probe Self-Supervised Benchmark). You might have better ideas. Please take some time to think about it.
While testing the LSTM recipe on LibriSpeech, I encountered a warning:

```
/scratch/ravanelm/myenv1/lib/python3.10/site-packages/transformers/configuration_utils.py:380: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
```

I would recommend investigating this warning further. It seems that all the recipes might be affected by this issue in future versions of transformers.
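One possible fix, sketched below under the assumption that the recipes currently pass `gradient_checkpointing` through the config kwargs (the helper name is made up): strip the deprecated key before building the config, and call the model-level `gradient_checkpointing_enable()` API that the warning recommends instead.

```python
def split_gradient_checkpointing(config_kwargs):
    """Return (cleaned kwargs, whether checkpointing was requested)."""
    kwargs = dict(config_kwargs)  # copy so the caller's dict is untouched
    enabled = kwargs.pop("gradient_checkpointing", False)
    return kwargs, enabled

kwargs, enabled = split_gradient_checkpointing(
    {"gradient_checkpointing": True, "hidden_size": 768}
)
# With transformers, the cleaned kwargs would then be used like:
# model = AutoModel.from_pretrained(ssl_hub, **kwargs)
# if enabled:
#     model.gradient_checkpointing_enable()  # new-style API per the warning
```

This keeps the recipes forward-compatible with transformers v5, where the config-level flag is slated for removal.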
thank you @salah-zaiem! Everything looks good to me!
Added here the SSL benchmark and the tests; another PR is needed on the speechbrain library to add the `WeightedSSLModel` class.