Fine-Tuning ESPnet Models: A Request for Information and Tutorials

reazon-research / ReazonSpeech

Massive open Japanese speech corpus

https://research.reazon.jp/projects/ReazonSpeech/

Apache License 2.0


Fine-Tuning ESPnet Models: A Request for Information and Tutorials #20

Closed. sejimak closed this issue 11 months ago

sejimak commented 11 months ago

Thank you for providing such an incredible model. It appears that you have been using ESPnet to create your models. We are wondering if it is possible for us to fine-tune them on our end? If so, could you possibly provide any tutorials or guides on how to do this? We apologize if there is already existing documentation that we have not yet discovered. Thank you very much for your assistance.

sw005320 commented 11 months ago

Here is the general fine-tuning documentation: https://espnet.github.io/espnet/espnet2_training_option.html#transfer-learning-fine-tuning-using-pretrained-model
You can also refer to this recipe script: https://github.com/espnet/espnet/blob/35b8f01f07dff7f9c741a4b937994b3375698365/egs2/snips/asr1/run.sh#L31

But maybe it’s better for @fujimotos or others to create more specific examples since it would be very useful.

sw005320 commented 11 months ago

One more comment. We are now developing an easy training/fine-tuning scheme called espnetez: https://github.com/espnet/espnet/pull/5372
We have a fine-tuning example here: https://github.com/espnet/espnet/blob/c7884444ca204049b045f1ba6f2297a34e904374/egsez/asr/libri100_finetune.ipynb

If someone is interested in this development item, please let me know. We want to have more collaborators :)

fujimotos commented 11 months ago

> We are wondering if it is possible for us to fine-tune them on our end?

@sejimak it is possible. You basically have to learn how to train ESPnet models.

Once you have set up a working training environment, the rest will be fairly straightforward. All you need to do is:

1. Teach the scripts in the "local" folder to load your dataset.
2. Tweak ReazonSpeech's training config for your machine spec.
3. Get a checkpoint file from Hugging Face.
4. Run ./run --init_param /path/to/valid.acc.ave_3best.pth
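
For readers unfamiliar with `--init_param`: the ESPnet training-options page linked earlier in this thread describes its value as `<file_path>:<src_key>:<dst_key>:<exclude_keys>`, where everything after the file path is optional. The sketch below only illustrates that format. It is not ESPnet's implementation; the function names `parse_init_param` and `select_params` are hypothetical, and a plain dict stands in for a real checkpoint so it runs without PyTorch.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def parse_init_param(spec: str) -> Tuple[str, Optional[str], Optional[str], List[str]]:
    """Split '<file_path>:<src_key>:<dst_key>:<exclude_keys>' into its parts.

    Simplification: the path itself must not contain ':'.
    """
    parts = spec.split(":")
    if len(parts) > 4:
        raise ValueError(f"too many ':'-separated fields in {spec!r}")
    parts += [""] * (4 - len(parts))  # missing trailing fields become empty
    path, src_key, dst_key, excludes = parts
    exclude_keys = [k for k in excludes.split(",") if k]
    return path, src_key or None, dst_key or None, exclude_keys


def select_params(
    state: Dict[str, Any],
    src_key: Optional[str] = None,
    dst_key: Optional[str] = None,
    exclude_keys: Iterable[str] = (),
) -> Dict[str, Any]:
    """Pick the parameters that would be copied into the new model."""
    exclude_keys = list(exclude_keys)
    selected = {}
    for name, value in state.items():
        # Drop anything under an excluded prefix.
        if any(name.startswith(ex) for ex in exclude_keys):
            continue
        # Keep only parameters under src_key, and strip that prefix.
        if src_key is not None:
            if not name.startswith(src_key + "."):
                continue
            name = name[len(src_key) + 1:]
        # Re-attach the parameters under dst_key in the target model.
        if dst_key is not None:
            name = f"{dst_key}.{name}"
        selected[name] = value
    return selected


# Toy "checkpoint": real ones map parameter names to tensors.
checkpoint = {"encoder.w": 1, "decoder.w": 2, "decoder.embed.w": 3}

# Load everything, as in step 4 above.
path, src, dst, excl = parse_init_param("/path/to/valid.acc.ave_3best.pth")
all_params = select_params(checkpoint, src, dst, excl)

# Load only the decoder, skipping its embedding layer.
_, src, dst, excl = parse_init_param("model.pth:decoder:decoder:decoder.embed")
decoder_only = select_params(checkpoint, src, dst, excl)
```

Passing only the file path, as in step 4, loads every parameter. The longer forms are useful when, for example, your dataset needs a different output vocabulary and you want to keep the pretrained encoder but re-initialize part of the decoder.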

So I'd recommend starting by training a model on your own, choosing a recipe from the egs2 directory (any recipe would do; ReazonSpeech is a fairly vanilla model).

Or you can try the new espnetez module as suggested by @sw005320, which seems like a nice improvement for Python devs.

sejimak commented 11 months ago

@sw005320 @fujimotos Thank you for the explanations. I think I will try espnetez.