k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

Finetune Librispeech for a low resource language #987

Open nabil6391 opened 1 year ago

nabil6391 commented 1 year ago

Hi guys, amazing work with the icefall recipes. I am quite new to using the recipes and am having a hard time creating a custom dataset with lhotse for my language (Bengali).

I have seen @marcoyang1998 adding some finetuning scripts for LibriSpeech to GigaSpeech and WenetSpeech to Aishell. I am a bit confused about how to do finetuning for another language.

Please help me if possible. Thanks and great work!

marcoyang1998 commented 1 year ago

Hi, we haven't tried cross-language finetuning before, but we can work this out together.

First of all, have you managed to create your own dataset using Lhotse?

nabil6391 commented 1 year ago

Not yet. I am trying to follow the yesno and librispeech recipes. As far as I understand, I have to create the manifest files first, then I can run compute_fbank, prepare_lang, and compile_hlg.

https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/prepare.sh https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare.sh
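(For anyone following along: a minimal sketch of what that manifest step could look like with Lhotse before wiring it into a prepare.sh. The file paths, utterance IDs, and output locations below are made up for illustration; only the Lhotse classes and methods themselves are real.)

```python
# Sketch: build Lhotse manifests for a custom dataset and compute fbank features.
# Paths, IDs, and the transcript source are hypothetical placeholders.
from lhotse import (
    CutSet,
    Fbank,
    FbankConfig,
    Recording,
    RecordingSet,
    SupervisionSegment,
    SupervisionSet,
)

recordings = []
supervisions = []
# One Recording per audio file, one SupervisionSegment per transcript.
for utt_id, wav_path, text in [
    ("utt-0001", "data/wavs/utt-0001.wav", "example Bengali transcript"),
]:
    rec = Recording.from_file(wav_path, recording_id=utt_id)
    recordings.append(rec)
    supervisions.append(
        SupervisionSegment(
            id=utt_id,
            recording_id=utt_id,
            start=0.0,
            duration=rec.duration,
            text=text,
            language="Bengali",
        )
    )

recording_set = RecordingSet.from_recordings(recordings)
supervision_set = SupervisionSet.from_segments(supervisions)

# Save the manifests (mirroring the data/manifests layout used by the recipes).
recording_set.to_file("data/manifests/recordings_train.jsonl.gz")
supervision_set.to_file("data/manifests/supervisions_train.jsonl.gz")

# Turn them into cuts and compute 80-dim fbank features (the icefall default).
cuts = CutSet.from_manifests(recordings=recording_set, supervisions=supervision_set)
cuts = cuts.compute_and_store_features(
    extractor=Fbank(FbankConfig(num_mel_bins=80)),
    storage_path="data/fbank/feats_train",
    num_jobs=1,  # raise for parallel extraction
)
cuts.to_file("data/fbank/cuts_train.jsonl.gz")
```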

marcoyang1998 commented 1 year ago

> As far as I understand, I have to create the manifest files first

Yes. This should be the first step. BTW, how many hours of transcribed speech do you have? Are they open-sourced?

nabil6391 commented 1 year ago

It was a very small dataset. For starters, I am considering using the Common Voice Bengali dataset, which is larger than mine. I believe there are other, bigger datasets as well, but I have to check.

marcoyang1998 commented 1 year ago

OK, once you have finished preparing the dataset, you can start training from scratch (as a baseline to finetune from). If you encounter any problems, feel free to ask here.

joazoa commented 1 year ago

Amazing work indeed. I have already trained one language (Transducer7_streaming + BPE) and would also like to finetune to a lower-resource language with a different alphabet but the same number of tokens.

What steps should I take?

marcoyang1998 commented 1 year ago

@joazoa I assume you are using a different output vocabulary? In that case you can only initialize the encoder; the decoder and joiner need to be trained from a random initialization.

nabil6391 commented 1 year ago

@marcoyang1998 How do I initialize the encoder, the decoder, and joiner from a random initialization?

desh2608 commented 1 year ago

I think what @marcoyang1998 meant was that if you have a different output vocabulary, you would need to randomly initialize the decoder and joiner. But the encoder can be initialized from your final checkpoint. You can do something like model.encoder.load_state_dict(ckpt["model"], strict=False).
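A rough sketch of that partial loading, assuming `model` is the freshly constructed transducer from train.py and the checkpoint path is hypothetical. The filtering step strips the `encoder.` prefix so the keys line up with `model.encoder`'s own state dict:

```python
import torch

# Hypothetical path to the checkpoint of the already-trained model.
ckpt = torch.load("exp/pretrained-epoch-30.pt", map_location="cpu")

# Keep only encoder parameters and strip the "encoder." prefix.
encoder_state = {
    k[len("encoder."):]: v
    for k, v in ckpt["model"].items()
    if k.startswith("encoder.")
}

# strict=False tolerates leftover key mismatches; the decoder and joiner keep
# their random initialization because they are never touched here.
missing, unexpected = model.encoder.load_state_dict(encoder_state, strict=False)
print("missing keys:", missing, "unexpected keys:", unexpected)
```

Printing the returned missing/unexpected keys is a cheap way to confirm the encoder weights actually loaded instead of being silently skipped.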

nabil6391 commented 1 year ago

Thanks for replying @desh2608, I am still quite new to k2-icefall.

Should I modify train.py and add this part there?

desh2608 commented 1 year ago

> Thanks for replying @desh2608, I am still quite new to k2-icefall.
>
> Should I modify train.py and add this part there?

Yes. The train.py script is plain PyTorch, so you can modify it to your requirements as you would any other PyTorch training code.
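For orientation, roughly where such a snippet could sit inside train.py, right after the model is constructed and before it is moved to the device and wrapped for training. `get_transducer_model`, `params`, `device`, and `logging` follow the librispeech pruned-transducer recipes but may differ in other recipes, and `--finetune-ckpt` is a made-up argument you would add yourself:

```python
model = get_transducer_model(params)

# Hypothetical: only initialize the encoder when a pretrained checkpoint is given.
if params.finetune_ckpt is not None:
    ckpt = torch.load(params.finetune_ckpt, map_location="cpu")
    encoder_state = {
        k[len("encoder."):]: v
        for k, v in ckpt["model"].items()
        if k.startswith("encoder.")
    }
    model.encoder.load_state_dict(encoder_state, strict=False)
    logging.info(f"Initialized encoder from {params.finetune_ckpt}")

model.to(device)
```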