k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

How to train a phone-based RNNT #860

didadida-r closed this issue 7 months ago

didadida-r commented 1 year ago

Hi, I want to know how to train a phone-based RNNT in icefall.

Thanks

yaozengwei commented 1 year ago

You can refer to https://github.com/k2-fsa/icefall/issues/855#issuecomment-1407567658

didadida-r commented 1 year ago

Thanks. The tdnn_lstm_ctc recipe uses the CTC loss, but how do I train a phone-based RNNT with the RNNT loss, as in the Zipformer or Emformer recipes?

csukuangfj commented 1 year ago

The data preparation part should be similar. Please post your issues when you adapt it for the transducer.

wangtiance commented 1 year ago

You only need to change a few lines. In particular, you need a UniqLexicon object to get token ids. Allow me to promote my PR, which supports both BPE and phone training: https://github.com/wangtiance/icefall/blob/tiny/egs/librispeech/ASR/tiny_transducer_ctc/train.py
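To make "getting token ids" concrete, here is a minimal pure-Python sketch of the idea, assuming every word has exactly one pronunciation. It is not taken from the linked PR and does not use icefall's `UniqLexicon`; the toy lexicon, the `<blk>` and `<UNK>` symbols, and the helper names `build_token_table` and `transcripts_to_phone_ids` are all hypothetical.

```python
from typing import Dict, List

# Toy lexicon (hypothetical): each word maps to exactly ONE pronunciation.
LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
    "<UNK>": ["SPN"],  # fallback entry for out-of-vocabulary words
}


def build_token_table(lexicon: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign an integer ID to every phone; ID 0 is reserved for blank."""
    phones = sorted({p for pron in lexicon.values() for p in pron})
    table = {"<blk>": 0}
    for i, phone in enumerate(phones, start=1):
        table[phone] = i
    return table


def transcripts_to_phone_ids(
    texts: List[str],
    lexicon: Dict[str, List[str]],
    token_table: Dict[str, int],
    oov: str = "<UNK>",
) -> List[List[int]]:
    """Turn each transcript into one flat list of phone IDs."""
    result = []
    for text in texts:
        ids: List[int] = []
        for word in text.split():
            # Unknown words fall back to the OOV pronunciation.
            pron = lexicon.get(word, lexicon[oov])
            ids.extend(token_table[p] for p in pron)
        result.append(ids)
    return result


token_table = build_token_table(LEXICON)
targets = transcripts_to_phone_ids(["hello world", "hello foo"], LEXICON, token_table)
```

The resulting per-utterance ID lists play the same role as BPE token IDs: they are the label sequences fed to the transducer loss, and the vocabulary size becomes the number of phones plus blank.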

I'd also like to know your motivation for using a phone lexicon, because it performs a lot worse than BPE for transducers.

didadida-r commented 1 year ago

I think UniqLexicon is incompatible with Chinese, and I don't know how to implement a lexicon with multiple pronunciations per word.

csukuangfj commented 1 year ago

https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc

This recipe does not use UniqLexicon, and it does support words with multiple pronunciations. You can have a look to see how it is implemented.
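For readers who just want to see what a multi-pronunciation lexicon looks like as a data structure, the sketch below parses lexicon lines of the form `word phone1 phone2 ...` and keeps every pronunciation of a word. It is not the code from the tdnn_lstm_ctc recipe. The Chinese entries, the helper names `read_lexicon` and `words_to_phones`, and the "take the first pronunciation" rule are assumptions for illustration; that rule is only one simple way to obtain the single label sequence per utterance that a transducer loss expects.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Pron = List[str]

# Hypothetical lexicon lines; "行" appears twice with different pronunciations.
LEXICON_LINES = [
    "你好 n i3 h ao3",
    "银行 y in2 h ang2",
    "行 x ing2",
    "行 h ang2",
]


def read_lexicon(lines: Iterable[str]) -> Dict[str, List[Pron]]:
    """Parse 'word phone1 phone2 ...' lines, keeping all pronunciations."""
    lexicon: Dict[str, List[Pron]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip empty or malformed lines
        word, phones = fields[0], fields[1:]
        if phones not in lexicon[word]:  # drop exact duplicates
            lexicon[word].append(phones)
    return dict(lexicon)


def words_to_phones(
    words: List[str],
    lexicon: Dict[str, List[Pron]],
    choose: Callable[[List[Pron]], Pron] = lambda prons: prons[0],
) -> List[str]:
    """Flatten a word sequence into phones, picking one pronunciation per word.

    The default picks the first listed pronunciation (an assumption made
    for this sketch, not a rule from any recipe).
    """
    phones: List[str] = []
    for word in words:
        phones.extend(choose(lexicon[word]))
    return phones


lexicon = read_lexicon(LEXICON_LINES)
phones = words_to_phones(["你好", "行"], lexicon)
```

Passing a different `choose` function (for example, one that samples randomly during training) changes how the ambiguity is resolved without touching the parsing code.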