Open wszyy opened 2 years ago
The output of DecoderRNN-T is combined with 4 dimensions, how to use it to recognize speech? Besides, the auther make the model architecture with the LAS? Such as: Conformer-Encoder, LSTM-Decoder, Attention?
Check this project
The output of DecoderRNN-T is combined with 4 dimensions, how to use it to recognize speech? Besides, the auther make the model architecture with the LAS? Such as: Conformer-Encoder, LSTM-Decoder, Attention?