Open scufan1990 opened 2 years ago
Hello, I have run into the same problem as you, although I use a Conformer encoder and a Transformer decoder. By the way, have you solved the problem with the output of DecoderRNNT? It has 4 dimensions; how can it be used to recognize speech?
Could you tell me which dataset you use for training? How long does it take to train a checkpoint? I see that the paper uses the LibriSpeech dataset with 970 hours of audio. It seems that training will take a lot of time.
Um, I use AISHELL-1 and have trained for more than 10 hours, but the results are not very good. Actually, I train the model on Google Colab, and it really takes a lot of time. By the way, do you understand the 4-dimensional output? The author just uses torch.cat to concatenate the encoder_output and decoder_output matrices, so it seems that the network cannot be used to recognize speech. So I built two networks: (1) a Conformer encoder with a Transformer decoder, and (2) a Conformer encoder with an LSTM decoder with an attention mechanism. I have now been training the two networks for several days.
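For readers puzzled by the 4-dimensional output discussed above: in a standard RNN-T, a tensor of this shape is usually the joint lattice of shape (batch, T, U, vocab) that is fed to a transducer loss during training, rather than a transcript by itself. The sketch below is a minimal illustration under that assumption, not the repository's actual code. The class name `JointSketch`, the vocabulary size, and the toy shapes are hypothetical.

```python
import torch
import torch.nn as nn


class JointSketch(nn.Module):
    """Hypothetical RNN-T joint network: concatenate, then project to the vocabulary."""

    def __init__(self, enc_dim: int, dec_dim: int, vocab_size: int):
        super().__init__()
        self.fc = nn.Linear(enc_dim + dec_dim, vocab_size)

    def forward(self, enc_out: torch.Tensor, dec_out: torch.Tensor) -> torch.Tensor:
        # enc_out: (B, T, enc_dim) acoustic frames
        # dec_out: (B, U, dec_dim) label-prediction states
        T, U = enc_out.size(1), dec_out.size(1)
        enc = enc_out.unsqueeze(2).expand(-1, -1, U, -1)  # (B, T, U, enc_dim)
        dec = dec_out.unsqueeze(1).expand(-1, T, -1, -1)  # (B, T, U, dec_dim)
        joint = torch.cat([enc, dec], dim=-1)             # (B, T, U, enc_dim + dec_dim)
        # Log-probabilities over the vocabulary for every (frame, label-position) pair.
        return self.fc(joint).log_softmax(dim=-1)         # (B, T, U, vocab)


# Toy shapes: batch of 2, 5 encoder frames, 3 decoder positions, 10 symbols (assumed).
enc_out = torch.randn(2, 5, 144)
dec_out = torch.randn(2, 3, 320)
log_probs = JointSketch(144, 320, 10)(enc_out, dec_out)
print(log_probs.shape)
```

At inference time the full lattice is normally not needed. A greedy or beam search typically walks the encoder frames one at a time, calls the joint on a single frame and the current decoder state, and either emits a symbol (advancing the decoder) or emits blank (advancing to the next frame).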
Thanks for your reply! I have not yet decided which model and dataset to use. I will share with you if I have any further information.
That's fine. I also need to talk with others to learn more about the network. Are you from China? Maybe we can exchange contact details.
Hello wszyy, I am from China. I have been studying the Conformer model recently and would like to discuss it with you. If you are willing, you can add me on WeChat, ID: scrushy518
Hi, I have a question about training a Conformer + RNN-T model. What CER and WER do you get with one GPU?
I'm training the model on one RTX TITAN GPU. The Conformer has 16 encoder layers, encoder dim 144, 1 decoder layer, and decoder dim 320. After 50 epochs of training, the CER is about 27 and does not decrease any further.
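Since CER and WER numbers are being compared here, it may help to pin down how they are computed. The sketch below is one common definition (Levenshtein distance divided by reference length, reported as a percentage), and it assumes the "27" above is a percentage. The function names are hypothetical, and stripping spaces before computing CER is an assumption that suits Chinese transcripts such as AISHELL-1.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent; spaces are ignored (assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent; words are split on whitespace."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)
```

For a corpus-level figure, the edit distances and reference lengths are usually summed over all utterances before dividing, rather than averaging per-utterance rates.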