st-tomic opened this issue 4 years ago (status: Open)
Hey @st-tomic
My primary goal was slightly different. I just wanted to provide a good, open-source Polish ASR. I experimented with Mozilla DeepSpeech, Kaldi, etc. There have been several attempts, but well... they are overcomplicated and too specific for further research. I decided to build this little package from scratch.
OK, back to the question. To make this package more general, I had to adjust my aim and provide an English model. I plan to train a model from scratch, but for now I have adapted the English model from the Seq2Seq repository (here is the NVIDIA documentation, the configuration file with detailed information is here, and here is my model adaptation file; it should be compatible with what we have here).
I do not want to get stuck with CTC-based models. In the coming months, I will release the second version of this package, where I will introduce a Transformer-based English ASR (I am quite fascinated by NLP in general; check out my new repo: Aspect Based Sentiment Analysis).
PS. The presented result is for the greedy decoder. In my opinion, sophisticated decoding algorithms are old-fashioned and crude... aren't they? ;)
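For anyone unfamiliar with the term, here is a minimal sketch of what greedy (best-path) CTC decoding means in general: take the most likely label in each frame, collapse consecutive repeats, then drop the blank symbol. This is a generic illustration and not the code from this package; the function name `ctc_greedy_decode`, the toy alphabet, and the choice of index 0 for the blank are all assumptions made for the example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_probs, alphabet, blank_index=0):
    """Greedy (best-path) CTC decoding for a single utterance.

    frame_probs: list of per-frame probability lists (time x vocabulary).
    alphabet: maps a label index to its character (hypothetical toy alphabet).
    blank_index: index of the CTC blank symbol (assumed to be 0 here).
    """
    # 1. Pick the most likely label at every time step.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_probs
    ]
    # 2. Collapse runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop the blanks and map the remaining labels to characters.
    return "".join(alphabet[label] for label in collapsed if label != blank_index)


# Toy input: 6 frames over a 3-symbol vocabulary ("_" stands for the blank).
alphabet = ["_", "a", "b"]
frame_probs = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, collapsed into the previous one)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.20, 0.70],  # b
    [0.60, 0.30, 0.10],  # blank
    [0.10, 0.10, 0.80],  # b (kept, because a blank separates the two b's)
]
decoded = ctc_greedy_decode(frame_probs, alphabet)
print(decoded)
```

Beam search with a language model would instead keep several candidate paths per frame; the greedy variant keeps only one, which is why it is the simplest baseline to report.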
Hi @rolczynski,
Thanks for the feedback and the interesting info. I agree that we should also look at the bigger picture :)
I am looking forward to seeing your future work.
Best regards.
Hi @rolczynski
I am experimenting with your code and would like to know how to reproduce the benchmark results from the table.
Is it the pipeline from the README, with 25 epochs and a batch size of 32? How many GPUs did you use (4x8, I guess)?
I assume the dataset is the full LibriSpeech. Was data augmentation used?
Does the code support decoding on the whole dev-clean subset?