s-omranpour / Pytorch-Speech-Recognition

A simple implementation of the paper https://arxiv.org/pdf/1910.00716v1.pdf
GNU General Public License v3.0
31 stars, 11 forks

Has anyone gotten good performance? #5

Open qute012 opened 4 years ago

qute012 commented 4 years ago

Hi. I ran this project for Korean speech recognition, but the loss is not decreasing and I don't get good predictions. I used the same preprocessing method that already works well with DeepSpeech and LAS.

It seems similar to the DeepSpeech architecture, but it isn't: the paper uses an HMM pre-built with Kaldi and trains with LF-MMI instead of CTC.
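For readers comparing the two objectives: a CTC model outputs, for every frame, scores over the tokens plus a special blank, and the transcript is read off by collapsing repeated labels and dropping blanks. Below is a minimal best-path (greedy) CTC decoder in plain Python. It is a sketch, not this repository's decoder. The blank index 0 and the toy scores are assumptions. LF-MMI is not covered here, since it needs a Kaldi-style HMM and lattice setup.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label for each frame.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded: List[int] = []
    prev: Optional[int] = None
    for label in best:
        # Emit a label only when it differs from the previous frame's label and
        # is not the blank; a blank between two equal labels keeps them separate.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


if __name__ == "__main__":
    # Toy per-frame scores over a 4-symbol vocabulary {0: blank, 1, 2, 3}.
    scores = [
        [0.10, 0.80, 0.05, 0.05],  # argmax 1
        [0.10, 0.70, 0.10, 0.10],  # argmax 1 (repeat, merged)
        [0.90, 0.05, 0.03, 0.02],  # blank
        [0.10, 0.80, 0.05, 0.05],  # argmax 1 (new token after the blank)
        [0.20, 0.10, 0.60, 0.10],  # argmax 2
        [0.95, 0.02, 0.02, 0.01],  # blank
    ]
    print(ctc_greedy_decode(scores))  # [1, 1, 2]
```

If the argmax is the blank on every frame, the decoded output is empty. That is a common symptom of a CTC model that has not learned yet, and it would be consistent with a WER close to 1.0.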

https://www.danielpovey.com/files/2020_interspeech_multistream.pdf

The paper above, like the paper this project is based on, uses single-stream layers before the multi-stream layers. Does anyone know what the problem is, or has anyone gotten good performance with this project?
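One way to see why streams are placed after shared single-stream layers is to compare receptive fields. For stride-1 dilated 1-D convolutions, each layer with kernel size k and dilation d widens the receptive field by (k - 1) * d frames. The sketch below applies that rule to a shared single-stream stack followed by several parallel streams. The layer counts, kernel sizes, and dilation rates are hypothetical illustration values. They are not taken from either paper or from this repository.

```python
from typing import Dict, List, Tuple

Layer = Tuple[int, int]  # (kernel_size, dilation); stride is assumed to be 1


def receptive_field(layers: List[Layer]) -> int:
    """Frames seen by one output frame of a stack of stride-1 dilated 1-D convs."""
    return 1 + sum((k - 1) * d for k, d in layers)


def multistream_receptive_fields(
    single: List[Layer], streams: Dict[str, List[Layer]]
) -> Dict[str, int]:
    # Each stream sits on top of the shared single-stream layers,
    # so its receptive field includes theirs.
    return {name: receptive_field(single + layers) for name, layers in streams.items()}


if __name__ == "__main__":
    # Hypothetical config: 3 shared layers (kernel 3, dilation 1),
    # then 3 streams of 3 layers each with different dilation rates.
    shared = [(3, 1)] * 3
    streams = {f"dilation={d}": [(3, d)] * 3 for d in (6, 9, 12)}
    print(multistream_receptive_fields(shared, streams))
    # {'dilation=6': 43, 'dilation=9': 61, 'dilation=12': 79}
```

Each stream sees a different amount of temporal context while reusing the same low-level features from the shared layers.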

kouohhashi commented 3 years ago

I tried the LibriSpeech train-clean-100 dataset, but the WER didn't improve at all; it stayed around 0.95.
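For context on what a WER of about 0.95 means, WER is the word-level edit distance divided by the number of reference words: (substitutions + deletions + insertions) / N. Here is a plain-Python sketch using the standard Levenshtein dynamic program. It assumes whitespace tokenization and no text normalization. It is not necessarily how this repository computes WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and hyp[:j]; only one row of the DP table is kept at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # one deletion: 1/6
    print(word_error_rate("a b c d", ""))  # empty output: 1.0
```

An empty hypothesis already gives a WER of 1.0, and insertions can push WER above 1.0. A value near 0.95 therefore means the model is getting almost no words right, or is producing little or no output.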

I changed two things:

  1. Changed the strides from strides=[5,2,1] to strides=[2,2,1] to avoid an assertion error.
  2. Changed the sample rate from 48000 to 16000.
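Both changes affect how many frames the encoder produces per utterance. The sketch below shows the arithmetic under several assumptions: features use a 25 ms window and 10 ms hop with no padding, and the encoder is a hypothetical stack of one kernel-3 convolution per stride with padding kernel // 2. It also assumes a CTC-style loss, which needs at least one output frame per label plus one extra frame for each pair of equal adjacent labels. It does not reproduce the repository's actual assertion, whose condition is not shown here. LibriSpeech audio is 16 kHz, so a 48000 setting would make every window and hop three times too long in samples.

```python
from typing import List


def feature_frames(num_samples: int, sample_rate: int, win_ms: float = 25.0, hop_ms: float = 10.0) -> int:
    """Frame count for an unpadded STFT/filterbank front end (assumed 25 ms / 10 ms)."""
    win = int(sample_rate * win_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


def conv_output_length(length: int, kernel: int, stride: int, padding: int, dilation: int = 1) -> int:
    """Standard 1-D convolution output length (same formula as torch.nn.Conv1d)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def frames_after_strides(length: int, strides: List[int], kernel: int = 3) -> int:
    # Hypothetical encoder: one convolution per stride, padding = kernel // 2.
    for s in strides:
        length = conv_output_length(length, kernel, s, kernel // 2)
    return length


def min_ctc_frames(labels: List[int]) -> int:
    # CTC needs one frame per label plus a blank between equal adjacent labels.
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


if __name__ == "__main__":
    two_seconds = 2 * 16000  # 2 s of 16 kHz audio
    t = feature_frames(two_seconds, 16000)
    print(t)  # 198
    print(frames_after_strides(t, [5, 2, 1]), frames_after_strides(t, [2, 2, 1]))  # 20 50
    # Same 16 kHz samples, but the front end is told the rate is 48 kHz.
    print(feature_frames(two_seconds, 48000))  # 65
```

With strides=[5,2,1] the total downsampling is 10, so two seconds of speech leave only 20 encoder frames. That may be too few for a character-level transcript of that length under CTC. With strides=[2,2,1] the downsampling is 4, which leaves 50 frames.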

I don't know what I need to change...