Training is not converging. eval_wer sticks at ~95%.

noahchalifour / rnnt-speech-recognition

End-to-end speech recognition using RNN Transducers in Tensorflow 2.0

MIT License

Training is not converging. eval_wer sticks at ~95%. #35

Open stefan-falk opened 4 years ago

stefan-falk commented 4 years ago

I finally was able to run a training on a single GPU (multi-GPU does not seem to work right now) but the word-error-rate is not dropping.

I did not change anything in the code, and I am using the Common Voice dataset as suggested by the README.md.

As you can see below, the train_loss drops but the eval_wer goes back up after a slight drop:

Any idea where this might come from?
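For reference, a word error rate such as eval_wer is conventionally the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal plain-Python version under that assumption. The function names `edit_distance` and `wer` are hypothetical, and this is not necessarily how this repository computes the metric.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


perfect = wer("the cat sat on the mat", "the cat sat on the mat")
one_sub = wer("the cat sat", "the dog sat")
too_long = wer("a b", "x y z")
```

Under this definition the rate can exceed 100% when the hypothesis contains more wrong words than the reference has words, so a value near 95% means that almost no words are being recognized correctly.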

PeiyanFlying commented 4 years ago

Excuse me, but I have another question. When I train the model, I always run into "out of memory". Just like this:

RuntimeError: CUDA out of memory. Tried to allocate 8.05 GiB (GPU 0; 23.62 GiB total capacity; 18.02 GiB already allocated; 2.84 GiB free; 19.59 GiB reserved in total by PyTorch)

I train on one GPU with 23.6 GiB of memory. How did you manage to run the model on only one GPU? Many thanks!

stefan-falk commented 4 years ago

@PeiyanFlying I am using a rather small batch size like 8 or 16 on a GeForce 1080 Ti (11 GB VRAM). In fact, multi-GPU seems to be broken at the moment. I am not able to use more GPUs than one at this point.

PeiyanFlying commented 4 years ago

Thank you very much. These days I am working on RNN-T training on LibriSpeech with PyTorch. But with the same config settings as this repository, it's easy to run into the OOM problem. I will try to check. Thanks!

stefan-falk commented 4 years ago

@PeiyanFlying Did you have any success yet? And could you link me to the PyTorch library you're using? I'd like to take a look in case https://github.com/noahchalifour/rnnt-speech-recognition won't work for me.

PeiyanFlying commented 4 years ago

OK, I am working on it. Once the PyTorch library runs successfully, I will give you the link.

noahchalifour commented 4 years ago

@stefan-falk I have also noticed that the model is not converging, and I have been working on a solution for a while. It does seem, though, that the model converges successfully if you use a small enough dataset (as a test). I read that the original paper uses massive batch sizes, and I'm not sure whether that is the reason the model is not converging. Any insights?
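One way to probe the batch-size hypothesis on a single GPU would be gradient accumulation: average the gradients of several small micro-batches and apply one update, which approximates a larger batch without the extra memory. This is a swapped-in technique, not something this repository is known to implement. The toy sketch below uses a hypothetical 1-D linear model in plain Python only to show that, for equal-sized micro-batches and a mean-reduced loss, the accumulated gradient matches the full-batch gradient.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error for y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average the per-micro-batch gradients before a single update."""
    # Assumption of this sketch: the batch splits into equal-sized micro-batches.
    assert len(xs) % micro_batch_size == 0
    num_steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(num_steps):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi])
    return total / num_steps


# Toy data, made up for illustration only.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.0, 2.1, 2.9, 4.2, 5.0, 5.9, 7.1, 8.0]
w = 0.3

full = grad_mse(w, xs, ys)                               # one batch of 8
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)  # four micro-batches of 2
```

The two values agree up to floating-point rounding. In a real network the equivalence is only approximate when layers depend on batch statistics (batch normalization, for example), so this would be a diagnostic experiment rather than a guaranteed fix.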

WrathOfGrapes commented 3 years ago

@noahchalifour Correct me if I'm wrong, but has nobody managed to train the network from this repo to a WER of 30% or better on Libri/common_voice?