openspeech-team / openspeech

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
https://openspeech-team.github.io/openspeech/
MIT License

CUDA out of memory #108

Closed Suraj6198 closed 2 years ago

Suraj6198 commented 3 years ago

I tried to train a Transformer Transducer model on the LibriSpeech "train-clean-100" dataset on a 16 GB GPU, but I get a "CUDA out of memory" error. I also tried splitting the layers across 3 GPUs with 16 GB each, but the result is the same. The error points to the joint layer, probably because of the large tensors produced there.

Details:
- Number of MFCCs: 128
- Timesteps: 512
- Vocabulary size: 21,800 (I tried reducing it to 5K, but got the same error)
- Embedding layer dimension: vocab_size × 512
- Audio encoder: TransformerTransducerEncoder
- Label encoder: TransformerTransducerDecoder
- Loss: RNNTLoss
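For reference, the joint network produces a (batch, T, U, vocab) logits tensor, so its size alone can exceed the card. A rough estimate with these settings (the batch size and max label length are assumptions, the rest follow the numbers above):

```python
# Rough estimate of the joint-network logits tensor alone.
# Batch size and max label length (U) are assumptions; the rest
# follow the numbers listed above.
batch = 16           # assumed batch size
T = 512              # encoder timesteps
U = 100              # assumed max label length per utterance
vocab = 21800        # vocabulary size
bytes_per_float = 4  # fp32

joint_logits_bytes = batch * T * U * vocab * bytes_per_float
print(f"joint logits: {joint_logits_bytes / 1024**3:.1f} GiB")  # ~66.5 GiB
# That is far beyond 16 GB before counting activations, gradients, or the loss.
```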

If anyone has trained the Transformer Transducer successfully and obtained results comparable to https://arxiv.org/abs/2002.02562 , please let me know how many accelerators you used and their memory capacity.

sooftware commented 3 years ago

Unfortunately, training a speech recognition model takes a lot of memory.
Transducer models in particular need even more.

Suraj6198 commented 3 years ago

Hi @sooftware, thanks for your reply. Do you have an idea of which GPU would be able to handle this Transformer Transducer model?

sooftware commented 3 years ago

Of course, the bigger the better. If several A100s are possible, that would be good.
Alternatively, it would be a good idea to reduce the number of MFCC coefficients and the vocab size.
Transducers use a lot of memory depending on the vocab size.
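If you go the smaller-vocab route, a subword tokenizer is the usual way to do it. A minimal sketch with sentencepiece (the paths and the 5,000 target are placeholders, not openspeech's own config):

```python
# Minimal sketch: train a smaller BPE vocabulary with sentencepiece.
# The input path and vocab_size are placeholders, not openspeech defaults.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="librispeech_transcripts.txt",  # one transcript per line (assumed path)
    model_prefix="libri_bpe_5k",
    vocab_size=5000,
    model_type="bpe",
    character_coverage=1.0,
)

sp = spm.SentencePieceProcessor(model_file="libri_bpe_5k.model")
print(sp.encode("hello world", out_type=str))
```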

Suraj6198 commented 3 years ago

Yes, reducing the tensor dimensions in various ways would work. But the thing is, I'm trying to achieve the same WER as reported in the paper, and reducing the dimensions may affect the results.

sooftware commented 3 years ago

Then, I think you have no choice but to use a lot of GPUs. 😂
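If it helps, a generic PyTorch Lightning sketch for spreading training over several GPUs while cutting per-GPU memory (this is not openspeech's own Hydra config, and the device count is just an example):

```python
# Generic PyTorch Lightning sketch (argument names follow recent versions).
# DDP replicates the model on each GPU and splits the batch across them,
# so the per-GPU batch can be kept small; mixed precision and gradient
# accumulation further reduce activation memory.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,                  # assumed number of GPUs
    strategy="ddp",             # one process per GPU, gradients synchronized
    precision=16,               # mixed precision roughly halves activation memory
    accumulate_grad_batches=4,  # simulate a larger effective batch
)
# trainer.fit(model, datamodule)  # model/datamodule come from your own setup
```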