keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
MIT License

Training the model with a 2GB card #270

Closed aminbaig closed 5 years ago

aminbaig commented 5 years ago

Hi all, is it possible to train the model using tensorflow-gpu on a GTX 750? It has only 2GB of memory, and when I try training with it, it gives an OOM (Out of Memory) error.

My question is: is it possible to tweak certain TensorFlow settings so that a lower-memory card can handle the training and we can still benefit from the GPU?

rotorooter101 commented 5 years ago

There are several things that can be done to reduce the memory footprint of the keithito/tacotron model.

  1. Increase outputs_per_step up to r=5 (perhaps more?). This makes the decoder predict r frames of audio per decoding step, cutting the number of decoder steps per utterance by nearly a factor of r. I run at r=5 all the time; it speeds up training too.
  2. Use shorter utterances. Split sentences into shorter segments, or cap the maximum utterance length used for training (see the sketch below). Find, uh, a fast talker for your dataset.
  3. It's possible to lower the size of all the network modules -- the encoder, the decoder, the prenet, the attention depth. There are forks that increase these sizes, but I don't know what effect shrinking them will have on quality.

One of those should at least fix your memory problem!
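To make (1) and (3) concrete: train.py accepts an `--hparams` override string, so something like `python train.py --hparams="outputs_per_step=5,encoder_depth=128,decoder_depth=128"` should work without editing hparams.py (the names come from this repo's hparams.py; check your checkout, and treat the values as illustrative, not tuned). For (2), here is a minimal sketch of trimming long utterances out of the training metadata, assuming the pipe-separated training/train.txt format written by preprocess.py, where the third field is the frame count:

```python
# Hypothetical helper (not part of the repo): drop long utterances from the
# preprocessed metadata so worst-case batches fit in 2GB. Assumes each line
# of training/train.txt looks like "spec.npy|mel.npy|n_frames|text", as
# written by preprocess.py. The 600-frame cap is an arbitrary example.
MAX_FRAMES = 600

with open('training/train.txt', encoding='utf-8') as f:
    lines = f.readlines()

short = [line for line in lines if int(line.split('|')[2]) <= MAX_FRAMES]
print('Keeping %d of %d utterances' % (len(short), len(lines)))

# Write the filtered list to a new file so the original stays intact;
# point train.py at it (e.g. via its --input argument, if your checkout
# exposes one).
with open('training/train_short.txt', 'w', encoding='utf-8') as f:
    f.writelines(short)
```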

Ittiz commented 5 years ago

I have a system with a 2GB card. You need to alter the code a bit to allow GPU memory fractioning; I set the fraction to 0.8, or the thing will always OOM on you. Even after that, the only way I could get it to work was to decrease the batch size from 32 to 16. It will take longer to align, and may not align at all, but that's the only way. Either that, or just train on the CPU. With my setup (2 CPUs and 48GB of memory) it actually trains faster on the CPU with more reasonable settings: each step is slower, but it aligns in many fewer steps. I use the GPU only for synthesis, since that's much faster and light enough not to OOM with only 2GB.
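For anyone wanting to replicate this, here is a minimal sketch of the memory-fraction change using the TF 1.x API this repo targets. Exactly where the session is created depends on your checkout of train.py (it may be inside a tf.train.Supervisor), but the config object is the same either way:

```python
import tensorflow as tf

# Cap the GPU allocation instead of letting TensorFlow grab the whole card
# up front (Ittiz used a fraction of 0.8 on a 2GB card).
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.8
config.gpu_options.allow_growth = True  # optional: allocate on demand

# Pass the config wherever the session is opened, e.g.:
with tf.Session(config=config) as sess:
    pass  # ... training loop as in train.py ...
```

The batch-size change needs no code edit: `--hparams="batch_size=16"` should do it. And to train CPU-only, hiding the GPU with `CUDA_VISIBLE_DEVICES= python train.py` works without modifying anything.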