minimaxir / textgenrnn

Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

Multi-GPU Support not working great #62

Open minimaxir opened 6 years ago

minimaxir commented 6 years ago

I added a Multi-GPU function that extended Keras's multi_gpu_model() and verified that it indeed utilizes multiple GPUs (via nvidia-smi). However, for smaller models, training speed is about the same, if not worse.

I suspect that the CuDNNLSTMs are so efficient that any gains from using multiple GPUs are lost to overhead.
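A rough back-of-the-envelope model makes this plausible: data-parallel training splits the per-step compute across GPUs but pays a fixed per-step cost for gradient synchronization and batch scatter/gather. The timing numbers below are illustrative assumptions, not measurements from textgenrnn:

```python
def data_parallel_speedup(t_compute_ms, t_overhead_ms, n_gpus):
    """Estimated speedup from splitting one training step across n_gpus,
    assuming the compute divides evenly and each step pays a fixed
    synchronization/merge overhead (e.g. gradient averaging)."""
    t_single = t_compute_ms
    t_multi = t_compute_ms / n_gpus + t_overhead_ms
    return t_single / t_multi

# Large model: compute dominates the step time, so 2 GPUs help.
print(data_parallel_speedup(100.0, 10.0, 2))  # 100 / 60 ≈ 1.67x

# Small, fast CuDNNLSTM model: overhead dominates, and two GPUs
# come out *slower* than one.
print(data_parallel_speedup(5.0, 10.0, 2))    # 5 / 12.5 = 0.4x
```

Under this model, multi-GPU only pays off once per-step compute is large relative to the synchronization cost, which matches the observation that small models train no faster (or slower) on two GPUs.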

Would using Horovod fix this? I have no issues implementing that instead.

bafonso commented 5 years ago

I am curious about your findings... I have found that when I train, I only use roughly 50–70% of my 1070 according to nvidia-smi. This is using a TensorFlow Docker image with textgenrnn added.

ryanmjacobs commented 5 years ago

@minimaxir Hey, sorry to revive an old issue, but how did you manage to implement multi-GPU usage?

Edit: Some additional information: I have access to two RTX 2080 Tis, and so far I've been training separate experiments with various hyperparameters. Now I've settled on the hyperparameters that seem to work best, and I'd love it if I could run even 50% faster by utilizing both GPUs in a single training session.

peeves91 commented 4 years ago

@ryanmjacobs how did you manage to get even one GPU working? Even that's failing for me. Do you need both tensorflow and tensorflow-gpu?