mila-iqia / platoon

Multi-GPU mini-framework for Theano
MIT License

GPU memory cost too high with platoon #84

Open mingxuan opened 7 years ago

mingxuan commented 7 years ago

I wrote a neural machine translation system with platoon. The batch size is 80 and I sync every 10 mini-batches. I found that the memory cost is about 4 times larger than for the same system without platoon. Has anyone else had the same experience?

I have also tested the "lstm" example, which costs about 5 GB of memory with batch size 16 and hidden size 1024. Could someone help me find the problem?

nouiz commented 7 years ago

Is it CPU or GPU memory? How do you see that 4x difference?

How many GPUs are used in parallel?

Normally, it should not use more memory on the GPU. But it could use more memory on the CPU, depending on how you use it. Each process/GPU uses extra CPU memory.
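
For example, one way to see both sides per worker process (a sketch; it assumes `nvidia-smi` is on the PATH, and the optional `psutil` package is only used for the host-side number):

```python
import os
import subprocess

import psutil

# Host (CPU) memory actually resident for this worker process, in MB.
rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024.0 ** 2
print("CPU RSS: %.0f MB" % rss_mb)

# Per-process GPU memory as reported by the NVIDIA driver.
print(subprocess.check_output(
    ["nvidia-smi", "--query-compute-apps=pid,used_memory",
     "--format=csv"]).decode())
```

Running this inside each platoon worker and inside the single-GPU script makes it easier to attribute the 4x difference to one side or the other.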

mingxuan commented 7 years ago

It's GPU memory. I use the command "nvidia-smi" to see the GPU memory usage. I found that when using platoon, the memory usage is stable and the "GPU-Util" is very close to 100%. Without platoon, my GPU memory usage changes rapidly during training, and the "GPU-Util" also varies from about 30% to 100%. Does Platoon change the default Theano configuration? Thanks for your help.
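
One way to check whether the defaults differ is to dump the effective configuration in both setups and diff the output; printing `theano.config` lists every active setting. A minimal sketch:

```python
# Print the full, effective Theano configuration for this process.
# Run it once inside a platoon worker and once in the standalone
# script, then diff the two outputs.
import theano
print(theano.config)
```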

mouna99 commented 7 years ago

I have the same problem, and in my case it is worse: I get an "out of memory" error, so my NMT system cannot train with platoon at all. Have you solved this problem?

Thanks for your help.

mingxuan commented 7 years ago

The problem may come from NCCL and pygpu. I find that Theano built with NCCL and pygpu costs much more memory than the previous version.

cshanbo commented 7 years ago

Yes. The extra memory cost is caused by the new Theano back-end. We prefer to use `THEANO_FLAGS=gpuarray.preallocate=0.95,...` to pre-allocate GPU memory, where you can set 0.95 to any other value in (0, 1). See this issue.
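
For example, a minimal sketch of setting the flag before Theano is imported (the device name and the other flags here are placeholders for your own settings):

```python
import os

# THEANO_FLAGS must be set before `import theano`. With
# gpuarray.preallocate=0.95, Theano grabs 95% of the GPU's memory up
# front instead of allocating and freeing buffers on demand.
os.environ["THEANO_FLAGS"] = "gpuarray.preallocate=0.95,device=cuda0,floatX=float32"

import theano  # imported only after the flag is in place
```

The equivalent one-liner from the shell would be `THEANO_FLAGS=gpuarray.preallocate=0.95 python train.py`, where `train.py` stands in for your own training script.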