harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

Multi-GPU training #23

Closed i55code closed 8 years ago

i55code commented 8 years ago

Hi s2s team,

There is a multi-GPU problem. I tried setting DISABLE_CHECK_GPU, but that does not work either. Please let me know what would help. Thanks!

```
using CUDA on GPU 1...
using CUDA on second GPU 2...
loading data...
done!
Source vocab size: 28721, Target vocab size: 42787
Source max sent len: 50, Target max sent len: 52
Number of parameters: 66948287
/home//util/torch/install/bin/luajit: /home/util.lua:46: Assertion `THCudaTensor_checkGPU(state, 4, r_, t, m1, m2)' failed. at /tmp/luarocks_cutorch-scm-1-7585/cutorch/lib/THC/THCTensorMathBlas.cu:79
stack traceback:
        [C]: in function 'addmm'
        /home/util.lua:46: in function 'func'
        .../util/torch/install/share/lua/5.1/nngraph/gmodule.lua:333: in function 'neteval'
        .../util/torch/install/share/lua/5.1/nngraph/gmodule.lua:368: in function 'forward'
        train.lua:367: in function 'train_batch'
        train.lua:622: in function 'train'
        train.lua:871: in function 'main'
        train.lua:874: in main chunk
        [C]: in function 'dofile'
        ...util/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406670
```

Cheers, Zhong

yoonkim commented 8 years ago

Hmm, yeah, multi-GPU seems to be broken atm. I'll try to look into this.

i55code commented 8 years ago

Thank you! If it is convenient, could you point me to how I can run multi-GPU training on more than 2 GPUs, say 4?

yoonkim commented 8 years ago

Within our framework, we put the encoder on GPU 1 and the decoder on GPU 2, so it's not possible to utilize more than 2 GPUs (this is more for memory than for speed). I believe it should be possible to utilize more GPUs for things like data parallelism; there are examples here: https://github.com/soumith/imagenet-multiGPU.torch
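
Roughly, the layout is something like the minimal sketch below, with plain nn.Linear modules and made-up sizes standing in for the real nngraph encoder/decoder in train.lua (it assumes two visible GPUs and only illustrates the setDevice/copy pattern, not the actual code):

```lua
-- Two-GPU layout sketch: encoder parameters on GPU 1, decoder parameters on
-- GPU 2. The nn.Linear modules and sizes are stand-ins, not the real models.
require 'cunn'      -- pulls in nn and cutorch

local gpu1, gpu2 = 1, 2
local rnn_size = 500

cutorch.setDevice(gpu1)
local encoder = nn.Linear(rnn_size, rnn_size):cuda()   -- allocated on GPU 1

cutorch.setDevice(gpu2)
local decoder = nn.Linear(rnn_size, rnn_size):cuda()   -- allocated on GPU 2

-- Forward pass: run the encoder on GPU 1, copy its output to GPU 2 once,
-- then everything downstream stays on GPU 2.
cutorch.setDevice(gpu1)
local src = torch.CudaTensor(16, rnn_size):uniform()   -- dummy batch on GPU 1
local enc_out = encoder:forward(src)

cutorch.setDevice(gpu2)
local enc_out2 = torch.CudaTensor(enc_out:size()):copy(enc_out)  -- cross-GPU copy
local dec_out = decoder:forward(enc_out2)
```

The only data crossing the GPU boundary is the encoder output; each GPU holds only its own half of the parameters, which is why this helps with memory rather than speed.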

i55code commented 8 years ago

Hi Yoon Kim,

Morning! I hope to check with you on multi-GPU training; I really appreciate your thoughts. As we discussed before, we can have multi-GPU training with the encoder on one GPU and the decoder on another GPU. The idea works for simple seq2seq. However, with attention this is tricky: attention refers back to the source sentence at every step of the target sequence, in both the forward and backward passes, so the copying between the encoder GPU and the decoder GPU becomes tricky.

Have you tested the code in a multi-GPU setting on a single machine, with a bidirectional LSTM and attention?

Cheers, Zhong

yoonkim commented 8 years ago

OK, I got around to fixing this (it doesn't work when brnn = 1, though).

The problem you mention is actually not too bad: we only need to copy the entire hidden state matrix (source length x rnn size) over to the second GPU once, and then everything can be done on the second GPU. Copying across GPUs is fast.
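
Concretely, the "copy once" idea looks something like the sketch below (made-up shapes and a toy attention-score computation, not the actual train.lua code; assumes two visible GPUs):

```lua
-- Copy the encoder's full context matrix (source_len x rnn_size) to GPU 2
-- once; every attention step afterwards reads the local copy on GPU 2.
require 'cunn'      -- pulls in nn and cutorch

local source_len, rnn_size, target_len = 50, 500, 52

-- encoder output on GPU 1: one hidden-state row per source position
cutorch.setDevice(1)
local context_gpu1 = torch.CudaTensor(source_len, rnn_size):uniform()

-- the single cross-GPU copy
cutorch.setDevice(2)
local context_gpu2 = torch.CudaTensor(source_len, rnn_size):copy(context_gpu1)

-- decoding: every step's attention only touches the local copy on GPU 2
local softmax = nn.SoftMax():cuda()
for t = 1, target_len do
  local dec_hidden = torch.CudaTensor(rnn_size):uniform()  -- stand-in decoder state
  local scores = torch.mv(context_gpu2, dec_hidden)        -- one score per source position
  local attn = softmax:forward(scores)                     -- attention weights over source
  -- ... weighted sum of the rows of context_gpu2, decoder step, etc., all on GPU 2
end
```

So during decoding no per-timestep traffic crosses the GPU boundary, which is why the attention case isn't much worse than plain seq2seq.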

Hope this helps Yoon

i55code commented 8 years ago

Thank you! What needs to be done to get brnn = 1 working for multi-GPU training?

ylhsieh commented 8 years ago

I came across this issue today. It would be great to have brnn working with 2 GPUs. Would you kindly add a warning to the README about using 2 GPUs with brnn? Hopefully that would save others some time poking around the settings. :)

yoonkim commented 8 years ago

Yeah, sorry about that. I spent a good deal of time trying to debug brnn + multi-GPU. The issue seems to be that, for some reason, I can't put the backward encoder on the first GPU.