harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

Error in fine-tuning with 2 GPUs #83

Closed: ankitag9 closed this issue 7 years ago

ankitag9 commented 7 years ago

Hi,

I have a pretrained model and I am trying to fine-tune it using 2 GPUs. I am getting the following error:

/home/strange/torch/install/bin/luajit: /home/strange/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/strange/torch/install/share/lua/5.1/nn/Linear.lua:67: Assertion `THCTensor(checkGPU)(state, 4, r, t, vec1, vec2)' failed. at /home/strange/torch/extra/cutorch/lib/THC/generic/THCTensorMathBlas.cu:138
stack traceback:
[C]: in function 'addr'
/home/strange/torch/install/share/lua/5.1/nn/Linear.lua:67: in function </home/strange/torch/install/share/lua/5.1/nn/Linear.lua:53>
[C]: in function 'xpcall'
/home/strange/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/strange/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:499: in function 'train_batch'
train.lua:752: in function 'train'
train.lua:1080: in function 'main'
train.lua:1083: in main chunk
[C]: in function 'dofile'
...ange/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/strange/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/strange/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:499: in function 'train_batch'
train.lua:752: in function 'train'
train.lua:1080: in function 'main'
train.lua:1083: in main chunk
[C]: in function 'dofile'
...ange/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50

What can I do to fix this error?

yoonkim commented 7 years ago

Try converting the model to CPU first, using convert_to_cpu.lua.
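Why converting to CPU first plausibly helps can be illustrated with a toy Python model. It rests on assumptions the thread does not state: that the failing `checkGPU` assertion fires because an operand of `addr` lives on a GPU other than the current one; that a checkpoint saved during GPU training restores its weights onto the GPU they were saved from; and that the two-GPU setup (hypothetically: encoder on GPU 1, decoder on GPU 2) moves weights with a type conversion that leaves tensors already on some GPU where they are. Names such as `Tensor`, `to_device` and `place_for_two_gpus` are illustrative only, not Torch or seq2seq-attn APIs.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    # Toy stand-in for a tensor: only the device it lives on matters here.
    device: str  # "cpu", "cuda:1", "cuda:2", ...


def check_gpu(current_device, *tensors):
    # Assumed behavior, in the spirit of cutorch's checkGPU assertion:
    # every operand of a GPU kernel must sit on the current device.
    return all(t.device == current_device for t in tensors)


def to_device(tensor, device):
    # Hypothetical move: a CPU tensor is copied to the target GPU, but a
    # tensor that is already on *some* GPU is returned unchanged (assumed to
    # mirror a type conversion that is a no-op for an existing GPU tensor).
    if tensor.device == "cpu":
        return Tensor(device)
    return tensor


def build_model(checkpoint_device):
    # Encoder and decoder weights as restored from a checkpoint.
    return {"encoder": Tensor(checkpoint_device),
            "decoder": Tensor(checkpoint_device)}


def place_for_two_gpus(model):
    # Hypothetical two-GPU layout: encoder on GPU 1, decoder on GPU 2.
    return {"encoder": to_device(model["encoder"], "cuda:1"),
            "decoder": to_device(model["decoder"], "cuda:2")}


def decoder_step_ok(model):
    # A decoder Linear layer runs with GPU 2 as the current device, so its
    # weight and its input activations must both live on cuda:2.
    activations = Tensor("cuda:2")
    return check_gpu("cuda:2", model["decoder"], activations)


gpu_checkpoint = place_for_two_gpus(build_model("cuda:1"))
cpu_checkpoint = place_for_two_gpus(build_model("cpu"))
print(decoder_step_ok(gpu_checkpoint))  # False: decoder weights stuck on cuda:1
print(decoder_step_ok(cpu_checkpoint))  # True: each half landed on its own GPU
```

In this toy model, the GPU checkpoint leaves the decoder's weights on `cuda:1` while its inputs are on `cuda:2`, which is the kind of device mismatch the assertion guards against; starting from a CPU copy lets each half be placed on its intended device.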

ankitag9 commented 7 years ago

It works. Thanks!