harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License
1.26k stars, 278 forks

Invalid device ordinal #27

Closed nicolas-ivanov closed 8 years ago

nicolas-ivanov commented 8 years ago

Thank you for a marvelous library! I'm trying to train a demo model on a GPU, but I get the following error:

ubuntu@testing:~/nicolas/seq2seq-attn$ th train.lua -data_file data/demo-train.hdf5 -val_data_file data/demo-val.hdf5 -savefile demo-model -gpuid 0
using CUDA on GPU 0...  
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-1657/cutorch/init.c line=711 error=10 : invalid device ordinal
/home/ubuntu/torch/install/bin/luajit: train.lua:770: cuda runtime error (10) : invalid device ordinal at /tmp/luarocks_cutorch-scm-1-1657/cutorch/init.c:711
stack traceback:
    [C]: in function 'setDevice'
    train.lua:770: in function 'main'
    train.lua:874: in main chunk
    [C]: in function 'dofile'
    ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

Any suggestions on how it can be fixed?

All the Lua packages (hdf5, nngraph, cutorch, cunn) seem to be installed properly.

yoonkim commented 8 years ago

Hi, maybe try -gpuid 1?
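A likely reason for this suggestion: cutorch numbers devices from 1, following Lua convention, so if train.lua passes -gpuid straight to cutorch.setDevice (as the traceback suggests), a value of 0 can never be valid. The sketch below is one rough way to check which ids are usable. The helper names `cutorch_device_count` and `gpuid_in_range` are hypothetical, and it assumes the `th` launcher is on PATH and accepts `-e` like the stock Lua interpreter.

```shell
# Hypothetical helper: print how many GPUs cutorch can see.
# Prints 0 when th is missing or cutorch fails to load.
cutorch_device_count() {
    if command -v th >/dev/null 2>&1; then
        th -e "require 'cutorch'; print(cutorch.getDeviceCount())" 2>/dev/null || echo 0
    else
        echo 0
    fi
}

# Hypothetical helper: succeed if a 1-based GPU id lies in 1..count.
# $1 = requested -gpuid value, $2 = device count
gpuid_in_range() {
    [ "$1" -ge 1 ] && [ "$1" -le "$2" ]
}

count=$(cutorch_device_count)
if gpuid_in_range 1 "$count"; then
    echo "-gpuid 1 should be accepted ($count device(s) visible)"
else
    echo "-gpuid 1 is not valid here ($count device(s) visible)"
fi
```

If the reported count is 0 even though `nvidia-smi` lists a GPU, the problem is probably not the id at all but the driver or CUDA installation that Torch was built against.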

nicolas-ivanov commented 8 years ago

Tried this already... (in reply to Yoon Kim's suggestion of 28 Jun 2016)

yoonkim commented 8 years ago

Hmm, this seems like a driver issue:

https://devtalk.nvidia.com/default/topic/518479/invalid-device-ordinal-i-can-39-t-find-any-help-about-this-/?offset=3

nicolas-ivanov commented 8 years ago

Wow, thanks a lot! I'll look into this and report back to you. (in reply to Yoon Kim's comment of 29 Jun 2016)

nicolas-ivanov commented 8 years ago

@yoonkim your link helped, thank you! Apparently, Torch was trying to use the default version of CUDA instead of the latest one, so I had to append the following lines to my ~/.bashrc file:

export CUDA_HOME=/usr/local/cuda-7.5 
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64 

PATH=${CUDA_HOME}/bin:${PATH} 
export PATH

and restart the bash session.
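
One caveat with the lines above is that they replace any existing LD_LIBRARY_PATH rather than extending it. A minimal sketch of a variant that prepends instead, and skips directories that are already listed, might look like this. The helper name `prepend_dir` is hypothetical, and the CUDA location is simply the one used above, so adjust it to the local install.

```shell
# Hypothetical helper: prepend a directory to a colon-separated variable
# (e.g. PATH or LD_LIBRARY_PATH) unless it is already listed there.
# $1 = variable name, $2 = directory
prepend_dir() {
    _pd_name=$1
    _pd_dir=$2
    # Read the current value of the named variable (empty if unset).
    eval "_pd_cur=\${$_pd_name:-}"
    case ":$_pd_cur:" in
        *":$_pd_dir:"*)
            ;;  # already present, leave the variable unchanged
        *)
            # New value: the directory, then ":old value" if there was one.
            eval "$_pd_name=\$_pd_dir\${_pd_cur:+:\$_pd_cur}"
            export "$_pd_name"
            ;;
    esac
}

# Assumed install location, taken from the comment above.
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda-7.5}
export CUDA_HOME
prepend_dir PATH "$CUDA_HOME/bin"
prepend_dir LD_LIBRARY_PATH "$CUDA_HOME/lib64"
```

After starting a new shell, `echo "$LD_LIBRARY_PATH"` should show the CUDA lib64 directory first, followed by whatever was set before.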