macournoyer / neuralconvo

Neural conversational model in Torch

Issue with CUDA #4

Closed: pender closed this issue 8 years ago

pender commented 8 years ago

Hi, you just fixed my issue with the lowercase penlight references (thank you!), which unblocked me enough to encounter the following CUDA issue... :)

It seems to work without CUDA, i.e. on the CPU, but with CUDA enabled I get the following:

$ th train.lua --cuda --dataset 5000 --hiddenSize 1000
-- Loading dataset  
data/vocab.t7 not found 
-- Parsing Cornell movie dialogs data set ...   
 [================== 387810/387810 ============>]ETA: 0ms | Step: 0ms           
-- Pre-processing data  
 [================== 5000/5000 ================>]ETA: 0ms | Step: 0ms           
-- Removing low frequency words 
 [================== 8151/8151 ================>]ETA: 0ms | Step: 0ms           
Writing data/examples.t7 ...    
 [================== 8151/8151 ================>]ETA: 0ms | Step: 0ms           
Writing data/vocab.t7 ...   

Dataset stats:  
  Vocabulary size: 7061 
         Examples: 8151 

-- Epoch 1 / 50 

/home/pender/torch/install/bin/luajit: ...der/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:47: bad argument #3 to 'ClassNLLCriterion_updateOutput' (torch.CudaTensor expected, got number)
stack traceback:
    [C]: in function 'ClassNLLCriterion_updateOutput'
    ...der/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:47: in function 'forward'
    ...r/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:27: in function 'forward'
    ./seq2seq.lua:69: in function 'train'
    train.lua:76: in main chunk
    [C]: in function 'dofile'
    ...nder/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00405ea0

Any suggestions?

macournoyer commented 8 years ago

Can you try updating to the latest version of rnn:

git clone git@github.com:Element-Research/rnn.git
cd rnn
luarocks install rocks/rnn-scm-1.rockspec

pender commented 8 years ago

The luarocks install worked fine:

$ luarocks install rocks/rnn-scm-1.rockspec 
Using rocks/rnn-scm-1.rockspec... switching to 'build' mode
Cloning into 'rnn'...
remote: Counting objects: 96, done.
remote: Compressing objects: 100% (72/72), done.
remote: Total 96 (delta 17), reused 64 (delta 15), pack-reused 0
Receiving objects: 100% (96/96), 646.20 KiB | 0 bytes/s, done.
Resolving deltas: 100% (17/17), done.
Checking connectivity... done.
cmake -E make_directory build;
cd build;
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/pender/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1"; 
make

-- The C compiler identification is GNU 4.9.2
-- The CXX compiler identification is GNU 4.9.2
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /home/pender/torch/install
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_rnn-scm-1-9946/rnn/build
cd build && make install
Install the project...
-- Install configuration: "Release"
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/FastLSTM.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/LookupTableMaskZero.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/GRU.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/BiSequencer.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/Padding.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/AbstractSequencer.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/ZeroGrad.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/Module.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/BiSequencerLM.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/Recurrence.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/MaskZero.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/SAdd.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/RepeaterCriterion.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/Recurrent.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/AbstractRecurrent.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/Sequencer.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/LinearNoBias.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/RecurrentAttention.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/SequencerCriterion.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/recursiveUtils.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/Repeater.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/Recursor.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/init.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/LSTM.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/test.lua
-- Installing: /home/pender/torch/install/lib/luarocks/rocks/rnn/scm-1/lua/rnn/mnistsample.t7
Updating manifest for /home/pender/torch/install/lib/luarocks/rocks
rnn scm-1 is now built and installed in /home/pender/torch/install/ (license: BSD)
$ luarocks list

Installed rocks:
----------------

argcheck
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

cudnn
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

cunn
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

cunnx
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

cutorch
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

cwrap
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

dok
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

dpnn
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

env
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

fftw3
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

gnuplot
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

graph
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

graphicsmagick
   1.scm-0 (installed) - /home/pender/torch/install/lib/luarocks/rocks

image
   1.1.alpha-0 (installed) - /home/pender/torch/install/lib/luarocks/rocks

lbase64
   20120820-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

lua-cjson
   2.1.0-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

luafilesystem
   1.6.3-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

moses
   1.4.0-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

nn
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

nngraph
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

nnx
   0.1-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

optim
   1.0.5-0 (installed) - /home/pender/torch/install/lib/luarocks/rocks

paths
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

penlight
   1.3.2-2 (installed) - /home/pender/torch/install/lib/luarocks/rocks

qtlua
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

qttorch
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

rnn
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

sdl2
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

signal
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

sundown
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

sys
   1.1-0 (installed) - /home/pender/torch/install/lib/luarocks/rocks

threads
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

torch
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

trepl
   scm-1 (installed) - /home/pender/torch/install/lib/luarocks/rocks

xlua
   1.0-0 (installed) - /home/pender/torch/install/lib/luarocks/rocks

But unfortunately, I still get the same error:

$ th train.lua --cuda --dataset 5000 --hiddenSize 1000
-- Loading dataset  
Loading vocabulary from data/vocab.t7 ...   

Dataset stats:  
  Vocabulary size: 7061 
         Examples: 8151 

-- Epoch 1 / 50 

/home/pender/torch/install/bin/luajit: ...der/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:47: bad argument #3 to 'ClassNLLCriterion_updateOutput' (torch.CudaTensor expected, got number)
stack traceback:
    [C]: in function 'ClassNLLCriterion_updateOutput'
    ...der/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:47: in function 'forward'
    ...r/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:27: in function 'forward'
    ./seq2seq.lua:69: in function 'train'
    train.lua:76: in main chunk
    [C]: in function 'dofile'
    ...nder/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00405ea0

macournoyer commented 8 years ago

Remove the generated data files before retrying: rm data/*.t7

pender commented 8 years ago

Doesn't seem to help:

$ th train.lua --cuda --dataset 5000 --hiddenSize 1000
-- Loading dataset  
data/vocab.t7 not found 
-- Parsing Cornell movie dialogs data set ...   
 [============================= 387810/387810 =======================>]ETA: 0ms | Step: 0ms           
-- Pre-processing data  
 [============================= 5000/5000 ===========================>]ETA: 0ms | Step: 0ms           
-- Removing low frequency words 
 [============================= 8151/8151 ===========================>]ETA: 0ms | Step: 0ms           
Writing data/examples.t7 ...    
 [============================= 8151/8151 ===========================>]ETA: 0ms | Step: 0ms           
Writing data/vocab.t7 ...   

Dataset stats:  
  Vocabulary size: 7061 
         Examples: 8151 

-- Epoch 1 / 50 

/home/pender/torch/install/bin/luajit: ...der/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:47: bad argument #3 to 'ClassNLLCriterion_updateOutput' (torch.CudaTensor expected, got number)
stack traceback:
    [C]: in function 'ClassNLLCriterion_updateOutput'
    ...der/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:47: in function 'forward'
    ...r/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:27: in function 'forward'
    ./seq2seq.lua:69: in function 'train'
    train.lua:76: in main chunk
    [C]: in function 'dofile'
    ...nder/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00405ea0

knkrth commented 8 years ago

Hey, I'm also having the same problem while training. Did you find a solution?

macournoyer commented 8 years ago

Can you try updating to the latest rnn package from source:

git clone git://github.com/Element-Research/rnn
cd rnn
luarocks install rocks/rnn-scm-1.rockspec

knkrth commented 8 years ago

OK, thanks, that solved my problem.