macournoyer / neuralconvo

Neural conversational model in Torch

training_fail_/seq2seq.lua:62: attempt to call field 'recursiveCopy' #80

Closed ArashHosseini closed 6 years ago

ArashHosseini commented 6 years ago

@neuralconvo is running on two other machines with CUDA and OpenCL without any problem. Now I have set up a fresh OS with 14.04 and CUDA 8, and changed cuDNN 5 to 6 because of TF 1.3. The Torch install is done.

trace:

`$ th train.lua --cuda --dataset 5000 --hiddenSize 100
-- Loading dataset
data/vocab.t7 not found -- Parsing Cornell movie dialogs data set ...
[==================== 387810/387810 ==========>] Tot: 1s238ms | Step: 0ms
-- Pre-processing data
[==================== 5000/5000 ==============>] Tot: 771ms | Step: 0ms
-- Shuffling
Writing data/examples.t7 ...
[==================== 8151/8151 ==============>] Tot: 703ms | Step: 0ms
Writing data/vocab.t7 ...

Dataset stats:
Vocabulary size: 7061 Examples: 8151

-- Epoch 1 / 50 (LR= 0.001)

~/torch/install/bin/luajit: ./seq2seq.lua:62: attempt to call field 'recursiveCopy' (a nil value)
stack traceback:
    ./seq2seq.lua:62: in function 'forwardConnect'
    train.lua:97: in function 'opfunc'
    /home/flyn/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
    train.lua:131: in main chunk
    [C]: in function 'dofile'
    ...flyn/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00406670
`

I have never hit this issue before and can't find related details on the web. Thanks for any help.

ArashHosseini commented 6 years ago

I talked with Marc about this, and he will look into it soonish. Can you please post your environment settings, including your CUDA and cuDNN versions?

ArashHosseini commented 6 years ago

@hit-lacus there is no other participant yet. Yeah, please.

shrutiphadke commented 6 years ago

I have the same problem. Exact same error. I am using Torch without Cuda/OpenCL on Ubuntu 16.04.

baaleze commented 6 years ago

It looks like the way to call recursiveCopy has changed. In seq2seq.lua, lines 62 and 64, replacing `nn.rnn.recursiveCopy` with `nn.utils.recursiveCopy` works for me. Hope that helps.
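For anyone who needs the same script to run on both older and newer installs, the rename can be absorbed with a small lookup. This is only a sketch in plain Lua: `findRecursiveCopy` is a hypothetical helper (not part of nn or rnn), and the stub table stands in for the real `nn` module so the snippet runs without Torch. It assumes only the two locations named in this thread.

```lua
-- Hypothetical helper (not part of nn or rnn): return recursiveCopy from
-- whichever of the two locations mentioned in this thread is present.
local function findRecursiveCopy(nnModule)
  if nnModule.rnn and nnModule.rnn.recursiveCopy then
    return nnModule.rnn.recursiveCopy      -- location the original code used
  elseif nnModule.utils and nnModule.utils.recursiveCopy then
    return nnModule.utils.recursiveCopy    -- location that works after the change
  end
  error("recursiveCopy not found under nn.rnn or nn.utils")
end

-- Stub standing in for the real `nn` table of a newer install (assumption).
local stubNN = { utils = { recursiveCopy = function(dst, src) return src end } }
local recursiveCopy = findRecursiveCopy(stubNN)
print(recursiveCopy(nil, "copied"))
```

In seq2seq.lua one would presumably call `findRecursiveCopy(nn)` after `require 'rnn'` and use the returned function at lines 62 and 64.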

ArashHosseini commented 6 years ago

Perfect, I can confirm this too.

ArashHosseini commented 6 years ago

@baaleze, thanks again. Did you also get this error on eval after training?

Loading vocabulary from data/vocab.t7 ...   
-- Loading model    

Type a sentence and hit enter to submit.    
CTRL+C then enter to quit.

you> hello
/home/flyn/torch/install/bin/luajit: /home/flyn/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 3 module of nn.Sequential:
/home/flyn/torch/install/share/lua/5.1/torch/Tensor.lua:466: Wrong size for view. Input size: 1000. Output size: 25931
stack traceback:
    [C]: in function 'error'
    /home/flyn/torch/install/share/lua/5.1/torch/Tensor.lua:466: in function 'view'
    /home/flyn/torch/install/share/lua/5.1/rnn/utils.lua:191: in function 'recursiveZeroMask'
    /home/flyn/torch/install/share/lua/5.1/rnn/MaskZero.lua:37: in function 'updateOutput'
    /home/flyn/torch/install/share/lua/5.1/rnn/Recursor.lua:13: in function '_updateOutput'
    ...yn/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:50: in function 'updateOutput'
    /home/flyn/torch/install/share/lua/5.1/rnn/Sequencer.lua:53: in function </home/flyn/torch/install/share/lua/5.1/rnn/Sequencer.lua:34>
    [C]: in function 'xpcall'
    /home/flyn/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/flyn/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./seq2seq.lua:87: in function 'eval'
    eval.lua:55: in function 'say'
    eval.lua:69: in main chunk
    [C]: in function 'dofile'
    ...flyn/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00406670

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    /home/flyn/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    /home/flyn/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./seq2seq.lua:87: in function 'eval'
    eval.lua:55: in function 'say'
    eval.lua:69: in main chunk
    [C]: in function 'dofile'
    ...flyn/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00406670
fengmao31 commented 6 years ago

I have this bug too.

Jeavy commented 6 years ago

Hi, I've trained a model with `th train.lua --cuda --dataset 50000 --hiddenSize 1000`, and after that I got the same error as @ArashHosseini (Wrong size for view) when I tried to chat. Does anyone know how to fix it? Thanks for any help.

ghost commented 6 years ago

I found a simple solution. Try this. In seq2seq.lua, line 87, change `local prediction = self.decoder:forward(torch.Tensor(output))[#output]` to `local prediction = self.decoder:forward(torch.Tensor({output}):t())[#output][1]`
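One possible reading of why this works (an assumption, the thread does not spell it out): the decoder seems to expect a 2D input shaped seqLen x batchSize, whereas `torch.Tensor(output)` is 1D. The sketch below assumes a working Torch install and uses made-up token ids. It only shows the shape change and does not touch the model.

```lua
require 'torch'

-- Hypothetical decoder input: three token ids generated so far.
local output = {5, 12, 7}

-- Original call site: a 1D tensor of length #output.
local flat = torch.Tensor(output)

-- Suggested fix: wrap in a table to get a 1 x seqLen tensor, then
-- transpose it to seqLen x 1 (one column per batch element).
local batched = torch.Tensor({output}):t()

print(flat:dim(), flat:size(1))
print(batched:dim(), batched:size(1), batched:size(2))
```

With the extra batch dimension, the decoder output at step `#output` would also carry a batch index, which is consistent with the trailing `[1]` in the suggested line.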

ArashHosseini commented 6 years ago

@Tak-o-m great report, I can confirm this too.