larspars / word-rnn

Recurrent Neural Network that predicts word-by-word

Error when sampling #13

Open allthetime opened 8 years ago

allthetime commented 8 years ago

When running sample.lua against .t7 files, I frequently (but not always, depending on the temperature and seed text) run into this error:

/home/me/torch/install/bin/luajit: bad argument #2 to '?' (out of bounds at /home/me/torch/pkg/torch/lib/TH/generic/THStorage.c:178)
stack traceback:
    [C]: at 0x7f38999b08e0
    [C]: in function 'multinomial'
    sample.lua:170: in main chunk
    [C]: in function 'dofile'
    ...time/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

It usually comes up after some text has already been predicted, and it tends to show up sooner (after less predicted text) when the temperature is lower. A higher temperature lets more text through before the error occurs.
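
One possible explanation for the temperature dependence, offered only as an illustration and not confirmed for word-rnn: if the sampler divides log-probabilities by the temperature and then exponentiates (as char-rnn-style samplers typically do; that sample.lua here does the same is an assumption), a low temperature can push every term so far negative that it underflows to zero, and normalizing then gives NaN. The Python sketch below uses two hypothetical helpers, `naive_probs` and `stable_probs`, to show the effect and the usual max-subtraction workaround.

```python
import math

def naive_probs(log_probs, temperature):
    """Scale log-probabilities by 1/temperature, exponentiate, normalize.

    Hypothetical helper: mirrors the assumed sampling recipe, not the
    actual code in sample.lua.
    """
    scaled = [lp / temperature for lp in log_probs]
    weights = [math.exp(s) for s in scaled]  # very negative values underflow to 0.0
    total = sum(weights)
    if total == 0.0:
        # A tensor library would compute 0/0 = NaN here; plain Python would
        # raise ZeroDivisionError, so return NaN explicitly to mimic that.
        return [float("nan")] * len(weights)
    return [w / total for w in weights]

def stable_probs(log_probs, temperature):
    """Same distribution, but subtract the maximum before exponentiating."""
    scaled = [lp / temperature for lp in log_probs]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # largest term is exp(0) = 1.0
    total = sum(weights)                            # therefore total >= 1.0
    return [w / total for w in weights]

if __name__ == "__main__":
    example = [-2.0, -3.0, -5.0]  # made-up log-probabilities for three words
    for temperature in (1.0, 0.001):
        print(temperature, naive_probs(example, temperature),
              stable_probs(example, temperature))
```

Torch tensors are often single precision, which underflows much sooner than the double-precision floats Python uses here, so the same effect could appear at less extreme temperatures.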

A similar issue seems to exist (or have existed) in char-rnn: https://github.com/karpathy/char-rnn/issues/28

From that thread: "The error means your data are naned. Two possible causes include the weights becoming naned during training, or the cv snapshot file being corrupted somehow."
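
If NaN values are the cause, one way to confirm it is to validate the probability vector just before sampling. The following is a minimal Python sketch of that idea, not code from word-rnn or Torch; `check_distribution` and `safe_sample` are hypothetical names, and the same checks would need to be ported to Lua to use them in sample.lua.

```python
import math
import random

def check_distribution(probs):
    """Return a list of problems that would make multinomial sampling invalid."""
    problems = []
    if any(math.isnan(p) for p in probs):
        problems.append("contains NaN")
    if any(math.isinf(p) for p in probs):
        problems.append("contains infinity")
    if any(p < 0 for p in probs):  # NaN compares False, so it is not counted here
        problems.append("contains negative values")
    finite = [p for p in probs if math.isfinite(p)]
    if sum(finite) <= 0:
        problems.append("no positive probability mass")
    return problems

def safe_sample(probs, rng):
    """Sample one index from probs, failing early with a readable message."""
    problems = check_distribution(probs)
    if problems:
        raise ValueError("cannot sample: " + ", ".join(problems))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    print(safe_sample([0.1, 0.7, 0.2], rng))
    print(check_distribution([float("nan"), 0.5]))
```

A check like this does not fix NaN weights in a checkpoint, but it can show whether the bad values come from the model output or from the sampling step.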

Is there any way I can avoid this situation?

amitphadke commented 7 years ago

Same error here, did you find a solution?

bloons3 commented 7 years ago

Might be related to https://github.com/jcjohnson/torch-rnn/pull/195

I have been experiencing this error as well

lowtronik commented 7 years ago

I used torch-rnn and word-rnn with a 5 MB dataset with no problems. I got the same error with an 8.2 MB set.

lowtronik commented 7 years ago

I also tried word-rnn on CPU with 56 GB of RAM (before, I was on a 6 GB GPU), with no luck either.