llSourcell / Chatbot-AI

Chatbot AI for Machine Learning for Hackers #6
MIT License
265 stars, 115 forks

Killed #2

Open bobhinkle opened 8 years ago

bobhinkle commented 8 years ago

When attempting to train, the program will run for a while and then print "Killed".

```
th train.lua --dataset 50000 --hiddenSize 1000
-- Loading dataset
data/vocab.t7 not found
-- Parsing Cornell movie dialogs data set ...
 [==================== 387810/387810 ==========>] Tot: 3s942ms | Step: 0ms
-- Pre-processing data
 [==================== 50000/50000 ============>] Tot: 33s312ms | Step: 0ms
-- Removing low frequency words
 [==================== 83632/83632 ============>] Tot: 12s831ms | Step: 0ms
Writing data/examples.t7 ...
 [==================== 83632/83632 ============>] Tot: 28s333ms | Step: 0ms
Writing data/vocab.t7 ...

Dataset stats:
  Vocabulary size: 25931
  Examples: 83632
Killed
```
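A bare "Killed" with no Lua stack trace usually means the Linux kernel's OOM killer terminated the process, which would fit a 1000-unit hidden layer trained on 50000 examples. A quick way to confirm (a sketch only; `dmesg` may require root on some distros, and `journalctl -k` is an alternative; the smaller flags shown are illustrative, not recommended settings):

```shell
# Check the kernel log for OOM-killer entries. If the process was killed for
# memory, retrying with a smaller model/dataset (or more RAM/swap) avoids it,
# e.g.: th train.lua --dataset 10000 --hiddenSize 300
dmesg 2>/dev/null | grep -iE 'out of memory|killed process' \
  || echo "no OOM entries found (or dmesg not permitted)"
```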

Running the basic readme demo

bobhinkle commented 8 years ago

Tried re-running with a smaller dataset and now get this:

```
-- Epoch 1 / 50

/home/ubuntu/torch/install/bin/luajit: ...u/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: expecting target table
stack traceback:
	[C]: in function 'assert'
	...u/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: in function 'forward'
	./seq2seq.lua:74: in function 'train'
	train.lua:85: in main chunk
	[C]: in function 'dofile'
	...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
	[C]: at 0x0000cff9
```

llSourcell commented 8 years ago

@bobhinkle This could be related to this dependency issue, for which a solution is posted: https://github.com/llSourcell/Chatbot-AI/issues/1. If not, it could be a memory overflow. If you are running this locally, I suggest running it on AWS instead; see ML for Hackers #4 for an AWS walkthrough.

juharris commented 8 years ago

@llSourcell I had the same issue and got `expecting target table`. I don't think it's related to the dependencies, because I tried with and without CUDA.

That stuff about dependencies was mainly about when running with OpenCL.

I'm running on a CentOS machine with Lua 5.3.3. I also tried with Lua 5.1.4.

juharris commented 8 years ago

Temporary solution: comment out the failing asserts in SequencerCriterion.lua. For me the file is at ~/torch/install/share/lua/5.2/rnn/SequencerCriterion.lua, because I installed Torch with TORCH_LUA_VERSION=LUA52 ./install.sh; by default you'll find it at ~/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua.

Those checks on target don't really seem that necessary to us.
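For reference, that edit can be scripted. Below is a sketch on a throwaway temp file; the assert text in the `printf` is illustrative, not copied from the rnn source. To apply it for real, substitute the SequencerCriterion.lua path for your Lua version and keep the backup:

```shell
# Comment out top-level "assert(" lines by prefixing Lua's "--" comment marker.
f=$(mktemp)
printf '   assert(torch.type(target) == "table", "expecting target table")\n' > "$f"
sed -i.bak 's/^\( *\)assert(/\1-- assert(/' "$f"   # .bak keeps the original
cat "$f"   # the assert line now starts with "-- assert("
```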

I trained it pretty quickly using `th train.lua --dataset 500 --hiddenSize 100 --maxEpoch 10 --saturateEpoch 4` and it works, but the answers aren't that good; hopefully that's just because of my constraints and not because something else went wrong.

@llSourcell maybe changing the type of decoderTarget would be a better fix?

llSourcell commented 8 years ago

@juharris thanks so much for your posts on these issues. I now have 7 ML for Hackers repos to maintain, with much more content to come, so I may need some help with this issue. Could you make a PR with the quick fix you posted? I would really appreciate it, and I'll merge it immediately.

juharris commented 8 years ago

@llSourcell The fix isn't in this repo; it belongs in the rnn package. It looks like a fix is coming in the original repo: https://github.com/macournoyer/neuralconvo/issues/31