harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

Loading checkpoint model seems to fail #17

Closed: penguinshin closed this issue 8 years ago

penguinshin commented 8 years ago

It seems that loading the checkpoint model fails, even though it reports that the model was loaded. Any thoughts?

Austins-MacBook-Pro:src2anno austinshin$ python run_model.py
loading data...
done!
Source vocab size: 8814, Target vocab size: 15669
Source max sent len: 50, Target max sent len: 52
loading djc.t7...
Number of parameters: 13278519
/Users/austinshin/torch/install/bin/luajit: bad argument #1 to '?' (empty tensor at /Users/austinshin/torch/pkg/torch/generic/Tensor.c:888)
stack traceback:
    [C]: at 0x02ebead0
    [C]: in function '__index'
    /Users/austinshin/torch/install/share/lua/5.1/nn/MM.lua:51: in function 'updateGradInput'
    ...stinshin/torch/install/share/lua/5.1/nngraph/gmodule.lua:386: in function 'neteval'
    ...stinshin/torch/install/share/lua/5.1/nngraph/gmodule.lua:420: in function 'updateGradInput'
    ...stinshin/torch/install/share/lua/5.1/nngraph/gmodule.lua:386: in function 'neteval'
    ...stinshin/torch/install/share/lua/5.1/nngraph/gmodule.lua:420: in function 'updateGradInput'
    /Users/austinshin/torch/install/share/lua/5.1/nn/Module.lua:31: in function 'backward'
    train.lua:370: in function 'train_batch'
    train.lua:479: in function 'train'
    train.lua:644: in function 'main'
    train.lua:647: in main chunk
    [C]: in function 'dofile'
    ...shin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x0102b56bc0

yoonkim commented 8 years ago

I just tested and train_from worked for me. Was djc.t7 trained using an older version of the repo?
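One quick way to check is to inspect the options stored inside the checkpoint itself. A minimal sketch, assuming the checkpoint is saved as a {model, opt} pair (the layout used by recent versions of this repo); if djc.t7 came from an older version, the contents may differ:

```lua
-- Inspect the options a seq2seq-attn checkpoint was trained with.
-- Assumes the file stores {model, opt}; a GPU-trained checkpoint may
-- additionally require 'cutorch'/'cunn' to deserialize.
require 'torch'
require 'nn'
require 'nngraph'

local checkpoint = torch.load('djc.t7')
local saved_model, saved_opt = checkpoint[1], checkpoint[2]

-- Compare these against the values passed on the training command line.
print('num_layers:', saved_opt.num_layers)
print('rnn_size:',   saved_opt.rnn_size)
```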

yoonkim commented 8 years ago

Ah, it seems like the num_layers specified in the train_from model is different from the one on your command line. I pushed a fix for it, so it's safer now (i.e. it will always use the trained model's parameters). Let me know if this works.
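For context, a rough sketch of the idea behind that behavior: when -train_from is given, the architecture options stored in the checkpoint override whatever was passed on the command line. This is an illustration only, not the actual commit; the option names (num_layers, rnn_size, word_vec_size) are the ones train.lua normally exposes:

```lua
-- Sketch of checkpoint loading inside train.lua-style code (opt is the
-- table of command-line options, already parsed).
if opt.train_from:len() > 0 then
   print('loading ' .. opt.train_from .. '...')
   local checkpoint = torch.load(opt.train_from)
   local model, model_opt = checkpoint[1], checkpoint[2]
   -- Overwrite architecture-related options with the saved ones so the
   -- loaded weights always match the network that gets rebuilt.
   opt.num_layers    = model_opt.num_layers
   opt.rnn_size      = model_opt.rnn_size
   opt.word_vec_size = model_opt.word_vec_size
end
```

With this kind of override in place, a mismatched -num_layers on the command line can no longer produce a network whose parameter layout disagrees with the loaded checkpoint.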