suhmily closed this issue 7 years ago
Did you update the torch packages, including rnn?
sudo luarocks install {package name}
Could you share the log? It's hard to track down the issue from the information provided so far.
Thank you for your kind response. I've reinstalled rnn, but the error still occurs. My log is below:
{
priming : false
batch_size : 200
out_prob : false
gpuid : 0
common_embedding_size : 1200
input_encoding_size : 620
model_name : "MLB"
glimpse : 2
input_json : "/home/titan4/code/hie_co_att/data/vqa_data_prepro.json"
num_layers : 1
num_output : 2000
type : "val2014"
rnn_size : 2400
label : ""
input_ques_h5 : "/home/titan4/code/hie_co_att/data/vqa_data_prepro.h5"
model_path : "model/pretrained_MLB.t7"
img_feature_prefix : "/home/titan4/code/vqa-mcb/vqa_test_res5c/resnet_res5c_bgrms_large/"
backend : "cudnn"
rnn_model : "GRU"
out_path : "result/pretrained"
}
DataLoader loading h5 file: /home/titan4/code/hie_co_att/data/vqa_data_prepro.json
DataLoader loading h5 file: /home/titan4/code/hie_co_att/data/vqa_data_prepro.h5
MLB: No Shortcut
shipped data function to cuda...
/home/titan4/torch/install/bin/luajit: eval_orig.lua:189: bad argument #1 to 'copy' (sizes do not match at /tmp/luarocks_cutorch-scm-1-245/cutorch/lib/THC/THCTensorCopy.cu:31)
stack traceback:
[C]: in function 'copy'
eval_orig.lua:189: in main chunk
[C]: in function 'dofile'
...tan4/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405e40
@suhmily I think you've already modified the original code. Could you check whether the same error reproduces after reverting train.lua? Note that the question embedding module (GRU) must have the same architecture and the same number of parameters as in the published code.
@jnhwkim Thanks a lot! It turns out the vocabulary size had changed, which caused this problem.
@jnhwkim May I ask whether the pretrained model reproduces the released result, or is it just a showcase?
@suhmily It is exactly the same as the best single model (not augmented). I've double-checked on a physically separate server.
Hi, when I tried to evaluate the pretrained model with the default parameters, I got the following error: bad argument #1 to 'copy' (sizes do not match at /tmp/luarocks_cutorch-scm-1-245/cutorch/lib/THC/THCTensorCopy.cu:31). I also found that w:size() = 50390702, while the pretrained model's size is 51894822. Could you please help me with this problem? Thanks!
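A quick arithmetic check on the two sizes reported above appears consistent with the vocabulary-size explanation given earlier in this thread. The sketch below assumes that the two parameter vectors differ only in the word-embedding lookup table, and that each vocabulary entry contributes input_encoding_size (620 in the logged options) parameters. Both are assumptions for illustration, not something confirmed from the repository code.

```python
# Sizes reported in the thread.
local_params = 50390702       # w:size() of the locally built model
pretrained_params = 51894822  # size of the pretrained parameter vector

# Assumption: each vocabulary word owns one embedding row of this width
# (input_encoding_size from the logged options).
embedding_width = 620

# How many parameters the local model is missing, and how many
# embedding rows of the assumed width that gap corresponds to.
missing_params = pretrained_params - local_params
missing_rows, remainder = divmod(missing_params, embedding_width)

print("missing parameters:", missing_params)
print("embedding rows of width %d: %d (remainder %d)"
      % (embedding_width, missing_rows, remainder))
```

The gap of 1504120 parameters divides evenly into 2426 rows of width 620, so under these assumptions the local vocabulary would be 2426 entries smaller than the one the pretrained model was trained with. This is only a plausibility check. It does not prove where the mismatch comes from, so comparing the vocabulary in the preprocessed JSON against the published one is still the step to take.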