jnhwkim / MulLowBiVQA

Hadamard Product for Low-rank Bilinear Pooling

Questions about pretrained model #6

Closed suhmily closed 7 years ago

suhmily commented 7 years ago

Hi, when I tried to evaluate the pretrained model with default parameters, I got the following error: bad argument #1 to 'copy' (sizes do not match at /tmp/luarocks_cutorch-scm-1-245/cutorch/lib/THC/THCTensorCopy.cu:31). And I found that w:size() = 50390702, while the pretrained model's size is 51894822. Could you please help me with this problem? Thanks!

jnhwkim commented 7 years ago

Did you update your Torch packages, including rnn?

sudo luarocks install {package name}

Could you share the log? It's hard to track down the issue from the information you've provided.

suhmily commented 7 years ago

Thank you for your kind response. I've reinstalled rnn, but the error still occurs. My log is below:

{
  priming : false
  batch_size : 200
  out_prob : false
  gpuid : 0
  common_embedding_size : 1200
  input_encoding_size : 620
  model_name : "MLB"
  glimpse : 2
  input_json : "/home/titan4/code/hie_co_att/data/vqa_data_prepro.json"
  num_layers : 1
  num_output : 2000
  type : "val2014"
  rnn_size : 2400
  label : ""
  input_ques_h5 : "/home/titan4/code/hie_co_att/data/vqa_data_prepro.h5"
  model_path : "model/pretrained_MLB.t7"
  img_feature_prefix : "/home/titan4/code/vqa-mcb/vqa_test_res5c/resnet_res5c_bgrms_large/"
  backend : "cudnn"
  rnn_model : "GRU"
  out_path : "result/pretrained"
}
DataLoader loading h5 file: /home/titan4/code/hie_co_att/data/vqa_data_prepro.json
DataLoader loading h5 file: /home/titan4/code/hie_co_att/data/vqa_data_prepro.h5
MLB: No Shortcut
shipped data function to cuda...
/home/titan4/torch/install/bin/luajit: eval_orig.lua:189: bad argument #1 to 'copy' (sizes do not match at /tmp/luarocks_cutorch-scm-1-245/cutorch/lib/THC/THCTensorCopy.cu:31)
stack traceback:
  [C]: in function 'copy'
  eval_orig.lua:189: in main chunk
  [C]: in function 'dofile'
  ...tan4/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
  [C]: at 0x00405e40

jnhwkim commented 7 years ago

@suhmily I think you've already modified the original code. Could you reproduce the same error after reverting train.lua? Note that the question embedding module (GRU) should have the same architecture, and the same number of parameters, as in the published code.
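A quick way to confirm such a mismatch before the copy is to compare flattened parameter counts directly. Here is a minimal Python sketch of the idea behind the failing Torch pattern `w:copy(saved_w)` (the shapes and numbers below are hypothetical, not from the repo):

```python
# Sketch of the size check behind Torch's `w:copy(saved_w)` error:
# the flattened parameter vector of the rebuilt model must contain
# exactly as many scalars as the checkpoint's. Shapes are hypothetical.

def total_params(shapes):
    """Count scalars across a list of tensor shapes."""
    total = 0
    for shape in shapes:
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total

# A word-embedding table (vocab x input_encoding_size) plus GRU weights.
model_shapes = [(12000, 620), (3 * 2400, 620), (3 * 2400, 2400)]
saved_shapes = [(12500, 620), (3 * 2400, 620), (3 * 2400, 2400)]

w, saved = total_params(model_shapes), total_params(saved_shapes)
assert w != saved    # a 500-word vocabulary change alone breaks the copy
print(saved - w)     # 310000 extra scalars: 500 rows x 620 columns
```

Even when every recurrent layer matches, a single changed dimension such as the vocabulary size shifts the total and makes the flat copy fail.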

suhmily commented 7 years ago

@jnhwkim Thanks a lot! It turns out the vocabulary size had been changed, which caused this problem.
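For what it's worth, the numbers in the original report are consistent with this diagnosis: the parameter gap divides evenly by the 620-dimensional word embedding (input_encoding_size in the log), i.e. a whole number of lookup-table rows. Attributing the entire gap to the embedding table is our assumption, not something stated in the thread, but it matches the vocabulary-size explanation:

```python
# Sketch: checking that the reported size mismatch is consistent with
# a vocabulary-size change. Attributing the whole gap to the
# word-embedding LookupTable is an assumption, not from the thread.
model_params = 50390702       # w:size() of the locally built model
checkpoint_params = 51894822  # flat parameter count of pretrained_MLB.t7
emb_dim = 620                 # input_encoding_size from the log

gap = checkpoint_params - model_params
print(gap)             # 1504120
print(gap % emb_dim)   # 0 -> a whole number of embedding rows
print(gap // emb_dim)  # 2426 -> plausible vocabulary-size difference
```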

suhmily commented 7 years ago

@jnhwkim May I ask whether the pretrained model achieves the released result, or is it just a showcase?

jnhwkim commented 7 years ago

@suhmily Exactly the same as the best single model (not augmented). I've double-checked using a physically independent server.