jiasenlu / HieCoAttenVQA

Fail to run train.lua #30

Open shuait opened 7 years ago

shuait commented 7 years ago

This may be a trivial question, but I'm completely new to Torch, and searching on Google turned up nothing. I'm working on an Ubuntu 14.04 machine with CUDA 7.0 and cuDNN R4. I prepared all the training files, but running train.lua gives me this error:

{
  input_img_train_h5 : "data/vqa_data_img_vgg_train.h5"
  learning_rate_decay_every : 300
  optim : "rmsprop"
  hidden_size : 512
  optim_epsilon : 1e-08
  output_size : 1000
  rnn_layers : 2
  input_img_test_h5 : "data/vqa_data_img_vgg_test.h5"
  losses_log_every : 600
  id : "0"
  input_ques_h5 : "data/vqa_data_prepro.h5"
  learning_rate_decay_start : 0
  start_from : ""
  gpuid : 6
  seed : 123
  input_json : "data/vqa_data_prepro.json"
  optim_beta : 0.995
  batch_size : 20
  iterPerEpoch : 1200
  rnn_size : 512
  max_iters : -1
  checkpoint_path : "save/train_vgg"
  save_checkpoint_every : 6000
  learning_rate : 0.0004
  co_atten_type : "Alternating"
  feature_type : "VGG"
  backend : "cudnn"
  optim_alpha : 0.99
}
DataLoader loading h5 image file: data/vqa_data_img_vgg_train.h5
DataLoader loading h5 image file: data/vqa_data_img_vgg_test.h5
DataLoader loading h5 question file: data/vqa_data_prepro.h5
DataLoader loading json file: data/vqa_data_prepro.json
assigned 215375 images to split 0
assigned 121512 images to split 2
Building the model...
total number of parameters in word_level: 8031747
total number of parameters in phrase_level: 2889219
total number of parameters in ques_level: 5517315
constructing clones inside the ques_level
total number of parameters in recursive_attention: 2862056
/home/raamac/torch/install/bin/luajit: ./misc/word_level.lua:86: the class torch.CudaByteTensor cannot be indexed
stack traceback:
    [C]: in function '__newindex'
    ./misc/word_level.lua:86: in function 'forward'
    train.lua:253: in function 'lossFun'
    train.lua:310: in main chunk
    [C]: in function 'dofile'
    ...amac/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

haibin894609937 commented 7 years ago

I have the same error. Have you solved the problem?

shuait commented 7 years ago

@haibin894609937 I still have no clue what caused the problem. I tried reinstalling Torch, and that failed too.

haibin894609937 commented 7 years ago

I ran it on CentOS 7 with CUDA 8.0.

Jhhuangkay commented 7 years ago

You are both working on the VQA dataset, right? If so, I suspect the problem is in your vqa_data_prepro.json and vqa_data_prepro.h5. Try the other dataset the author provides, COCO-QA: when I replaced those two files with cocoqa_data_prepro.json and cocoqa_data_prepro.h5, all the code ran without errors. If the same happens for you, you will know the problem lies in how the VQA prepro files were generated.
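
To narrow down how the two sets of prepro files differ, one option is to compare the top-level structure of the two JSON files. The following is only a minimal Python sketch: the helper names `summarize` and `compare_prepro` are made up for illustration, it assumes each file holds a single JSON object at the top level, and it inspects only the .json files, not the .h5 files.

```python
import json


def summarize(path):
    """Map each top-level key of a JSON object file to (type name, length or None)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        # Assumption: a prepro JSON file is a single object at the top level.
        raise ValueError(path + " does not contain a JSON object at the top level")
    summary = {}
    for key, value in data.items():
        size = len(value) if isinstance(value, (dict, list, str)) else None
        summary[key] = (type(value).__name__, size)
    return summary


def compare_prepro(path_a, path_b):
    """Report keys present in only one file and keys whose value types differ."""
    a, b = summarize(path_a), summarize(path_b)
    shared = set(a) & set(b)
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "type_mismatch": sorted(k for k in shared if a[k][0] != b[k][0]),
    }


# Example call (paths taken from this thread, adjust to your setup):
# print(compare_prepro("data/vqa_data_prepro.json", "data/cocoqa_data_prepro.json"))
```

A missing key or a type mismatch in the VQA file would point at the preprocessing step rather than at the Torch installation. An empty report does not rule the prepro files out, since the contents of the .h5 file could still differ.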