ruotianluo / Image_Captioning_AI_Challenger

Code for the AI Challenger contest (generating Chinese image captions).

an error during training progress about broadcasting #8

Closed fearless77 closed 6 years ago

fearless77 commented 6 years ago

run_train.sh, with the parameters we changed, is as follows:

#! /bin/sh

# larger batch

id="dense_box_bn"$1
ckpt_path="log_"$id
if [ ! -d $ckpt_path ]; then
  mkdir $ckpt_path
fi
if [ ! -f $ckpt_path"/infos_"$id".pkl" ]; then
  start_from=""
else
  start_from="--start_from "$ckpt_path
fi

The error we meet when running train.py:

python train.py --id $id --caption_model denseatt \
  --input_json data/chinese_talk.json --input_label_h5 data/chinese_talk_label.h5 \
  --input_fc_dir data/chinese_talk_fc --input_att_dir data/chinese_talk_att \
  --seq_per_img 5 --batch_size 50 --beam_size 1 \
  --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 \
  --checkpoint_path $ckpt_path $start_from --save_checkpoint_every 3000 \
  --language_eval 1 --val_images_use 10000 --max_epoch 37 \
  --rnn_size 1300 --use_box 0 --use_bn 0

vipsl-422-1@vipsl-422-1:~/enjoy-zhangyi/ImageCaptioninginChinese$ bash run_train.sh
Tensorflow not installed; No tensorboard logging.
DataLoader loading json file: data/chinese_talk.json
vocab size is 4461
DataLoader loading h5 file: data/chinese_talk_fc data/chinese_talk_att data/cocotalk_box data/chinese_talk_label.h5
max sequence length in data is 20
read 240000 image features
assigned 220000 images to split train
assigned 10000 images to split val
assigned 10000 images to split test
Traceback (most recent call last):
  File "train.py", line 229, in <module>
    train(opt)
  File "train.py", line 115, in train
    data = loader.get_batch('train')
  File "/home/vipsl-422-1/enjoy-zhangyi/ImageCaptioninginChinese/dataloader.py", line 163, in get_batch
    data['att_feats'][i*seq_per_img:(i+1)*seq_per_img, :att_batch[i].shape[0]] = att_batch[i]
ValueError: could not broadcast input array from shape (7,7,2048) into shape (5,7,7)
Terminating BlobFetcher
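
The shape mismatch in the traceback can be reproduced outside the training code. The following is a minimal NumPy sketch, not the repository's dataloader: the buffer allocation is an assumption inferred from the target shape (5,7,7) in the error message, and the batch of two images is a hypothetical stand-in for the real batch of 50.

```python
import numpy as np

seq_per_img = 5
batch_size = 2  # hypothetical small batch; the run above used 50

# Each image carries an unflattened 7x7 grid of 2048-dim features.
att_batch = [np.zeros((7, 7, 2048), dtype=np.float32) for _ in range(batch_size)]

# Assumed allocation: the buffer expects one 2-D (regions, channels) array per
# image, so with 3-D inputs it ends up as (rows, 7, 7) instead of (rows, 49, 2048).
max_att_len = max(a.shape[0] for a in att_batch)
att_feats = np.zeros(
    (batch_size * seq_per_img, max_att_len, att_batch[0].shape[1]),
    dtype=np.float32,
)


def try_fill():
    """Run the assignment from the traceback; return the error text, or None."""
    try:
        for i in range(batch_size):
            att_feats[i * seq_per_img:(i + 1) * seq_per_img,
                      :att_batch[i].shape[0]] = att_batch[i]
    except ValueError as exc:
        return str(exc)
    return None


error = try_fill()
print(error)
```

Under these assumptions the target slice has shape (5,7,7) while the source has shape (7,7,2048), and the trailing axes (7 versus 2048) cannot be broadcast together.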

fearless77 commented 6 years ago

I really need your help, thanks a lot!!!

ruotianluo commented 6 years ago

Flatten into 49x2048.

fearless77 commented 6 years ago

Sorry, I don't understand what you mean.

ruotianluo commented 6 years ago

Flatten the 7x7x2048 features to 49x2048.

fearless77 commented 6 years ago

Where exactly should I make the change, and which statements? Sorry for the trouble, and thank you very much~

ruotianluo commented 6 years ago

Below this line: https://github.com/ruotianluo/Image_Captioning_AI_Challenger/blob/master/dataloader.py#L189 add att_feats = np.reshape(att_feats, (-1, att_feats.shape[-1]))
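
To see what that reshape does, here is a small standalone sketch. It assumes the per-image features are (7, 7, 2048) arrays, as in the error message; the buffer layout and the batch of two images are illustrative assumptions, not code copied from dataloader.py.

```python
import numpy as np

seq_per_img = 5

# Two hypothetical images, each with a 7x7 grid of 2048-dim features.
att_batch = [np.random.rand(7, 7, 2048).astype(np.float32) for _ in range(2)]

# The suggested fix: merge the spatial axes and keep the channel axis.
# -1 lets NumPy infer 7 * 7 = 49.
flat_batch = [np.reshape(a, (-1, a.shape[-1])) for a in att_batch]
print(flat_batch[0].shape)  # (49, 2048)

# Assumed buffer layout: (batch * seq_per_img, regions, channels).
max_att_len = max(a.shape[0] for a in flat_batch)
buffer = np.zeros(
    (len(flat_batch) * seq_per_img, max_att_len, flat_batch[0].shape[1]),
    dtype=np.float32,
)

# Each (49, 2048) array now broadcasts across its seq_per_img rows.
for i, feats in enumerate(flat_batch):
    buffer[i * seq_per_img:(i + 1) * seq_per_img, :feats.shape[0]] = feats

print(buffer.shape)  # (10, 49, 2048)
```

The reshape is row-major, so region index r * 7 + c in the flattened array corresponds to grid cell (r, c) in the original.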

fearless77 commented 6 years ago

OK!! It runs successfully now, thank you very much!!!

fearless77 commented 6 years ago

We have now finished training the model, but running eval.py produces the following error:

/usr/bin/python2.7 /home/vipsl-422-1/enjoy-zhangyi/ImageCaptioninginChinese/eval.py --model ./log_tp/model-best.pth --infos_path ./log_tp/infos_tp-best.pkl --image_folder images --num_images -1
DataLoaderRaw loading images from folder: images
0
listing all images in directory images
DataLoaderRaw found 34 images
Traceback (most recent call last):
  File "/home/vipsl-422-1/enjoy-zhangyi/ImageCaptioninginChinese/eval.py", line 137, in <module>
    vars(opt))
  File "/home/vipsl-422-1/enjoy-zhangyi/ImageCaptioninginChinese/eval_utils.py", line 106, in eval_split
    seq = model(fc_feats, att_feats, att_masks, opt=eval_kwargs, mode='sample')[0].data
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vipsl-422-1/enjoy-zhangyi/ImageCaptioninginChinese/models/CaptionModel.py", line 31, in forward
    return getattr(self, '_'+mode)(*args, **kwargs)
  File "/home/vipsl-422-1/enjoy-zhangyi/ImageCaptioninginChinese/models/AttModel.py", line 189, in _sample
    return self._sample_beam(fc_feats, att_feats, att_masks, opt)
  File "/home/vipsl-422-1/enjoy-zhangyi/ImageCaptioninginChinese/models/AttModel.py", line 149, in _sample_beam
    att_feats = pack_wrapper(self.att_embed, att_feats, att_masks)
  File "/home/vipsl-422-1/enjoy-zhangyi/ImageCaptioninginChinese/models/AttModel.py", line 33, in pack_wrapper
    return module(att_feats)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/batchnorm.py", line 37, in forward
    self.training, self.momentum, self.eps)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 1013, in batch_norm
    return f(input, weight, bias)
RuntimeError: running_mean should contain 14 elements not 2048

Process finished with exit code 1

What is the cause of this? Is it because we changed 7x7x2048 to 49x2048 during training? Thank you very much!

ruotianluo commented 6 years ago

At test time the features also need to be 49x2048.
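
One way to keep the training and evaluation paths consistent is to route both through the same flattening step. The sketch below is only an illustration: `flatten_att_feats` is a hypothetical helper, not a function in the repository, and where it should be called in the evaluation data loader is left open.

```python
import numpy as np


def flatten_att_feats(att_feats):
    """Hypothetical helper: return features as a 2-D (regions, channels) array.

    An (H, W, C) grid becomes (H * W, C); an array that is already
    (regions, C) is returned with the same shape.
    """
    att_feats = np.asarray(att_feats)
    if att_feats.ndim < 2:
        raise ValueError("expected at least 2 dimensions, got %d" % att_feats.ndim)
    return att_feats.reshape(-1, att_feats.shape[-1])


# Works for any spatial grid size, and is a no-op on already-flat input.
print(flatten_att_feats(np.zeros((7, 7, 2048))).shape)    # (49, 2048)
print(flatten_att_feats(np.zeros((14, 14, 2048))).shape)  # (196, 2048)
print(flatten_att_feats(np.zeros((49, 2048))).shape)      # (49, 2048)
```

Applying the same function wherever features are loaded or extracted means the model sees the channel dimension (2048) last in both training and evaluation.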