ruotianluo / self-critical.pytorch

Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, among others.
MIT License

error on flickr30k eval #251

Closed · xmu-xiaoma666 closed 3 years ago

xmu-xiaoma666 commented 3 years ago

The log is as below:

Traceback (most recent call last):
  File "tools/train.py", line 299, in <module>
    train(opt)
  File "tools/train.py", line 253, in train
    dp_model, lw_model.crit, loader, eval_kwargs)
  File "/home/mayiwei/extraSpace/TIP/self-critical.pytorch/captioning/utils/eval_utils.py", line 207, in eval_split
    predictions.pop()
IndexError: pop from empty list
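For context, `eval_split` pops surplus entries off `predictions` when the loader's bookkeeping says more images were processed than the split actually holds; if nothing was collected, `pop()` fails exactly as in the traceback. A minimal standalone sketch of the failure mode and a defensive guard (the names and counts here are illustrative, not the repo's actual code):

```python
def trim(predictions, n, ix1):
    """Drop entries collected past the end of the split."""
    for _ in range(n - ix1):
        predictions.pop()  # IndexError once the list runs dry

def trim_safe(predictions, n, ix1):
    """Same trim, but never pops more than the list holds."""
    for _ in range(min(max(n - ix1, 0), len(predictions))):
        predictions.pop()

preds = []  # nothing was collected for this split
try:
    trim(preds, n=5, ix1=0)
except IndexError as e:
    print(e)  # "pop from empty list", matching the traceback above
trim_safe(preds, n=5, ix1=0)  # tolerates the mismatch silently
```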
ruotianluo commented 3 years ago

Not sure why this would happen.

xmu-xiaoma666 commented 3 years ago

> Not sure why this would happen.

When I train the model on Flickr30k, the performance in the second stage (self-critical or new-self-critical) drops quickly. Do you know why this happens?

ruotianluo commented 3 years ago

It worked for me:

This was the script I ran.

xe: python train.py --id $id --caption_model att2in2 --input_json data/f30ktalk.json --input_label_h5 data/f30ktalk_label.h5 --input_fc_dir data/f30kbu_fc --input_att_dir data/f30kbu_att.pth --seq_per_img 5 --batch_size 50 --beam_size 1 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path $ckpt_path $start_from --save_checkpoint_every 3000 --language_eval 1 --val_images_use 5000 --max_epochs 30

nsc: python train.py --id $id --caption_model att2in2 --input_json data/f30ktalk.json --input_label_h5 data/f30ktalk_label.h5 --input_fc_dir data/f30kbu_fc --input_att_dir data/f30kbu_att.pth --seq_per_img 5 --batch_size 100 --beam_size 1 --learning_rate 4.3e-5 --checkpoint_path $ckpt_path $start_from --save_checkpoint_every 3000 --language_eval 1 --val_images_use -1 --max_epochs 100 --structure_after 28 --structure_sample_n 1 --structure_loss_weight 1 --structure_loss_type new_self_critical --cached_tokens f30k-train-idxs
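As background on what the second stage optimizes: `--structure_loss_type new_self_critical` selects a REINFORCE-style objective in which the reward of a greedily decoded caption serves as the baseline for sampled captions. A minimal sketch of the standard self-critical loss (illustrative only; the repo's `new_self_critical` variant computes the baseline differently):

```python
import torch

def self_critical_loss(sample_logprobs, sample_reward, greedy_reward, mask):
    """REINFORCE with the greedy caption's reward as the baseline.

    sample_logprobs: (B, T) log-probs of the sampled caption tokens
    sample_reward:   (B,)   e.g. CIDEr scores of the sampled captions
    greedy_reward:   (B,)   e.g. CIDEr scores of the greedy captions
    mask:            (B, T) 1 for real tokens, 0 for padding
    """
    # Rewards are treated as constants; gradients flow only
    # through sample_logprobs.
    advantage = (sample_reward - greedy_reward).unsqueeze(1)  # (B, 1)
    loss = -advantage * sample_logprobs * mask                # per-token
    return loss.sum() / mask.sum()
```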

ruotianluo commented 3 years ago

[image attachment]

xmu-xiaoma666 commented 3 years ago

> It worked for me:
>
> This was the script I ran. [xe and nsc commands quoted above]

Thank you very much. I was surprised to find that the batch size affects performance a lot. On Flickr30k, with a batch size of 10 in the second stage (SC, NSC), the performance drops quickly; with a batch size of 50, however, it improves. Do you know why this happens? Or which batch size works best in the second stage?
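One plausible explanation (an assumption, not something confirmed in this thread): the self-critical gradient is a Monte-Carlo estimate averaged over the batch, so very small batches give a noisy, high-variance reward signal and destabilizing updates. A toy illustration with stand-in per-caption advantages:

```python
import torch

torch.manual_seed(0)
advantages = torch.randn(100_000)  # stand-in per-caption advantages

for b in (10, 50, 100):
    # std of the batch-mean signal shrinks roughly as 1/sqrt(b)
    usable = (advantages.numel() // b) * b
    batch_means = advantages[:usable].view(-1, b).mean(dim=1)
    print(f"batch={b:>3}  std of batch mean={batch_means.std().item():.3f}")
```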

xmu-xiaoma666 commented 3 years ago

> It worked for me:
>
> This was the script I ran. [xe and nsc commands quoted above]

[image: training curves; green = batch size 10, grey = batch size 50]

This is my result. With either batch size, 50 or 10, the performance still drops (green: batch size 10; grey: batch size 50).