microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Error may arise when finetuning Beit3 on COCO captioning #1086

Open ImKeTT opened 1 year ago

ImKeTT commented 1 year ago

Thanks for providing such concise and clean code for beit3. There may be a typo/error in datasets.py here: https://github.com/microsoft/unilm/blob/9102ed91f8e56baa31d7ae7e09e0ec98e77d779c/beit3/datasets.py#L847 I think the is_train flag should be False for the validation set, which may influence some behaviours of the dataloader, such as the drop_last flag here: https://github.com/microsoft/unilm/blob/9102ed91f8e56baa31d7ae7e09e0ec98e77d779c/beit3/datasets.py#L733
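
For context, here is a minimal runnable sketch of the pattern in question: the loader factory ties drop_last to is_train, so building the val loader with is_train=True silently drops the tail of the validation set. The names ToyCaptionDataset and make_loader are illustrative stand-ins, not the actual helpers in beit3/datasets.py.

    import torch
    from torch.utils.data import DataLoader, Dataset


    class ToyCaptionDataset(Dataset):
        """Stand-in for the COCO captioning val set: each item carries an image_id."""

        def __init__(self, num_items):
            self.num_items = num_items

        def __len__(self):
            return self.num_items

        def __getitem__(self, idx):
            return {"image_id": idx, "image": torch.zeros(3, 4, 4)}


    def make_loader(dataset, is_train, batch_size):
        # drop_last follows is_train: an incomplete final batch is only acceptable
        # while training, never while evaluating.
        return DataLoader(
            dataset,
            batch_size=batch_size,
            shuffle=is_train,
            drop_last=is_train,
        )


    val_set = ToyCaptionDataset(num_items=5002)  # illustrative size, not the real split size

    # Bug: building the val loader with is_train=True turns on drop_last,
    # so the last (5002 % 16) = 10 images are never captioned.
    buggy_loader = make_loader(val_set, is_train=True, batch_size=16)
    # Fix: is_train=False keeps drop_last off, so every validation image is seen.
    fixed_loader = make_loader(val_set, is_train=False, batch_size=16)

    print(sum(batch["image_id"].numel() for batch in buggy_loader))  # 4992
    print(sum(batch["image_id"].numel() for batch in fixed_loader))  # 5002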

On the COCO captioning task, since you're using COCOEval to evaluate generated captions during training, you have to make sure the image_ids of the generated captions exactly match the image_ids of the ground-truth labels. If drop_last=True is set in the val dataloader, an unlucky eval_batch_size can drop some validation instances and trigger an error.
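
To make the failure mode concrete, here is a small illustrative calculation and sanity check (not code from the repo, and the numbers are made up): with distributed eval, the effective batch is eval_batch_size times the number of processes, and drop_last=True throws away the remainder, so the predicted image_ids no longer cover the ground-truth ids that COCOEval expects.

    # Illustrative numbers only; the real split size depends on the split files used.
    num_val_images = 5000
    world_size = 8                                   # --nproc_per_node=8
    eval_batch_size = 16                             # --eval_batch_size 16
    effective_batch = world_size * eval_batch_size   # 128 images consumed per eval step

    dropped = num_val_images % effective_batch       # 5000 % 128 = 8 images silently skipped
    print(f"{dropped} validation images are never captioned when drop_last=True")

    # Cheap sanity check before handing results to COCOEval-style scoring
    # (image ids here are synthetic; real ones come from the annotation file):
    gt_image_ids = set(range(num_val_images))
    pred_image_ids = set(range(num_val_images - dropped))
    missing = gt_image_ids - pred_image_ids
    if missing:
        print(f"{len(missing)} ground-truth image_ids have no prediction; scoring will not line up")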

My full command for reproducing this error is:

python -m torch.distributed.launch --nproc_per_node=8 run_beit3_finetuning.py \
                         --model beit3_base_patch16_480 --input_size 480 \
                         --task coco_captioning --batch_size 256 --eval_batch_size 16 --num_max_bpe_tokens 32 \
                         --sentencepiece_model my/path/to/beit3.spm \
                         --finetune my/path/to/beit3_base_indomain_patch16_480_coco_captioning.pth \
                         --data_path my/path/to/mscoco \
                         --captioning_mask_prob 0.7 \
                         --drop_worst_after 12000 \
                         --dist_eval
wenhui0924 commented 1 year ago

Thanks!

linhuixiao commented 6 months ago

@ImKeTT Thank you for your solution! I encountered exactly the same problem while reproducing the COCO captioning task. Following your method, I changed is_train=True to is_train=False for the val split, which fixed the bug. A stub sketch of the change is below.
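
For anyone else applying the fix, it amounts to a single argument at the val-split call site. The snippet below only borrows the function names from beit3/datasets.py; the bodies are stubs so the wiring can be shown in isolation.

    # Hypothetical sketch of the one-argument fix; real signatures live in beit3/datasets.py.
    def create_dataset_by_split(args, split, is_train):
        """Stub standing in for the real builder; only the is_train wiring matters here."""
        return {"split": split, "drop_last": is_train}


    def create_downstream_dataset(args):
        train_data = create_dataset_by_split(args, split="train", is_train=True)
        # The fix: build the val split with is_train=False so drop_last stays off during eval.
        val_data = create_dataset_by_split(args, split="val", is_train=False)
        return train_data, val_data


    print(create_downstream_dataset(args=None))
    # ({'split': 'train', 'drop_last': True}, {'split': 'val', 'drop_last': False})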