Open ImKeTT opened 1 year ago
Thanks!
@ImKeTT Thank you for your solution! I ran into exactly the same problem while reproducing the COCO captioning task. Following your method, I changed `is_train=True` to `is_train=False` for the val split, which solved this bug.
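To illustrate why the flag matters, here is a minimal sketch in plain Python of what `drop_last=True` does to a validation split. It does not use PyTorch or any beit3 code: `batch_ids` and `predicted_ids` are hypothetical helpers that only mimic the batching behaviour, and the 10 image ids and the batch size of 4 are made up for the example.

```python
def batch_ids(image_ids, batch_size, drop_last):
    """Group ids into batches; with drop_last, discard a short final batch."""
    batches = [image_ids[i:i + batch_size]
               for i in range(0, len(image_ids), batch_size)]
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


# Hypothetical validation split: 10 images with ids 0..9.
val_image_ids = list(range(10))
eval_batch_size = 4  # assumption: a size that does not divide 10 evenly


def predicted_ids(drop_last):
    """Ids that would receive a generated caption under this setting."""
    batches = batch_ids(val_image_ids, eval_batch_size, drop_last)
    return {image_id for batch in batches for image_id in batch}


ground_truth = set(val_image_ids)
missing_with_drop = ground_truth - predicted_ids(drop_last=True)
missing_without_drop = ground_truth - predicted_ids(drop_last=False)

print(sorted(missing_with_drop))     # [8, 9]
print(sorted(missing_without_drop))  # []
```

With `drop_last=True`, the last two images never get a caption, so the set of predicted ids no longer matches the set of ground-truth ids. With a batch size that divides the split evenly, nothing is dropped, which would explain why the problem only shows up for some values of `eval_batch_size`.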
Thanks for providing such concise and clean code for beit3. There may be a typo/error in `datasets.py` here: https://github.com/microsoft/unilm/blob/9102ed91f8e56baa31d7ae7e09e0ec98e77d779c/beit3/datasets.py#L847

I think the `is_train` flag should be `False` for the validation set. As written, it may influence some behaviours of the dataloader, such as the `drop_last` flag here: https://github.com/microsoft/unilm/blob/9102ed91f8e56baa31d7ae7e09e0ec98e77d779c/beit3/datasets.py#L733

On the COCO captioning task, since you're using COCOEval to evaluate generated captions during training, you have to make sure the `image_id`s of the generated captions are exactly the same as the `image_id`s of the ground-truth labels. If `drop_last=True` is set in the val dataloader, some validation instances can be dropped, and with an unlucky `eval_batch_size` (one that does not evenly divide the validation set) there's a chance of running into an error...

My full command for reproducing this error is: