thunlp / PEVL

Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”
MIT License

Visual Relation Detection Reproducibility #14

Open willxxy opened 1 year ago

willxxy commented 1 year ago

Hello,

Thank you for the wonderful work.

I am trying to reproduce the results reported in the paper for visual relation detection (VRD).

Currently I am getting roughly the following scores:

| R@20 | R@50 | R@100 | mR@20 | mR@50 | mR@100 |
|------|------|-------|-------|-------|--------|
| 0.5417 | 0.6160 | 0.6350 | 0.1220 | 0.1606 | 0.1723 |
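For reference, R@K and mR@K for relation detection are typically computed as below. This is a minimal sketch, not PEVL's actual evaluation code; `preds` maps each image id to its predicted (subject, predicate, object) triplets sorted by confidence, and `gts` maps each image id to its ground-truth triplets (both hypothetical structures):

```python
from collections import defaultdict

def recall_at_k(preds, gts, k):
    """R@K: per-image fraction of ground-truth triplets found in the
    top-k predictions, averaged over all images."""
    per_image = []
    for img_id, gt in gts.items():
        topk = set(preds.get(img_id, [])[:k])
        per_image.append(sum(t in topk for t in gt) / len(gt))
    return sum(per_image) / len(per_image)

def mean_recall_at_k(preds, gts, k):
    """mR@K: recall computed per predicate class, then averaged over
    classes, so rare predicates count as much as frequent ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for img_id, gt in gts.items():
        topk = set(preds.get(img_id, [])[:k])
        for subj, pred, obj in gt:
            totals[pred] += 1
            hits[pred] += (subj, pred, obj) in topk
    return sum(hits[p] / totals[p] for p in totals) / len(totals)
```

The gap between the R@K and mR@K numbers above is expected: mR@K averages over predicate classes, so it is dragged down by rare predicates.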

I finetuned the pretrained checkpoint for 10 epochs on 8 V100 GPUs, as instructed, with batch sizes of 8 and 32.

I also separately finetuned the pretrained checkpoint for 10 epochs on 10 A100 GPUs with a batch size of 100.
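For context, under data parallelism the effective batch size is the per-GPU batch size times the number of GPUs (times any gradient-accumulation steps). A quick sanity check, assuming the batch sizes above are per-GPU (an assumption; the issue does not say):

```python
# Hypothetical sanity check: effective batch size under data parallelism.
def effective_batch(per_gpu, num_gpus, grad_accum=1):
    return per_gpu * num_gpus * grad_accum

print(effective_batch(8, 8))     # 64   (8 V100s, per-GPU batch 8)
print(effective_batch(32, 8))    # 256  (8 V100s, per-GPU batch 32)
print(effective_batch(100, 10))  # 1000 (10 A100s, per-GPU batch 100)
```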

Would you be kind enough to provide some reasoning behind these results?

Thank you.

qyc-98 commented 1 year ago

Hi, we conducted second-stage pre-training on VRD (with the same pre-training tasks as the first stage: MLM, ITM, ITC, etc.) with a batch size of 256, and then finetuned for 5 epochs. We took the best result across those 5 epochs based on the validation set.
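The selection scheme described above roughly amounts to the loop below. This is a minimal sketch, not PEVL's actual training script; `train_one_epoch` and `evaluate_on_val` are hypothetical placeholders:

```python
import copy

def finetune_and_select(model, train_loader, val_loader, num_epochs=5):
    """Finetune for num_epochs and keep the epoch whose validation
    score (e.g. validation R@50) is best."""
    best_score, best_state = float("-inf"), None
    for epoch in range(num_epochs):
        train_one_epoch(model, train_loader)        # hypothetical helper
        score = evaluate_on_val(model, val_loader)  # hypothetical helper
        if score > best_score:
            best_score = score
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)  # restore the best epoch's weights
    return model, best_score
```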

sethzhao506 commented 1 year ago

Hi, thanks for your comment! Would it be possible for you to release the final finetuned checkpoint for the VRD task? Thank you so much!