microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI

BEIT3 can't reach the retrieval performance reported in the paper #1382

Open zhouruikun opened 11 months ago

zhouruikun commented 11 months ago

Model I am using: BEIT3

The command I used: python -m torch.distributed.launch --nproc_per_node=1 run_beit3_finetuning.py --model beit3_large_patch16_224 --input_size 224 --task flickr30k --batch_size 16 --sentencepiece_model checkpoints/beit3.spm --finetune checkpoints/beit3_large_itc_patch16_224.pth --data_path datas/flickr30k --eval

The result I get: Eval result = {"tr_r10": 99.60000514984131, "tr_r5": 98.90000224113464, "tr_r1": 90.70000648498535, "ir_r10": 96.84000015258789, "ir_r5": 94.11999583244324, "ir_r1": 77.97999978065491, "average_score": 93.02333196004231} Accuracy of the network on the 5000 test images: 93.023%
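For reference, the reported average_score appears to be just the mean of the six text-retrieval (tr) and image-retrieval (ir) recall@1/5/10 numbers above; a minimal sketch checking that assumption (the simple-mean definition is an assumption on my part, not documented behaviour):

```python
# Check whether "average_score" is the plain mean of the six recall metrics
# printed in the eval output above. Treating it as a simple mean is an
# assumption, not something stated in the repo.
recalls = {
    "tr_r1": 90.70000648498535, "tr_r5": 98.90000224113464, "tr_r10": 99.60000514984131,
    "ir_r1": 77.97999978065491, "ir_r5": 94.11999583244324, "ir_r10": 96.84000015258789,
}
average_score = sum(recalls.values()) / len(recalls)
print(f"{average_score:.3f}")  # 93.023, matching the reported 93.023%
```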

I see the same problem when I evaluate the retrieval performance of the fine-tuned checkpoints on the COCO and Flickr30k datasets. Did I make a mistake somewhere?

linhuixiao commented 10 months ago

Hi, I encountered the same issue: I downloaded the checkpoints from the BEiT-3 GitHub repository, but I couldn't achieve the performance reported in the paper. Both the retrieval task and the captioning task show this gap; in fact, the captioning results are 3.0 points worse than reported. I have serious doubts about the authenticity of BEiT-3's experimental results.

wenhui0924 commented 10 months ago

Hi @zhouruikun,

We released the base and large models for efficient usage. In the paper, we report the performance of a giant model. Although the released base and large models achieve very good results, they still fall short of the giant model's results reported in the paper because of the difference in model size.

linhuixiao commented 10 months ago

@wenhui0924 Thank you for taking time out of your busy schedule to answer our questions.

May I ask whether the giant model you mentioned will be open sourced?

linhuixiao commented 10 months ago

@wenhui0924 Can you release the pre-trained giant model? After all, the very strong performance in the BEiT-3 paper is based on the giant model, while the weaker base and large models make it difficult for subsequent researchers to reach the SOTA performance, which makes BEiT-3 hard to follow up on.

zhouruikun commented 10 months ago

@wenhui0924 I would also like to ask whether the pre-trained giant model can be released. If we can't get the giant model and its checkpoint, perhaps only the performance reported on GitHub for the released checkpoints should be used as the SOTA.