shikiw / OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
MIT License

Reproducing MiniGPT-4's POPE result #8

Closed: Ocean-627 closed this issue 6 months ago

Ocean-627 commented 6 months ago

Hi, authors! Excellent work! How can I reproduce MiniGPT-4's POPE result? I ran the provided script, but the results seem to be inconsistent with those reported in Table 4.

shikiw commented 6 months ago

Hi, thanks for your appreciation!

Could you tell me which result is inconsistent? Is only MiniGPT-4's result inconsistent?

Ocean-627 commented 6 months ago

Yes, only MiniGPT-4's. The LLaVA-1.5 and InstructBLIP results are consistent.

shikiw commented 6 months ago

Since POPE only requires the model to answer yes/no and then calculates the score, you may want to set max_new_tokens=10 in model.generate() for efficiency. Then you can run the command:

python pope_eval.py --model MODEL_NAME --data-path /path/to/COCO_val_2014 --pope-type random --gpu-id GPU_IDs --beam 5 --scale_factor 50 --threshold 15 --num_attn_candidates 5 --penalty_weights 1 --batch_size 20

I ran the code on the MiniGPT-4 7B model again and obtained the results below.

Random:

TP      FP      TN      FN
1031    119     1291    469
Accuracy: 0.797938144329897
Precision: 0.8965217391304348
Recall: 0.6873333333333334
F1 score: 0.7781132075471698
Yes ratio: 0.3951890034364261

Popular:

TP      FP      TN      FN
1035    328     1172    465
Accuracy: 0.7356666666666667
Precision: 0.7593543653705063
Recall: 0.69
F1 score: 0.723017813482361
Yes ratio: 0.4543333333333333

Adversarial:

TP      FP      TN      FN
1034    385     1115    466
Accuracy: 0.7163333333333334
Precision: 0.7286821705426356
Recall: 0.6893333333333334
F1 score: 0.7084618019869818
Yes ratio: 0.473

The average F1 score across the three splits is around 73.6, which is consistent with the result reported in Table 4 of our paper. Could you check whether you get similar results by following the instructions above? I hope this helps :)
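For anyone who wants to sanity-check the numbers, the metrics above can be recomputed directly from the TP/FP/TN/FN counts. This is only a minimal sketch using the standard definitions of accuracy, precision, recall, and F1; it is not taken from pope_eval.py, and the helper name `pope_metrics` is hypothetical.

```python
def pope_metrics(tp, fp, tn, fn):
    """Recompute POPE-style metrics from confusion counts (hypothetical helper)."""
    total = tp + fp + tn + fn
    # No zero-division guard: assumes at least one "yes" prediction and one positive label.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Fraction of questions answered "yes" (true and false positives).
        "yes_ratio": (tp + fp) / total,
    }


# (TP, FP, TN, FN) copied from the three result tables in this thread.
counts = {
    "random": (1031, 119, 1291, 469),
    "popular": (1035, 328, 1172, 465),
    "adversarial": (1034, 385, 1115, 466),
}

results = {name: pope_metrics(*c) for name, c in counts.items()}
avg_f1 = sum(r["f1"] for r in results.values()) / len(results)

for name, r in results.items():
    print(f"{name}: F1 = {r['f1']:.4f}, yes ratio = {r['yes_ratio']:.4f}")
print(f"average F1 = {avg_f1:.4f}")
```

With the counts above, the per-split F1 values match the reported ones, and their unweighted mean comes out at roughly 0.7365, in line with the "around 73.6" figure.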

Ocean-627 commented 6 months ago

Thank you very much for your careful guidance! I'll try it soon.