joeyz0z / MeaCap

(CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning

Reproducing MeaCap-TF results on MS COCO dataset #9


thechargedneutron commented 2 weeks ago

Hi,

Thanks for the great work. I am trying to reproduce the numbers reported in the paper (Table 1, MeaCap-TF). The paper reports a CIDEr score of 42.5 for the training-free variant. I generate captions for MS-COCO with the command `python inference.py --use_prompt --memory_id cc3m --img_path ./image_example --lm_model_path ./checkpoints/CBART_one_billion` and then compute the language metrics with the pycocoevalcap package. Here are the numbers I got:

SPICE: 0.094
Bleu_4: 0.045
METEOR: 0.141
ROUGE_L: 0.264
CIDEr: 0.260

These are noticeably lower than the numbers in the paper (my CIDEr of 0.260 corresponds to 26.0 on the paper's scale, versus the reported 42.5). Can you point me to the evaluation code in the codebase? I am using pycocoevalcap and am not sure whether that explains the lower scores, or whether I am missing something else.
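
For reference, this is roughly how I run the evaluation; a minimal sketch using the standard pycocoevalcap API (the annotation and results paths are placeholders for my local files, and the results file is the usual COCO list of `{"image_id": ..., "caption": ...}` records):

```python
# Minimal pycocoevalcap scoring sketch; the two paths below are placeholders.
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# Ground-truth COCO caption annotations (placeholder path).
coco = COCO("annotations/captions_val2014.json")
# Generated captions in COCO result format: [{"image_id": int, "caption": str}, ...]
coco_res = coco.loadRes("meacap_tf_captions.json")

coco_eval = COCOEvalCap(coco, coco_res)
# Restrict scoring to the images that actually have generated captions.
coco_eval.params["image_id"] = coco_res.getImgIds()
coco_eval.evaluate()

for metric, score in coco_eval.eval.items():
    print(f"{metric}: {score:.3f}")
```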

Thanks

joeyz0z commented 1 week ago

The training-free version is sensitive to prompts. You can use --prompt_ensembling.
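
In case it helps, the general idea behind prompt ensembling is to average CLIP text features over several prompt templates so the text side is less sensitive to any single phrasing. A minimal sketch of that technique (the model name and templates here are illustrative, and this is the generic CLIP recipe, not necessarily what MeaCap implements internally):

```python
# Sketch of CLIP-style prompt ensembling: encode a caption under several
# templates and average the normalized text features into one robust feature.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative templates; the actual set used by the repo may differ.
templates = [
    "a photo of {}.",
    "an image of {}.",
    "a picture of {}.",
]

def ensembled_text_feature(caption: str) -> torch.Tensor:
    prompts = [t.format(caption) for t in templates]
    inputs = processor(text=prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize per prompt
    mean = feats.mean(dim=0)                          # average over templates
    return mean / mean.norm()                         # re-normalize the ensemble
```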

thechargedneutron commented 1 week ago

Thanks, I will try that. Do you have evaluation code to score the generated captions? I do not see any eval code in the repo.

thechargedneutron commented 1 week ago

I tried --prompt_ensembling and get performance similar to what I reported above. Could you share the generated captions for the training-free variant, and the evaluation code, if possible? Thanks!