thechargedneutron opened this issue 2 weeks ago
Hi,

Thanks for the good work. I am trying to reproduce the numbers reported in the paper (Table 1, MeaCap-TF). The paper reports a CIDEr score of 42.5 for the training-free variant. I use the command

python inference.py --use_prompt --memory_id cc3m --img_path ./image_example --lm_model_path ./checkpoints/CBART_one_billion

to generate the MS-COCO captions and then use the pycocoeval package to compute the language metrics. Here are the numbers I got, which seem lower than those reported in the paper. Can you point me to the evaluation code in the codebase? I am using pycocoeval and am not sure whether that is the reason for the lower performance. Or let me know if I am missing something.

Thanks
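For completeness, the evaluation I am running follows the standard pycocoevalcap recipe, roughly the sketch below; the annotation and result file names are placeholders for my local setup, not files from this repo.

```python
# Sketch of my evaluation with pycocoevalcap; both paths below are
# placeholders for my local ground-truth and generated-caption files.
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# MS-COCO ground-truth captions in COCO annotation format (placeholder path).
coco = COCO("annotations/captions_val2014.json")
# Generated captions as a list of {"image_id": int, "caption": str} records.
coco_res = coco.loadRes("meacap_tf_captions.json")

coco_eval = COCOEvalCap(coco, coco_res)
# Restrict scoring to the images that actually have generated captions.
coco_eval.params["image_id"] = coco_res.getImgIds()
coco_eval.evaluate()

# Prints BLEU-1..4, METEOR, ROUGE_L, CIDEr (and SPICE if installed).
for metric, score in coco_eval.eval.items():
    print(f"{metric}: {score:.3f}")
```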
The training-free version is sensitive to prompts. You can use --prompt_ensembling.

Thanks, I will try that. Do you have evaluation code to check the performance of the generated captions? I do not see the eval code in the repo.

I tried --prompt_ensembling and I get similar performance to what I reported above. Do you have the generations for this training-free variant and also the evaluation code, if possible? Thanks!
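In case it is useful for comparison: my assumption is that --prompt_ensembling does CLIP-style template averaging, along the lines of the sketch below. The templates and model choice here are my own guesses, not taken from the MeaCap codebase.

```python
# Hypothetical sketch of CLIP-style prompt ensembling (my assumption about
# what --prompt_ensembling does; not taken from the MeaCap codebase).
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Guessed prompt templates; the repo's actual templates may differ.
templates = [
    "a photo of {}.",
    "a picture of {}.",
    "an image of {}.",
]

def ensembled_text_embedding(concept: str) -> torch.Tensor:
    """Average the normalized CLIP text embeddings over all templates."""
    tokens = clip.tokenize([t.format(concept) for t in templates]).to(device)
    with torch.no_grad():
        emb = model.encode_text(tokens)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # normalize each prompt
    mean = emb.mean(dim=0)
    return mean / mean.norm()  # re-normalize the ensembled embedding
```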