I evaluated the VQA and scene classification tasks on the model fine-tuned with GeoChatInstruct. The results are quite close to the metrics reported in the paper; however, the region captioning results fall noticeably short of the paper's.
The official evaluation results:

My results:
Note that:
I fine-tuned the model through only the first stage, i.e., I fine-tuned LLaVA-v1.5-7b on GeoChatInstruct for one epoch. I did not further fine-tune the model on only the referring and grounding samples, since the paper lacks details about the stage-2 fine-tuning.
I used Hugging Face's `evaluate` package to compute the metrics (a sketch of my metric computation follows this list).
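For reference, here is a minimal sketch of how I compute the region captioning metrics with `evaluate`, assuming ROUGE and METEOR are the metrics in question; the prediction/reference strings below are made-up placeholders, while in my actual run they come from the model outputs and the ground-truth captions:

```python
import evaluate

# Load the captioning metrics (requires the rouge_score and nltk packages).
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")

# Placeholder data: one predicted caption and one reference caption per region.
predictions = ["a large airport with several planes parked near the terminal"]
references = ["an airport with many airplanes parked at the terminal"]

# compute() aggregates over all prediction/reference pairs.
rouge_scores = rouge.compute(predictions=predictions, references=references)
meteor_score = meteor.compute(predictions=predictions, references=references)

print("ROUGE-1:", rouge_scores["rouge1"])
print("ROUGE-L:", rouge_scores["rougeL"])
print("METEOR:", meteor_score["meteor"])
```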
I wonder whether I did something wrong, or whether the metric gap is caused by the missing stage-2 fine-tuning?