mbzuai-oryx / GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
https://mbzuai-oryx.github.io/GeoChat

metrics about region captioning #36

Open Hoteryoung opened 2 months ago

Hoteryoung commented 2 months ago

I evaluated the VQA and scene classification tasks with the model fine-tuned on GeoChatInstruct, and the results are close to the metrics reported in the paper. The region captioning results, however, are noticeably lower than those in the paper.

The official evaluation result: (image)

My result: (image)

Note that:

  1. I ran only the first fine-tuning stage: I fine-tuned LLaVA-v1.5-7b on GeoChatInstruct for one epoch. I did not further fine-tune the model on only the referring and grounding samples, because the paper lacks details about the stage 2 fine-tuning.
  2. I used the evaluate package of HuggingFace.
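When comparing captioning numbers across setups, one thing worth ruling out is caption preprocessing. The sketch below assumes overlap metrics such as ROUGE are among those being compared. It is a simplified pure-Python ROUGE-1 / ROUGE-L F-score, not the evaluate package and not the paper's evaluation script. The helper names and the example captions are made up for illustration. It only shows how lowercasing and punctuation stripping can move the score for the same prediction.

```python
import re
from collections import Counter


def tokenize(text, normalize=True):
    """Split a caption into tokens; optionally lowercase and strip punctuation."""
    if normalize:
        text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return text.split()


def f_score(match, n_pred, n_ref):
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if match == 0:
        return 0.0
    precision, recall = match / n_pred, match / n_ref
    return 2 * precision * recall / (precision + recall)


def rouge1_f(pred, ref, normalize=True):
    """Unigram-overlap F-score (simplified ROUGE-1)."""
    p, r = tokenize(pred, normalize), tokenize(ref, normalize)
    overlap = sum((Counter(p) & Counter(r)).values())
    return f_score(overlap, len(p), len(r))


def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rougeL_f(pred, ref, normalize=True):
    """LCS-based F-score (simplified sentence-level ROUGE-L)."""
    p, r = tokenize(pred, normalize), tokenize(ref, normalize)
    return f_score(lcs_len(p, r), len(p), len(r))


# Hypothetical captions, not taken from any dataset.
pred = "A large white airplane."
ref = "a large white airplane parked at the airport"

print(round(rouge1_f(pred, ref), 3))                   # 0.667
print(round(rouge1_f(pred, ref, normalize=False), 3))  # 0.333
```

With these made-up captions, skipping normalization halves the score, because "A" and "airplane." no longer match the reference tokens. If two evaluations differ in this kind of preprocessing, part of a metric gap may come from the scoring setup and not from the model.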

I wonder whether I did something wrong, or whether the metric gap is caused by the missing stage 2 fine-tuning?

Davidup1 commented 2 months ago

@Hoteryoung I also ran into this problem, and the metrics fell even lower after the stage 2 fine-tuning 😂