shikras / shikra


A question about the result in Table 6 #2

Open Richar-Du opened 1 year ago

Richar-Du commented 1 year ago

Thanks for your awesome work! Shikra opens up an effective way to represent coordinates in an image.

I have a question about the result in Table 6: the performance of Shikra on the OK-VQA dataset is quite surprising. Did you fine-tune Shikra on OK-VQA, or does the instruction-tuning data include OK-VQA?

HenryHZY commented 1 year ago

Same question. I just checked Table 8 (it lists no OK-VQA training data). I think the performance can be attributed to the training data being:

  1. QA-style
  2. COCO-world image
  3. High-quality

However, similarly constructed training data already exists in the community. @zzhanghub @kq-chen Do you have any other intuitive ideas about this question?

BTW, I think an ablation study of the training data would also be important. Thanks.

Richar-Du commented 1 year ago

Could the authors please answer this question? :) @zzhanghub @kq-chen