mlpc-ucsd / BLIVA

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
https://arxiv.org/abs/2308.09936
BSD 3-Clause "New" or "Revised" License

Question about zero-shot results on general VQA benchmarks #6

Closed by Zhudongsheng75 11 months ago

Zhudongsheng75 commented 11 months ago

Thanks to the authors for open-sourcing a very good model. I noticed that the paper reports results on many benchmarks, but I could not find the corresponding evaluation code on GitHub. Could you open-source the relevant test code? Thank you so much.

gordonhu608 commented 11 months ago

Thanks for your interest in our work. We followed the same evaluation prompts as InstructBLIP and used the evaluation code from this repo: https://github.com/mlpc-ucsd/BLIVA/blob/b45425a7c87d01ecc075d86c9f2376689a1c80db/README.md?plain=1#L130

Zhudongsheng75 commented 11 months ago

Thank you for your help. I checked that repo and found that it mainly covers OCR-related tests. Is there any code I can refer to for testing benchmarks such as Visual Dialog and Flickr30K?

gordonhu608 commented 11 months ago

For Visual Dialog, we processed the prompts this way: https://github.com/mlpc-ucsd/BLIVA/blob/b45425a7c87d01ecc075d86c9f2376689a1c80db/bliva/models/bliva_vicuna7b.py#L519-L522. As for Flickr30K, we simply asked the model to give "A short image description".
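
For readers who want a starting point, here is a minimal sketch of how the two prompts might be assembled. Only the Flickr30K string is taken from the reply above. The Visual Dialog template, the `Q:`/`A:` turn formatting, and both function names are assumptions made for illustration; the linked lines in `bliva_vicuna7b.py` remain the reference for the exact format.

```python
from typing import List, Tuple


def flickr30k_prompt() -> str:
    # Caption prompt quoted in the reply above.
    return "A short image description"


def visual_dialog_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Hypothetical template: flatten earlier (question, answer) turns,
    then append the current question. Not the repo's exact format."""
    turns = " ".join(f"Q: {q} A: {a}" for q, a in history)
    return f"Dialog history: {turns}\nQuestion: {question} Short answer:"


if __name__ == "__main__":
    print(flickr30k_prompt())
    print(visual_dialog_prompt([("is it sunny", "yes")], "how many people?"))
```

If the scores differ from the paper, the prompt wording is one of the first things worth comparing against the linked code.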

ywh187 commented 3 months ago

Hello, we evaluated BLIVA using the code from the MultimodalOCR repository, but the results still differ from those in the paper. Do I need to make any changes? Thank you.

gordonhu608 commented 3 months ago

Thanks for your interest in our work. It seems the MultimodalOCR repository has received many updates. Please refer to its version from July or August 2023, which may resolve your issue.
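
One way to get such a version is to look up the last commit made before a cutoff date and check it out. The sketch below assumes a local clone of the MultimodalOCR repository and uses `2023-09-01` as an illustrative cutoff; the helper names and the `repo_dir` path are hypothetical, and the exact commit used for the paper is not stated in this thread.

```python
import subprocess
from typing import List


def rev_list_command(before: str, ref: str = "HEAD") -> List[str]:
    # `git rev-list -n 1 --before=<date> <ref>` prints the newest commit
    # reachable from <ref> whose commit date is earlier than <date>.
    return ["git", "rev-list", "-n", "1", f"--before={before}", ref]


def checkout_before(repo_dir: str, before: str, ref: str = "HEAD") -> str:
    """Check out the last commit before `before` in an existing clone."""
    result = subprocess.run(
        rev_list_command(before, ref),
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    commit = result.stdout.strip()
    if not commit:
        raise ValueError(f"no commit found before {before}")
    subprocess.run(["git", "checkout", commit], cwd=repo_dir, check=True)
    return commit


if __name__ == "__main__":
    # Assumed cutoff: the end of August 2023.
    print(" ".join(rev_list_command("2023-09-01")))
```

Calling `checkout_before("path/to/MultimodalOCR", "2023-09-01")` would leave the clone in a detached-HEAD state at that commit, which is enough for re-running the evaluation.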