mlpc-ucsd / BLIVA

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
https://arxiv.org/abs/2308.09936
BSD 3-Clause "New" or "Revised" License

Inquire if image captioning tasks are supported #15

Open shams2023 opened 8 months ago

shams2023 commented 8 months ago

Excuse me, author! Does BLIVA support image captioning tasks, especially generating fine-grained text descriptions of images? Thank you! Looking forward to your reply!

gordonhu608 commented 8 months ago

Thank you for your interest in our work. We support generating text descriptions. For this task, you can prompt BLIVA to describe the image in detail.

shams2023 commented 8 months ago

> Thank you for your interest in our work. We support generating text descriptions. For this task, you can prompt BLIVA to describe the image in detail.

Thank you very much for your reply. How can I use your code to generate image captions? Could you be more specific? Thank you!

gordonhu608 commented 8 months ago

Specifically, first set up BLIVA according to the README. Then run `evaluate.py` with the options `--answer_qs`, `--model_name bliva_vicuna`, `--img_path <path to your image>`, and `--question "Describe the image in detail."`
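Put together, the invocation would look like the sketch below; the image path `images/example.jpg` is only a placeholder, so substitute your own file:

```bash
python evaluate.py --answer_qs \
    --model_name bliva_vicuna \
    --img_path images/example.jpg \
    --question "Describe the image in detail."
```

You can vary the `--question` prompt to steer the caption, for example asking for a shorter summary or for fine-grained details.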

shams2023 commented 8 months ago

> Specifically, first set up BLIVA according to the README. Then run `evaluate.py` with the options `--answer_qs`, `--model_name bliva_vicuna`, `--img_path <specify your image here>`, and `--question "Describe the image in detail."`

Thank you very much!