microsoft / LLaVA-Med

Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.

How to run VQA inference on a single image? #59

Open Raman1121 opened 7 months ago

Raman1121 commented 7 months ago

Congrats on the great work!

I am looking for an example or tutorial showing how to run visual question answering (VQA) on a single image. For instance, I want to give the model an X-ray along with the question "Describe the image in detail". How can I do this?
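For orientation, here is a minimal sketch of one step of single-image VQA: preparing the prompt. It assumes the model follows the upstream LLaVA convention, where an image placeholder in the prompt text is replaced by a sentinel token id that the model later swaps for the encoded image features. The placeholder string `<image>`, the sentinel `-200`, the one-line prompt template and the `toy_tokenize` helper are all assumptions for illustration. Real code also wraps the prompt in a conversation template and handles special tokens such as BOS, so check the constants and templates in your LLaVA-Med checkout.

```python
from typing import Callable, List

# Assumed values (LLaVA-style convention, not confirmed for LLaVA-Med):
# a text placeholder marks where the image goes, and it is replaced by a
# sentinel id that the model later swaps for projected image features.
IMAGE_PLACEHOLDER = "<image>"
IMAGE_TOKEN_ID = -200


def build_prompt(question: str) -> str:
    """Put the image placeholder before the question (hypothetical template)."""
    return f"{IMAGE_PLACEHOLDER}\n{question}"


def tokenize_with_image(prompt: str, tokenize: Callable[[str], List[int]]) -> List[int]:
    """Tokenize the text around each placeholder and splice in the sentinel id."""
    ids: List[int] = []
    for i, chunk in enumerate(prompt.split(IMAGE_PLACEHOLDER)):
        if i > 0:
            ids.append(IMAGE_TOKEN_ID)  # one sentinel per placeholder occurrence
        if chunk:
            ids.extend(tokenize(chunk))
    return ids


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one id (the word length) per word."""
    return [len(word) for word in text.split()]


prompt = build_prompt("Describe the image in detail")
ids = tokenize_with_image(prompt, toy_tokenize)
print(ids)
```

A real run would then pass these ids, together with the preprocessed image tensor, to the model's generation call.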

7sunday commented 7 months ago

Same question here.

Raman1121 commented 7 months ago

I solved this myself. Here is a GitHub gist: https://gist.github.com/Raman1121/aaec5a2a1315d78b527eb604dbc7e085

Currently, running this over a dataset performs inference on one image at a time, which is quite slow. I'm looking for tips or ideas to speed it up.
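One common way to speed this up is to batch several (image, question) pairs into a single generation call. Below is a minimal sketch of the data-side part under that assumption. It groups pre-tokenized prompts into fixed-size batches and left-pads them, because decoder-only models generate from the right end of the sequence. The helper names and sample ids are hypothetical, and whether LLaVA-Med's own generation code accepts batched inputs is not confirmed here. With Hugging Face tokenizers the same effect usually comes from setting `tokenizer.padding_side = "left"`.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


def left_pad(seqs: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id sequences to a common length.

    Padding goes on the left so every prompt ends at the last position,
    where generation continues; the attention mask is 0 on padding.
    """
    max_len = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        pad = max_len - len(s)
        input_ids.append([pad_id] * pad + s)
        attention_mask.append([0] * pad + [1] * len(s))
    return input_ids, attention_mask


# Hypothetical pre-tokenized prompts for five (image, question) pairs.
prompts = [[5, 6], [7], [8, 9, 10], [11], [12, 13]]
for batch in batched(prompts, batch_size=2):
    ids, mask = left_pad(batch, pad_id=0)
    print(ids, mask)
```

Each padded batch, converted to tensors and paired with a stacked tensor of the matching preprocessed images, could then go through one forward pass instead of several. Batch size is bounded by GPU memory.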