Open Raman1121 opened 7 months ago
Same question here.
I solved this issue on my own. Here is a GitHub gist: https://gist.github.com/Raman1121/aaec5a2a1315d78b527eb604dbc7e085
Currently, if you run this over a dataset, inference is performed on one image at a time, which is quite slow. I'm looking for tips or ideas to make this faster.
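One general idea, not specific to this repo or the gist: group the inputs into batches so the model is called once per batch instead of once per image. Below is a minimal sketch of the batching loop only. `run_in_batches`, `batched`, and the `infer_batch` callable are hypothetical names introduced for illustration; `infer_batch` stands in for whatever function runs the model on a list of images and returns one answer per image, and whether the model actually supports batched inputs is an assumption.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(
    items: Sequence[T],
    infer_batch: Callable[[Sequence[T]], Sequence[R]],  # hypothetical model wrapper
    batch_size: int = 8,
) -> List[R]:
    """Call `infer_batch` once per batch and collect results in input order."""
    results: List[R] = []
    for batch in batched(items, batch_size):
        outputs = infer_batch(batch)
        # Guard against a wrapper that drops or duplicates answers.
        if len(outputs) != len(batch):
            raise RuntimeError("infer_batch must return one output per input")
        results.extend(outputs)
    return results
```

With five image paths and `batch_size=2`, `infer_batch` would be called three times (with 2, 2, and 1 items) rather than five. The actual speedup depends on the model and on available GPU memory, so the batch size would need tuning.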
Congrats on the great work!
I am looking for an example or tutorial showing how to do VQA on a single image. For instance, I want to give the model an X-ray and the question "Describe the image in detail". How can I do this?