Inference script for visual question answering task

illuin-tech / colpali

The code used to train and run inference with the ColPali architecture.

MIT License

1.2k stars, 106 forks

I'm trying to pass an input image and a prompt/question about that image to the ColPali model and get back the relevant answer, but I haven't been able to get this working. When I try

model.generate(**model_inputs, max_length=2048)

it raises the following error:

TypeError: The current model class (ColPali) is not compatible with .generate(), as it doesn't have a language model head. Please use one of the following classes instead: {'PaliGemmaForConditionalGeneration'}

Could you please share some example code for a visual question answering task using ColPali?
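
For reference, the error message itself points at a possible route: ColPali is built for retrieval (it produces embeddings rather than text), so generation would have to go through PaliGemmaForConditionalGeneration instead. Below is a minimal sketch of that route, not something from the ColPali repo. It assumes the transformers, torch and Pillow packages are installed, and that you have access to a PaliGemma checkpoint. The checkpoint name "google/paligemma-3b-mix-224", the "answer en" prompt prefix and the helper names build_prompt and answer_question are my own assumptions for illustration.

```python
def build_prompt(question: str, lang: str = "en") -> str:
    """Format a question in the "answer <lang> <question>" style.

    Assumption: this task prefix matches the PaliGemma "mix" checkpoints;
    other checkpoints may expect a different prompt format.
    """
    return f"answer {lang} {question.strip()}"


def answer_question(image_path: str, question: str,
                    model_id: str = "google/paligemma-3b-mix-224",
                    max_new_tokens: int = 100) -> str:
    """Answer a question about an image with a PaliGemma generation model.

    model_id is an assumed checkpoint name; swap in whichever PaliGemma
    checkpoint you actually have access to.
    """
    # Heavy imports are kept inside the function so the module can be
    # imported without loading torch or transformers.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    ).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompt(question), images=image, return_tensors="pt"
    ).to(model.device)
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # generate() returns prompt + answer tokens; keep only the answer part.
    answer_ids = output_ids[0][prompt_len:]
    return processor.decode(answer_ids, skip_special_tokens=True)
```

Calling answer_question("page.png", "What is the title of this document?") would then return the generated answer as a string (the file name here is just a placeholder). Note that this bypasses ColPali entirely; the ColPali weights would still be the ones to use for the retrieval step.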

Inference script for visual question answering task #10