illuin-tech / colpali

The code used to train and run inference with the ColPali architecture.
https://huggingface.co/vidore
MIT License

Inference script for visual question answering task #10

Closed (poojitha892 closed this issue 4 months ago)

poojitha892 commented 4 months ago

I'm trying to give the ColPali model an input image and a prompt or question about that image, and get a relevant answer back, but I haven't been able to get this working. When I call the following, I get a TypeError:

model.generate(**model_inputs, max_length=2048)

TypeError: The current model class (ColPali) is not compatible with .generate(), as it doesn't have a language model head. Please use one of the following classes instead: {'PaliGemmaForConditionalGeneration'}

Could you please share some example code for a visual question answering task using ColPali?

galtay-tempus commented 4 months ago

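@poojitha892 ColPali is a retrieval model, not a generative one: it embeds queries and page images so that pages can be ranked against a query, and it has no language model head to produce text, which is why .generate() fails. To answer questions, pass the images that ColPali retrieves to a generative model such as Gemini or GPT-4o.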
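
To make the split between retrieval and generation concrete, here is a minimal sketch of that two-step flow. It is not ColPali's API: retrieve_then_answer, toy_score_page and toy_generate_answer are hypothetical names, the pages are plain strings standing in for page images, and the word-overlap scorer is only a placeholder for ColPali's similarity scores. In a real setup the scorer would be backed by ColPali and the generator by a model such as Gemini or GPT-4o.

```python
from typing import Callable, List, Sequence, Tuple


def retrieve_then_answer(
    question: str,
    pages: Sequence[str],
    score_page: Callable[[str, str], float],
    generate_answer: Callable[[str, List[str]], str],
    top_k: int = 2,
) -> Tuple[List[str], str]:
    """Rank pages with a retriever, then hand the best ones to a generator."""
    # Step 1 (retrieval): score every page against the question, best first.
    ranked = sorted(pages, key=lambda page: score_page(question, page), reverse=True)
    top_pages = list(ranked[:top_k])
    # Step 2 (generation): a separate model writes the answer from those pages.
    answer = generate_answer(question, top_pages)
    return top_pages, answer


def toy_score_page(question: str, page: str) -> float:
    """Hypothetical scorer: counts words shared by the question and the page.

    Placeholder for a retrieval model's query/page similarity score.
    """
    question_words = set(question.lower().split())
    page_words = set(page.lower().split())
    return float(len(question_words & page_words))


def toy_generate_answer(question: str, pages: List[str]) -> str:
    """Hypothetical generator: placeholder for a call to a generative model."""
    return f"Answer to {question!r} based on {len(pages)} retrieved page(s)."


if __name__ == "__main__":
    demo_pages = [
        "invoice total amount due 420",
        "company picnic photos",
        "shipping address and total weight",
    ]
    top, answer = retrieve_then_answer(
        "what is the total amount due", demo_pages, toy_score_page, toy_generate_answer
    )
    print(top)
    print(answer)
```

Passing the scorer and generator in as callables keeps the two roles separate, so either placeholder can be swapped for a real model without changing the rest of the flow.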