illuin-tech / colpali

The code used to train and run inference with the ColPali architecture.
https://huggingface.co/vidore
MIT License

OCR with colpali #68

Closed. lukiod closed this issue 3 weeks ago.

lukiod commented 3 weeks ago

Is it possible to use ColPali to extract text from an image and return the extracted text in a structured format (JSON or plain text)?

ManuelFay commented 3 weeks ago

Hello! That's kind of the opposite of the point of ColPali... But most VLMs nowadays can definitely do that, so you can combine ColPali to retrieve the page you want with a VLM to do just that!
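A minimal sketch of that two-step "retrieve, then read" pipeline, assuming you already have multi-vector embeddings for the query and each page. ColPali scores pages with ColBERT-style late interaction (MaxSim): each query-token embedding is matched against its best page-patch embedding and the maxima are summed. The helper names (`maxsim_score`, `retrieve_best_page`, `retrieve_then_extract`) and the `fake_vlm` callable are hypothetical, and the 2-D vectors are toy stand-ins for real ColPali outputs. In practice you would plug in a real model's embeddings and a real VLM call.

```python
import json
from typing import Callable, Sequence

Vector = Sequence[float]
MultiVector = Sequence[Vector]  # one embedding per query token / page patch


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: MultiVector, page_vecs: MultiVector) -> float:
    """Late-interaction (MaxSim) score: for each query-token embedding,
    take its best match among the page's patch embeddings, then sum."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def retrieve_best_page(query_vecs: MultiVector, pages: Sequence[MultiVector]) -> int:
    """Step 1 (retrieval): index of the page with the highest MaxSim score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return max(range(len(pages)), key=scores.__getitem__)


def retrieve_then_extract(
    query_vecs: MultiVector,
    pages: Sequence[MultiVector],
    page_images: Sequence[object],
    vlm_extract: Callable[[object], dict],
) -> str:
    """Step 2 (reading): hand the retrieved page image to a VLM (hypothetical
    callable) and return its structured output serialized as JSON."""
    best = retrieve_best_page(query_vecs, pages)
    return json.dumps({"page": best, "content": vlm_extract(page_images[best])})


# Toy 2-D embeddings standing in for real ColPali outputs (illustrative only).
query = [[1.0, 0.0], [0.0, 1.0]]
pages = [
    [[0.1, 0.0], [0.0, 0.1]],  # page 0: weak match (score 0.2)
    [[0.9, 0.1], [0.2, 0.8]],  # page 1: strong match (score 1.7)
]
images = ["page0.png", "page1.png"]


def fake_vlm(image: object) -> dict:
    """Hypothetical stand-in for a real VLM text-extraction call."""
    return {"source": image, "text": "..."}


print(retrieve_then_extract(query, pages, images, fake_vlm))
# → {"page": 1, "content": {"source": "page1.png", "text": "..."}}
```

Keeping the VLM as an injected callable reflects the split described above: ColPali only ranks pages, and whichever VLM you choose does the actual text extraction.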