whyhow-ai / knowledge-table

Knowledge Table is an open-source package designed to simplify extracting and exploring structured data from unstructured documents.
MIT License
196 stars 25 forks

Multimodal functionality with ColPali (byaldi) #9

Open tomsmoker opened 1 week ago

tomsmoker commented 1 week ago

What

Add ColPali (via byaldi) for multimodal extraction from PDFs, so that questions can draw on a document's visual content and not just its extracted text.

https://github.com/AnswerDotAI/byaldi

Why

Documents often carry contextually relevant information in modalities other than text. Making that information available broadens the range of questions that can be asked.

Implementation guidance

This would affect all files within the services directory, as well as their associated routers and models. We need to add an option for selecting a multimodal model, and probably support for connecting to a local instance of it. This should be an optional install in the virtual environment, so that users who only need text extraction do not have to pull in the extra dependencies.
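As a rough illustration of the "optional install" idea, here is a minimal sketch of how a service could load byaldi only when multimodal support is switched on. `MultimodalSettings`, `byaldi_installed`, `load_multimodal_model`, and the `index_name` default are hypothetical names invented for this example and are not part of knowledge-table. The `vidore/colpali` checkpoint name is an assumption taken from the byaldi README.

```python
from dataclasses import dataclass
from importlib.util import find_spec
from typing import Any, Optional


@dataclass
class MultimodalSettings:
    """Hypothetical settings object; field names are illustrative only."""

    enabled: bool = False
    # Assumption: checkpoint name as shown in the byaldi README.
    model_name: str = "vidore/colpali"
    # Hypothetical index name, not taken from the project.
    index_name: str = "knowledge_table_pages"


def byaldi_installed() -> bool:
    """Return True if the optional byaldi package can be imported."""
    return find_spec("byaldi") is not None


def load_multimodal_model(settings: MultimodalSettings) -> Optional[Any]:
    """Return a byaldi model if multimodal mode is enabled, else None.

    The import is deferred so that text-only installs never touch byaldi.
    """
    if not settings.enabled:
        return None
    if not byaldi_installed():
        raise RuntimeError(
            "Multimodal extraction is enabled but byaldi is not installed. "
            "Install it with `pip install byaldi`."
        )
    # Imported lazily: only reached when the optional dependency exists.
    from byaldi import RAGMultiModalModel

    return RAGMultiModalModel.from_pretrained(settings.model_name)
```

With the default settings, `load_multimodal_model(MultimodalSettings())` returns `None` and the existing text-only path is unaffected. When a model is returned, byaldi's README describes indexing PDFs with `model.index(input_path=..., index_name=...)` and querying with `model.search(query, k=...)`. How those calls would be wired into the services, routers, and models is left open here.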