Support Indexify as Retriever

diptanu commented 7 months ago

Hi folks! Love Verba. Does the project support, or plan to support, pluggable retrievers? We are building Indexify (https://getindexify.ai), an open-source, reliable extraction and embedding engine. We plan on supporting Weaviate as a storage backend very soon.

Indexify has a retriever API that supports semantic search over embedding indexes, as well as SQL queries over structured data extracted from unstructured data.

If we integrate Indexify, Verba will be able to:

Answer questions not only from PDFs and documents, but also from images, videos, and audio.
Ingest any number of documents, videos, audio files, etc., at any scale (throughput, data volume).
Offload extraction of embeddings and structured data from videos, docs, and images to workers (distributed in production), so retrieval will always return fresh data.
Let users monitor the state of indexes and extraction status, and delete or update ingested content and extracted embeddings/metadata.
Support all major hardware accelerators and any model for extraction.

Here is an example pipeline for PDF extraction - https://getindexify.ai/usecases/pdf_extraction/ and for videos - https://getindexify.ai/usecases/video_rag/

I think the integration could be fairly seamless with some extensions in Verba, once we support Weaviate in Indexify (which should be straightforward).
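
As a rough illustration of what a pluggable retriever extension point could look like, here is a minimal sketch. Everything in it is an assumption made up for illustration: the `Retriever` protocol, `Chunk`, `InMemoryRetriever`, and `answer_context` are hypothetical names, not part of Verba's or Indexify's actual API, and the bag-of-words `embed` function is a toy stand-in for a real embedding model.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Chunk:
    """A retrieved piece of text with its similarity score."""
    text: str
    score: float


class Retriever(Protocol):
    """Hypothetical plug-in interface (not Verba's or Indexify's real API)."""

    def retrieve(self, query: str, top_k: int) -> list[Chunk]:
        ...


def embed(text: str) -> dict[str, float]:
    """Toy bag-of-words 'embedding'; a real backend would call a model."""
    counts: dict[str, float] = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryRetriever:
    """Stand-in backend; an external engine could sit behind the same method."""

    def __init__(self, docs: list[str]) -> None:
        self._docs = [(doc, embed(doc)) for doc in docs]

    def retrieve(self, query: str, top_k: int) -> list[Chunk]:
        query_vec = embed(query)
        scored = [Chunk(doc, cosine(query_vec, vec)) for doc, vec in self._docs]
        scored.sort(key=lambda chunk: chunk.score, reverse=True)
        return scored[:top_k]


def answer_context(retriever: Retriever, question: str, top_k: int = 2) -> list[str]:
    """The caller depends only on the Retriever interface, not on a backend."""
    return [chunk.text for chunk in retriever.retrieve(question, top_k)]
```

The point of the sketch is only that the calling code talks to an interface, so a different backend could be swapped in without changing `answer_context`.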

Thoughts?

thomashacker commented 7 months ago

Interesting and great idea! I'll have a look 🚀

diptanu commented 7 months ago

@thomashacker I would love to chat more on Discord, or on a call! My email: diptanu@tensorlake.ai :)

thomashacker commented 2 months ago

Hey @diptanu, if this is still open, feel free to create a PR.

weaviate / Verba

Support Indexify as Retriever #135