infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

Integrate with Indexify #470

Open diptanu opened 4 months ago

diptanu commented 4 months ago

This is an amazing project, and the document extraction model works really well. I would love to propose an integration between RAGFlow and Indexify - https://getindexify.ai

Indexify is an Apache 2.0 licensed, open-source compute engine and data framework for unstructured data. It provides reliable extraction at any scale and runs on anything from a laptop to hundreds of machines in production. It also has a pluggable extractor framework for building new or custom extractors. I would love to explore whether we can integrate RAGFlow with Indexify.

The benefits for RAGFlow would be -

Path to Integration - From what I have seen, the chunking logic and the model used for extraction can easily be packaged into extractors, and it should be straightforward to connect the UI to Indexify's API for uploading and retrieving as well :)
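
To make the "package the chunking logic into an extractor" idea concrete, here is a rough sketch of what such a wrapper might look like. Everything in it is an assumption for illustration: `Content`, `Extractor`, and `ParagraphChunker` are hypothetical names, not the actual Indexify extractor SDK, and the paragraph-based splitting only stands in for RAGFlow's real chunking logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Content:
    """Hypothetical container for a piece of content plus its metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


class Extractor(Protocol):
    """Hypothetical plug-in interface; NOT the real Indexify SDK."""
    name: str

    def extract(self, content: Content) -> list[Content]: ...


class ParagraphChunker:
    """Stand-in chunker: greedily packs paragraphs into chunks of at most
    max_chars characters. A single oversized paragraph is kept whole."""
    name = "example/paragraph-chunker"  # hypothetical extractor name

    def __init__(self, max_chars: int = 200) -> None:
        self.max_chars = max_chars

    def extract(self, content: Content) -> list[Content]:
        chunks: list[str] = []
        current = ""
        for para in content.text.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            candidate = f"{current}\n\n{para}" if current else para
            if current and len(candidate) > self.max_chars:
                # Adding this paragraph would overflow: close the chunk.
                chunks.append(current)
                current = para
            else:
                current = candidate
        if current:
            chunks.append(current)
        # Carry the source metadata forward and record each chunk's position.
        return [
            Content(text=c, metadata={**content.metadata, "chunk_index": i})
            for i, c in enumerate(chunks)
        ]


doc = Content(
    text="a" * 150 + "\n\n" + "b" * 150 + "\n\n" + "c" * 10,
    metadata={"source": "example.pdf"},
)
chunks = ParagraphChunker(max_chars=200).extract(doc)
```

The point of the shape is that the chunker only sees content in and content out, so the same logic could sit behind whatever interface the real extractor SDK defines.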

Here are some relevant links -

PDF Extraction and RAG - https://getindexify.ai/usecases/pdf_extraction/
Video Extraction and RAG - https://getindexify.ai/usecases/video_rag/
The extractor SDK - https://github.com/tensorlakeai/indexify-extractors/tree/main/extractor-sdk

I would love to hear your thoughts!

ChildWangWorld commented 4 months ago

amazing~