infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0
22.47k stars 2.2k forks

[Question]: How to make overlapping chunking? #2522

Open edoserbia opened 1 month ago

edoserbia commented 1 month ago

Describe your problem

How can I get overlapping chunks for docx, pdf, or txt files? The chunking methods seem to split content with no overlap between chunks. As a result, if an answer spans the boundary between two chunks, retrieval can easily miss part of it.
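For illustration, here is a minimal sketch of what "overlapping chunking" means: a word-based sliding window where each chunk repeats the last `overlap` words of the previous one. This is not RAGFlow's API or its chunking code; the function `chunk_with_overlap` and its `chunk_size`/`overlap` parameters are hypothetical. It also splits on whitespace for simplicity, while real pipelines usually count model tokens instead. With this scheme, any passage of at most `overlap` words that straddles a boundary appears whole in at least one chunk.

```python
from __future__ import annotations


def chunk_with_overlap(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Hypothetical helper: split text into word windows of `chunk_size`
    words, where consecutive windows share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()          # naive whitespace "tokenization"
    step = chunk_size - overlap   # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        # Stop once this window already reaches the end of the text,
        # so no trailing chunk is made up only of overlap.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    text = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_with_overlap(text, chunk_size=4, overlap=2):
        print(chunk)
    # w0 w1 w2 w3
    # w2 w3 w4 w5
    # w4 w5 w6 w7
    # w6 w7 w8 w9
```

The trade-off is redundancy: with `overlap = chunk_size / 4`, roughly a quarter of the indexed text is duplicated, which increases storage and can return near-duplicate hits at retrieval time.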

KevinHuSh commented 1 month ago

Please try RAPTOR, which is much better than overlapping chunking.