Mature RAG frameworks commonly let users supply their own chunking method. Fixed-length splitting has clear drawbacks: it can cut text mid-sentence, producing grammatically incomplete inputs that make it hard for small models to generate coherent output. We have therefore added an optional feature that lets users customize the chunking method, just as they can already provide custom LLMs and embedding functions. We have also added a separator-based splitting method that keeps each chunk grammatically complete.
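As a rough illustration of the idea, here is a minimal sketch of a pluggable chunker with a separator-based option. It is not this project's actual code: the names `split_document`, `chunk_by_separators`, `chunking_func`, `chunk_size`, and `max_chars`, as well as the default separator list, are assumptions made up for the example.

```python
import re
from typing import Callable, List, Optional, Sequence

# Hypothetical signature for a user-supplied chunker: text in, chunks out.
ChunkingFunc = Callable[[str], List[str]]


def chunk_by_separators(
    text: str,
    separators: Sequence[str] = ("\n\n", ". ", "! ", "? "),
    max_chars: int = 200,
) -> List[str]:
    """Split at separators, then pack whole pieces into chunks near max_chars.

    A single piece longer than max_chars is kept whole rather than cut.
    """
    # Try longer separators first so "\n\n" is not shadowed by shorter ones.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(s) for s in ordered) + ")"
    # With a capture group, re.split returns [piece, sep, piece, sep, ...].
    parts = re.split(pattern, text)

    # Re-attach each separator to the piece that precedes it.
    pieces = []
    for i in range(0, len(parts), 2):
        sep = parts[i + 1] if i + 1 < len(parts) else ""
        piece = parts[i] + sep
        if piece.strip():
            pieces.append(piece)

    # Greedily pack pieces; only ever break between pieces.
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += piece
    if current.strip():
        chunks.append(current.strip())
    return chunks


def split_document(
    text: str,
    chunking_func: Optional[ChunkingFunc] = None,
    chunk_size: int = 200,
) -> List[str]:
    """Use the custom chunker if one is given, else fixed-length splitting."""
    if chunking_func is not None:
        return chunking_func(text)
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


text = "Alpha beta. Gamma delta. Epsilon zeta."
fixed = split_document(text, chunk_size=16)
by_sentence = split_document(
    text, chunking_func=lambda t: chunk_by_separators(t, max_chars=25)
)
```

In this sketch the fixed-length fallback cuts the sample text in the middle of a word, while the separator-based chunker only breaks at sentence boundaries. Any callable that takes a string and returns a list of strings could be passed in the same way.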