theodo-group / LLPhant

LLPhant - A comprehensive PHP Generative AI Framework using OpenAI GPT 4. Inspired by Langchain
MIT License

Adding chunk overlapping to document splitting #138

Closed: mikelmao closed this 1 month ago

mikelmao commented 3 months ago

I propose adding a new parameter, $overlap = 0, to the splitDocument() method in DocumentSplitter, so that each chunk begins with the specified number of characters taken from the end of the previous chunk.

This would let consecutive chunks overlap by a specified number of characters, which helps RAG use cases: text that straddles a chunk boundary then appears in both neighboring chunks instead of being cut in two.
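
To make the proposal concrete, here is a minimal sketch of character-based splitting with overlap. It is not LLPhant's implementation: splitWithOverlap() and its $maxLength parameter are hypothetical names used only for illustration, and the sketch works on a plain string rather than on LLPhant's Document objects. It assumes the input is valid UTF-8.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of LLPhant.
 *
 * Splits $text into chunks of at most $maxLength characters. Every chunk
 * after the first starts with the last $overlap characters of the chunk
 * before it.
 *
 * @return string[]
 */
function splitWithOverlap(string $text, int $maxLength, int $overlap = 0): array
{
    if ($maxLength <= 0) {
        throw new InvalidArgumentException('$maxLength must be positive.');
    }
    if ($overlap < 0 || $overlap >= $maxLength) {
        throw new InvalidArgumentException('$overlap must be >= 0 and smaller than $maxLength.');
    }

    // Split into Unicode characters so multi-byte characters are never cut in half.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('$text must be valid UTF-8.');
    }

    $length = count($chars);
    $step = $maxLength - $overlap; // how far the window advances each time
    $chunks = [];

    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = implode('', array_slice($chars, $start, $maxLength));

        // Stop once the current window has reached the end of the text,
        // otherwise the next chunk would only repeat already-covered characters.
        if ($start + $maxLength >= $length) {
            break;
        }
    }

    return $chunks;
}

// With chunks of 4 characters and an overlap of 2:
// ['abcd', 'cdef', 'efgh', 'ghij']
print_r(splitWithOverlap('abcdefghij', 4, 2));
```

With $overlap left at its default of 0, the sketch behaves like plain fixed-size splitting, which is why a default of 0 would keep existing callers unaffected.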

MaximeThoonsen commented 3 months ago

Hey @mikelmao, yes, this is a behavior that Langchain has. I agree. Do you want to contribute this one?

synio-wesley commented 1 month ago

I am also missing this functionality. Looks like I would have to create it myself?