Open RobertHH-IS opened 4 months ago
Does this address your issue? As long as each document you define stays under the chunk size, GraphRAG will avoid splitting it. https://github.com/microsoft/graphrag/issues/396#issuecomment-2249127128
A bit hacky, but a way to proceed - thanks! A delim option would remove the need for 600,000 files though :-)
Great, thanks - I'll queue this up, but good to hear you have a path forward in the meantime.
Is your feature request related to a problem? Please describe.
A big part of good RAG is the quality of the input data. I would like to prepare chunks with text and metadata specifically for the graph extraction. A simple "delim" splitter would be a great addition, as opposed to the much more random character or token chunker.
Describe the solution you'd like
Allow us to specify a delim in the chunks section of settings.yaml. If it is specified, no character or token chunking is done; the input is simply split at the delim and processing proceeds from there.
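To make the requested behaviour concrete, here is a minimal Python sketch of what splitting at a delim could look like. It is not GraphRAG's actual code or API: the function name `split_on_delimiter` and the `<<<CHUNK>>>` marker are hypothetical, chosen only for illustration.

```python
def split_on_delimiter(text: str, delim: str) -> list[str]:
    """Split text at every occurrence of delim and return the pieces as chunks.

    Hypothetical helper, not part of GraphRAG. Surrounding whitespace is
    trimmed and empty pieces (e.g. after a trailing delimiter) are dropped.
    """
    if not delim:
        raise ValueError("delim must be a non-empty string")
    return [piece.strip() for piece in text.split(delim) if piece.strip()]


# Example input: two hand-prepared chunks, each carrying its own metadata line.
document = (
    "title: A\nfirst chunk\n"
    "<<<CHUNK>>>\n"
    "title: B\nsecond chunk\n"
    "<<<CHUNK>>>\n"
)
chunks = split_on_delimiter(document, "<<<CHUNK>>>")
for chunk in chunks:
    print(repr(chunk))
```

With something like this, each prepared chunk would reach graph extraction exactly as written, regardless of its character or token length.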
Additional context
No response