Open istvan-deak opened 2 months ago
If you want to index whole files, just don't use a splitter, and make sure the parser doesn't split documents (e.g. use mode="single" in ParseUnstructured).
Doing exactly what you want, that is, indexing over small chunks but retrieving whole documents, is not easily supported. What you can do is write your own splitter that inserts the full document text into the metadata of each chunk. Then, after the chunks are retrieved, use the full document text from the metadata rather than the returned chunk text.
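To make the workaround concrete, here is a minimal, library-agnostic sketch in plain Python. It is not the library's splitter API: the names `split_with_full_text`, `full_documents`, the `full_text` metadata key, and the fixed-size character chunking are all assumptions made for illustration, and wiring the function into the actual pipeline is not shown.

```python
from typing import Any

# Hypothetical names throughout; this only illustrates the idea from the comment above.
Chunk = tuple[str, dict[str, Any]]


def split_with_full_text(
    text: str, metadata: dict[str, Any], chunk_size: int = 200
) -> list[Chunk]:
    """Split text into fixed-size character chunks and copy the
    full document text into the metadata of every chunk."""
    chunks: list[Chunk] = []
    for start in range(0, len(text), chunk_size):
        # "full_text" is an assumed metadata key, not a library convention.
        chunk_metadata = {**metadata, "full_text": text}
        chunks.append((text[start:start + chunk_size], chunk_metadata))
    return chunks


def full_documents(retrieved: list[Chunk]) -> list[str]:
    """Return the distinct full documents behind the retrieved chunks,
    in first-seen order, ignoring the chunk text itself."""
    seen: set[str] = set()
    documents: list[str] = []
    for _chunk_text, chunk_metadata in retrieved:
        full_text = chunk_metadata["full_text"]
        if full_text not in seen:
            seen.add(full_text)
            documents.append(full_text)
    return documents
```

Note the trade-off: every chunk carries a copy of its whole document, so storage grows with the number of chunks per document. Deduplicating after retrieval matters too, since several retrieved chunks often come from the same file.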
@szymondudycz I believe this question has come up a number of times already. Perhaps we should make it into a feature request? The resolution could be e.g. a code template that shows how to have a table of full_document_metadata, a table of chunks with document_id in their metadata, and shows how to retrieve full_document_metadata for a given chunk, and maybe also load/reread the document on demand (with a udf). @istvan-deak if you have any thoughts here, please don't hesitate to share.
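A rough sketch of what such a template could look like, in plain Python rather than the library's table API. Everything here is an assumption for illustration: the `Chunk` class, `build_tables`, `load_full_documents`, the use of the file path as `document_id`, and the fixed-size chunking. In a real pipeline the reread step would presumably live in a udf.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Chunk:
    text: str
    document_id: str  # points back to the row in the documents table


def build_tables(paths, chunk_size=200):
    """Build a documents table (document_id -> metadata) and a list of
    chunks that only carry a document_id, not the full text."""
    documents = {}
    chunks = []
    for path in paths:
        path = Path(path)
        document_id = str(path)  # assumption: the path is a good enough id
        text = path.read_text(encoding="utf-8")
        documents[document_id] = {"path": str(path), "length": len(text)}
        for start in range(0, len(text), chunk_size):
            chunks.append(Chunk(text[start:start + chunk_size], document_id))
    return documents, chunks


def load_full_documents(retrieved, documents):
    """Reread each distinct source document of the retrieved chunks on
    demand and return a mapping of document_id -> full text."""
    texts = {}
    for chunk in retrieved:
        if chunk.document_id not in texts:
            path = documents[chunk.document_id]["path"]
            texts[chunk.document_id] = Path(path).read_text(encoding="utf-8")
    return texts
```

Compared with copying the full text into every chunk's metadata, this keeps the chunk rows small and reads a file only when one of its chunks is actually retrieved. The cost is an extra lookup, and the reread text may differ from what was indexed if the file has changed in the meantime.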
What is your question or problem? Please describe.
I would like to use the long context window of the LLM of my choice and pass whole files to the prompt.
Describe what you would like to happen
During retrieval, I'd like the system to:
This approach would allow for more context to be provided to the LLM, potentially improving its performance on tasks that require broader context.