run-llama / sec-insights

A real world full-stack application using LlamaIndex
https://www.secinsights.ai/
MIT License
2.32k stars 631 forks

RAG Performance Improvements #96

Open CamDuffy1 opened 6 months ago

CamDuffy1 commented 6 months ago

The RAG performance of this application can be improved by using more advanced techniques for retrieval and synthesis.

Currently, the app uses the SentenceSplitter node parser, which splits text into fixed-size chunks while trying to keep complete sentences together. It then uses the same chunk for both context retrieval and response synthesis when querying the LLM. Using the same chunk size for both stages is not optimal: a smaller chunk size helps embedding-based retrieval find more relevant context, while a larger chunk size gives the LLM more surrounding context to synthesize a better response.

Using a smaller chunk size for retrieval and a larger chunk size for synthesis can therefore improve RAG performance. Two such techniques are:

  1. Sentence-Window Retrieval
    • Text is parsed into sentence nodes that include metadata containing additional context of surrounding sentences.
    • Relevant sentences are retrieved, then replaced with the larger window of surrounding context before being used to synthesize a response.
  2. Auto-Merging Retrieval
    • Text is parsed into hierarchical nodes (i.e., parent, child).
    • Child nodes are smaller in size and used for retrieval.
    • If enough child nodes from the same parent node are retrieved, they are merged into their larger parent node, which is then used for synthesis.
    • The parent nodes may contain additional child nodes not originally retrieved. This provides additional context to the LLM during synthesis.
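To make technique 1 concrete, here is a minimal plain-Python sketch of sentence-window retrieval. It deliberately avoids the real LlamaIndex APIs (`SentenceWindowNodeParser`, `MetadataReplacementPostProcessor`); the node shape, the keyword-overlap scoring, and the `window_size` parameter are illustrative assumptions, not library defaults.

```python
# Sketch of sentence-window retrieval: retrieve on single sentences,
# synthesize on the surrounding window. Not the LlamaIndex implementation.

def build_sentence_nodes(sentences, window_size=1):
    """Each node holds one sentence plus a 'window' of surrounding sentences."""
    nodes = []
    for i, sent in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        nodes.append({"text": sent, "window": " ".join(sentences[lo:hi])})
    return nodes

def retrieve(nodes, query, top_k=1):
    """Score each single-sentence node by word overlap with the query
    (a stand-in for embedding similarity), then swap in the larger window."""
    q = set(query.lower().split())
    scored = sorted(
        nodes,
        key=lambda n: len(q & set(n["text"].lower().split())),
        reverse=True,
    )
    return [n["window"] for n in scored[:top_k]]

sentences = [
    "Revenue grew 10% in 2023.",
    "Growth was driven by cloud services.",
    "Operating costs rose 5%.",
]
nodes = build_sentence_nodes(sentences)
print(retrieve(nodes, "operating costs"))
```

The key point is the swap in `retrieve`: the small sentence chunk is what gets scored, but the larger `window` string is what reaches the LLM for synthesis.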
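Technique 2's merging rule can be sketched similarly. This is plain Python, not LlamaIndex's `AutoMergingRetriever`; the 0.5 merge threshold, the two-level hierarchy, and the sentence-based child split are illustrative assumptions.

```python
# Sketch of auto-merging retrieval: if enough of a parent's children are
# retrieved, replace them with the whole parent. Not the LlamaIndex code.
from collections import Counter

def build_hierarchy(parents):
    """Split each parent text into smaller child chunks (here: sentences)."""
    children = []
    for pid, text in enumerate(parents):
        for sent in text.split(". "):
            children.append({"parent": pid, "text": sent})
    return children

def auto_merge(retrieved, parents, children, threshold=0.5):
    """Merge retrieved children up into their parent when the retrieved
    fraction of that parent's children exceeds the threshold."""
    hits = Counter(c["parent"] for c in retrieved)
    totals = Counter(c["parent"] for c in children)
    merged, seen = [], set()
    for c in retrieved:
        pid = c["parent"]
        if hits[pid] / totals[pid] > threshold:
            if pid not in seen:
                # The full parent may include children that were never
                # retrieved, giving the LLM extra context at synthesis time.
                merged.append(parents[pid])
                seen.add(pid)
        else:
            merged.append(c["text"])
    return merged

parents = [
    "Revenue grew 10%. Cloud drove growth. Margins improved.",
    "Costs rose 5%. Hiring slowed.",
]
children = build_hierarchy(parents)
# Suppose retrieval returned two of parent 0's three children:
retrieved = [c for c in children if c["text"] in ("Revenue grew 10%", "Cloud drove growth")]
print(auto_merge(retrieved, parents, children))
```

Here two of parent 0's three children were retrieved (2/3 > 0.5), so the merged result is the entire parent chunk, including the "Margins improved." child that was never retrieved on its own.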