Look at the implementation of OP-RAG (order-preserve retrieval-augmented generation): https://arxiv.org/html/2409.01666v1
Steps to Implement OP-RAG:
Document Preprocessing:
Divide the long document into fixed-size chunks of 128 tokens each.
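A minimal chunking sketch, assuming the Hugging Face `transformers` tokenizer for the same BGE model used in the next step; the function name `chunk_document` is just for illustration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")

def chunk_document(text: str, chunk_size: int = 128) -> list[str]:
    """Split `text` into consecutive, non-overlapping chunks of `chunk_size` tokens."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    return [
        tokenizer.decode(ids[i : i + chunk_size])
        for i in range(0, len(ids), chunk_size)
    ]
```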
Embedding Generation:
Use a pre-trained model like BGE-large-en-v1.5 to generate embeddings for both the query and the text chunks.
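A sketch using `sentence-transformers`, continuing from the chunking step above; `BAAI/bge-large-en-v1.5` is the Hub id for the model the paper names, and `long_document_text` is a placeholder:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# BGE v1.5's model card recommends this instruction prefix on short
# queries for retrieval tasks.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

query = "What caused the blackout?"          # example query
chunks = chunk_document(long_document_text)  # from the chunking step above

# normalize_embeddings=True L2-normalizes the vectors, so a plain dot
# product later equals cosine similarity.
query_emb = model.encode(QUERY_PREFIX + query, normalize_embeddings=True)
chunk_embs = model.encode(chunks, normalize_embeddings=True)
```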
Similarity Calculation:
Calculate the cosine similarity between the query and each chunk's embedding to determine relevance scores.
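Because the embeddings above were normalized, cosine similarity reduces to a dot product:

```python
import numpy as np

# One relevance score per chunk, shape (num_chunks,)
scores = np.asarray(chunk_embs) @ np.asarray(query_emb)
```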
Order Preservation:
Retrieve the top-k chunks by similarity score, but preserve the chunks' original order of appearance in the document.
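This is the step that distinguishes OP-RAG from vanilla RAG: after selecting the top-k chunks by score, re-sort them by their position in the source document rather than by relevance. A sketch:

```python
import numpy as np

def op_retrieve(scores: np.ndarray, chunks: list[str], k: int) -> list[str]:
    """Select the k highest-scoring chunks, then restore document order."""
    top_idx = np.argsort(scores)[-k:]  # indices of the k best scores
    top_idx = np.sort(top_idx)         # re-sort by original chunk position
    return [chunks[int(i)] for i in top_idx]
```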
Token Management:
Limit the retrieved context to a manageable token budget (e.g., 16K or 48K tokens), depending on the model's capacity.
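One way to realize this step (an assumption on my part, not spelled out above) is to greedily take the highest-scoring chunks until the token budget is exhausted, then hand them off in document order:

```python
import numpy as np

def select_within_budget(scores, chunks, tokenizer, budget=16_000):
    """Greedily pick top-scoring chunks until `budget` tokens are used,
    then return the picked chunks in their original document order."""
    picked, used = [], 0
    for i in np.argsort(scores)[::-1]:  # best-scoring chunks first
        n_tokens = len(tokenizer.encode(chunks[i], add_special_tokens=False))
        if used + n_tokens > budget:
            break
        picked.append(int(i))
        used += n_tokens
    return [chunks[i] for i in sorted(picked)]  # preserve document order
```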
Feed into Generator:
Input the ordered, relevant chunks into the language model (e.g., Llama3.1-70B) for answer generation.
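A sketch assuming Llama3.1-70B is served behind an OpenAI-compatible endpoint (e.g., via vLLM); the base URL, model id, and prompt template here are placeholders, not something the paper prescribes:

```python
from openai import OpenAI

# Point the client at a locally served OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

context = "\n\n".join(ordered_chunks)  # output of the order-preserving step

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[
        {"role": "system", "content": "Answer using only the given context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
)
print(response.choices[0].message.content)
```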
Evaluation:
Evaluate the quality of the generated answers using metrics such as F1 score or accuracy.
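For QA-style answers, token-level F1 in the SQuAD style is a common choice and can be computed directly; a minimal version:

```python
from collections import Counter

def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```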
By following these steps, you can implement the OP-RAG method and potentially replicate the results presented in the paper. You can also experiment with different chunk sizes, retrieval methods, and context-length budgets to optimize for your specific application.
General RAG research: https://github.com/Ancientshi/ERM4 and https://github.com/QingFei1/LongRAG
This issue tracks efforts toward implementing and improving our RAG implementation.