RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.
[Question] How to optimize the cost of each Ask? #511
Closed
joaolovatti closed 1 month ago
Context / Scenario
• I have noticed that each request I make to the /ask endpoint incurs a cost of about 2 cents on the OpenAI API.
• Each request has been using a large number of tokens as context.
• Compared to lighter setups, such as querying ChromaDB directly, Kernel Memory is using a noticeably higher number of tokens.
Question
• Do you have any tips on how to reduce this cost?
• Can I limit the number of context tokens sent to the LLM, or limit the number of relevant partitions returned?
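For context, a minimal sketch of the kind of knob being asked about, assuming the web-service `/ask` endpoint accepts a JSON body with a relevance threshold. The `minRelevance` field and the `build_ask_payload` helper below are assumptions for illustration, not a verified Kernel Memory API:

```python
import json

# Hypothetical helper: build a request body for a Kernel Memory-style
# /ask endpoint. The "minRelevance" field is an assumption; raising it
# would exclude weakly related memory partitions from the context sent
# to the LLM, which is one plausible way to cut token usage and cost.
def build_ask_payload(question: str, index: str = "default",
                      min_relevance: float = 0.7) -> str:
    payload = {
        "question": question,
        "index": index,
        "minRelevance": min_relevance,  # filter out low-relevance partitions
    }
    return json.dumps(payload)

# Example usage: this JSON would be POSTed to the service's /ask endpoint.
body = build_ask_payload("What is our refund policy?", min_relevance=0.8)
print(body)
```

The trade-off is accuracy versus cost: a stricter threshold sends fewer partitions to the model, so answers get cheaper but may miss relevant context.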