microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.
https://microsoft.github.io/kernel-memory

[Question] How to optimize the cost of each Ask? #511

Closed by joaolovatti 1 month ago

joaolovatti commented 1 month ago

Context / Scenario

• I have noticed that each request I make to the /ask endpoint costs about 2 cents via the OpenAI API.

• Each request has been using a large number of tokens as context.

• Compared to a lighter-weight setup, such as querying ChromaDB directly, Kernel Memory uses a noticeably larger number of tokens.

Question

• Do you have any tips on how to reduce the cost?

• Can I limit the number of context tokens sent to the LLM, or limit the number of relevant partitions returned? (A rough sketch of what I am imagining is below.)
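
For context, here is roughly how I build the memory instance today (serverless mode). From skimming the code, I am guessing SearchClientConfig is where such limits would live, with options like MaxMatchesCount, MaxAskPromptSize and AnswerTokens, plus the minRelevance argument on AskAsync; please correct me if the option names or the builder extension differ in the current version:

```csharp
using Microsoft.KernelMemory;

// Sketch only: the SearchClientConfig option names below are my best guess
// and may differ by Kernel Memory version.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;

var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(apiKey)
    .WithSearchClientConfig(new SearchClientConfig
    {
        MaxMatchesCount = 5,     // retrieve fewer partitions per question
        MaxAskPromptSize = 4000, // cap the total prompt size (tokens) sent to the LLM
        AnswerTokens = 300       // cap the tokens generated for the answer
    })
    .Build<MemoryServerless>();

// Raising minRelevance should also drop weakly related partitions from the prompt.
var answer = await memory.AskAsync("What does the document say about X?", minRelevance: 0.7);
Console.WriteLine(answer.Result);
```

If the memory is deployed as the web service rather than serverless, I assume the same SearchClient limits can be set through the service configuration, but I have not confirmed that.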