truefoundry / cognita

RAG (Retrieval-Augmented Generation) framework by TrueFoundry for building modular, open-source applications for production
https://cognita.truefoundry.com
Apache License 2.0

Feat/optimize model gateway #398

Closed: mnvsk97 closed this pull request 2 weeks ago

mnvsk97 commented 2 weeks ago
  1. Avoid creating a new model instance for every request; instead, cache instances keyed by model name, config, and other metadata.
  2. Add a simple local dict-based cache for embedding, LLM, reranker, and audio models.
  3. Always check the cache before creating a model instance, so that item 1 holds.
  4. Add the cachetools library to support simple caching now and more complex caching strategies in the future.
  5. Add documentation for each method in the file.
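A minimal sketch of items 1 to 3, assuming a model "kind" plus a config made of a name, a provider, and a parameter dict. `ModelConfig`, `make_cache_key`, `ModelInstanceCache`, and `get_or_create` are hypothetical names for illustration, not the PR's actual code; the parameters are serialized deterministically so that two equal configs map to the same cache key, and the cache is consulted before any factory runs.

```python
import json
import threading
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

# Model kinds named in the PR description (item 2).
MODEL_KINDS = {"embedding", "llm", "reranker", "audio"}

CacheKey = Tuple[str, str, str, str]


@dataclass
class ModelConfig:
    # Hypothetical config shape; the real gateway config may differ.
    name: str
    provider: str
    parameters: Dict[str, Any] = field(default_factory=dict)


def make_cache_key(kind: str, config: ModelConfig) -> CacheKey:
    # sort_keys=True makes equal dicts serialize identically, regardless of insertion order.
    params = json.dumps(config.parameters, sort_keys=True, default=str)
    return (kind, config.provider, config.name, params)


class ModelInstanceCache:
    """Hypothetical local dict-based cache of model instances."""

    def __init__(self) -> None:
        self._instances: Dict[CacheKey, Any] = {}
        # A lock keeps concurrent requests from building the same model twice.
        self._lock = threading.Lock()

    def get_or_create(
        self,
        kind: str,
        config: ModelConfig,
        factory: Callable[[ModelConfig], Any],
    ) -> Any:
        if kind not in MODEL_KINDS:
            raise ValueError(f"unknown model kind: {kind!r}")
        key = make_cache_key(kind, config)
        with self._lock:
            # Check the cache first; only call the factory on a miss.
            # Membership test (not `is None`) so a factory returning None is still cached.
            if key not in self._instances:
                self._instances[key] = factory(config)
            return self._instances[key]


# Usage: two requests with equal configs share one instance.
cache = ModelInstanceCache()
first = cache.get_or_create("embedding", ModelConfig("embed-model", "example", {"dim": 768}), lambda c: object())
second = cache.get_or_create("embedding", ModelConfig("embed-model", "example", {"dim": 768}), lambda c: object())
assert first is second
```

Holding the lock while the factory runs serializes model creation, which keeps the sketch simple but can block unrelated requests; a per-key lock would avoid that. Because cachetools caches such as `LRUCache` and `TTLCache` implement the mutable-mapping interface, the plain dict could be swapped for one of them to bound memory or expire instances, which is presumably the kind of "complex case" item 4 anticipates.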