Feat/optimize model gateway

truefoundry / cognita

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry

https://cognita.truefoundry.com

Apache License 2.0

Closed mnvsk97 closed 2 weeks ago

mnvsk97 commented 2 weeks ago

1. Avoid creating a model instance for every request; instead, cache instances by model name, config, and other metadata.
2. Add a simple local dict-based cache for embedding, LLM, reranker, and audio models.
3. Always check the cache before creating a model instance, to support item 1.
4. Add the cachetools library to support simple caching mechanisms now and more complex cases in the future.
5. Add documentation for each method in the file.
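
For reviewers, here is a minimal sketch of the check-cache-then-create pattern described above. It is not the code in this PR: `ModelCache`, `get_or_create`, and `FakeEmbedder` are hypothetical names used only for illustration, and it uses a plain dict from the standard library rather than cachetools. It assumes the model config is a JSON-serializable dict.

```python
import json
import threading
from typing import Any, Callable, Dict, Tuple

CacheKey = Tuple[str, str, str]


class ModelCache:
    """Hypothetical dict-based cache keyed by model type, name, and config."""

    def __init__(self) -> None:
        self._instances: Dict[CacheKey, Any] = {}
        self._lock = threading.Lock()

    @staticmethod
    def _key(model_type: str, model_name: str, config: Dict[str, Any]) -> CacheKey:
        # Serialize the config with sorted keys so that equal configs give
        # the same key regardless of dict insertion order.
        serialized = json.dumps(config, sort_keys=True, default=str)
        return (model_type, model_name, serialized)

    def get_or_create(
        self,
        model_type: str,
        model_name: str,
        config: Dict[str, Any],
        factory: Callable[[], Any],
    ) -> Any:
        key = self._key(model_type, model_name, config)
        # Holding the lock while the factory runs keeps the sketch simple,
        # at the cost of serializing model creation across threads.
        with self._lock:
            if key not in self._instances:
                self._instances[key] = factory()
            return self._instances[key]


class FakeEmbedder:
    """Stand-in for a real embedding client (assumption, not a real API)."""

    def __init__(self, name: str, **params: Any) -> None:
        self.name = name
        self.params = params


cache = ModelCache()
config = {"dimensions": 256}

first = cache.get_or_create(
    "embedding", "example-embedder", config,
    lambda: FakeEmbedder("example-embedder", **config),
)
second = cache.get_or_create(
    "embedding", "example-embedder", config,
    lambda: FakeEmbedder("example-embedder", **config),
)
third = cache.get_or_create(
    "embedding", "example-embedder", {"dimensions": 512},
    lambda: FakeEmbedder("example-embedder", dimensions=512),
)

print(first is second)  # same key, so the cached instance is reused
print(first is third)   # different config, so a new instance is created
```

A plain dict never evicts entries, so the cache grows with every distinct combination of name and config. If that becomes a problem, swapping the dict for a bounded mapping such as the LRU or TTL caches in cachetools would be one option.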