Awesome project! I was really excited to see that you have support for the llama Python package.
Small suggestion: consider using LlamaCache, which adds an in-memory cache that significantly reduces token re-processing when prompts share a common prefix across calls. The change is minimal but requires setting an appropriate cache size in bytes (probably worth exposing as a user option; the default is 2 GB).
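For reference, here's a minimal sketch of what the change might look like, assuming llama-cpp-python's `Llama.set_cache()` and a `LlamaCache` that takes a `capacity_bytes` argument; the model path is just a placeholder:

```python
from llama_cpp import Llama, LlamaCache

# Placeholder path; substitute whatever model the project already loads.
llm = Llama(model_path="models/model.gguf")

# Attach an in-memory cache so repeated prompt prefixes reuse previously
# computed state instead of re-processing those tokens from scratch.
# capacity_bytes could be surfaced as a user option; 2 GiB shown here.
llm.set_cache(LlamaCache(capacity_bytes=2 << 30))
```

With the cache attached, subsequent calls to the same `llm` instance that start with an already-seen prompt prefix should skip straight past the cached tokens.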