yale-sys / prompt-cache

Modular and structured prompt caching for low-latency LLM inference
MIT License

Speed up the initial caching process #6

Closed sarda-nikhil closed 8 months ago

sarda-nikhil commented 9 months ago

The caching process is very slow, which lengthens the cache "warm-up" time. We should look into ways to speed it up. Some proposals by In:

a) Use multiple GPUs for caching (can we use CPUs as well?)
b) Look into batch caching
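
To make the two proposals concrete, here is a rough sketch of how the work could be planned: group prompt modules of similar length into batches (less padding per batch), then hand the batches out round-robin to the available devices. This is not the repository's code. `plan_batches`, the module names, and the token counts are all hypothetical, and the actual encoding of each batch is left out.

```python
from typing import Dict, List, Sequence


def plan_batches(
    module_lengths: Dict[str, int],
    batch_size: int,
    devices: Sequence[str],
) -> Dict[str, List[List[str]]]:
    """Group prompt modules into batches and spread the batches over devices.

    Hypothetical helper: module_lengths maps a module name to its token count.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not devices:
        raise ValueError("at least one device is required")

    # Sort by token count (name breaks ties) so each batch holds modules of
    # similar length, which keeps padding small.
    ordered = sorted(module_lengths, key=lambda name: (module_lengths[name], name))

    # Cut the sorted list into consecutive chunks of at most batch_size.
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

    # Assign batches to devices round-robin.
    plan: Dict[str, List[List[str]]] = {device: [] for device in devices}
    for index, batch in enumerate(batches):
        plan[devices[index % len(devices)]].append(batch)
    return plan


if __name__ == "__main__":
    # Made-up module names and token counts, for illustration only.
    lengths = {"system": 120, "doc_a": 900, "doc_b": 870, "user_profile": 60, "doc_c": 450}
    print(plan_batches(lengths, batch_size=2, devices=["cuda:0", "cuda:1"]))
```

Passing "cpu" in the device list would cover the CPU question in (a), although whether CPU encoding is fast enough to help is something we would have to measure.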