mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

KV cache offloading to CPU RAM #3033

Open shahizat opened 4 days ago

shahizat commented 4 days ago

Hello MLC-LLM team,

I would appreciate it if you could implement KV cache offloading to CPU RAM in the near future. Thanks in advance!
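For context, here is a minimal sketch of the behavior being requested: keep a bounded number of sequences' KV tensors resident on the device and spill the least recently used ones to host RAM, pulling them back on access. This is purely illustrative (plain Python/NumPy, with an invented `OffloadingKVCache` class); it is not MLC-LLM's API, and a real implementation would operate on device memory and overlap transfers with compute.

```python
import numpy as np
from collections import OrderedDict


class OffloadingKVCache:
    """Toy KV cache with LRU offloading to CPU RAM.

    Hypothetical sketch, not MLC-LLM's actual interface. NumPy arrays
    stand in for device tensors; `self.gpu` models device-resident
    entries and `self.cpu` models entries offloaded to host memory.
    """

    def __init__(self, gpu_budget: int):
        self.gpu_budget = gpu_budget
        self.gpu = OrderedDict()  # seq_id -> (K, V), "device"-resident
        self.cpu = {}             # seq_id -> (K, V), offloaded to host

    def put(self, seq_id, k, v):
        self.gpu[seq_id] = (k, v)
        self.gpu.move_to_end(seq_id)  # mark most recently used
        self._evict()

    def get(self, seq_id):
        if seq_id in self.cpu:
            # Bring the offloaded entry back to the device.
            self.gpu[seq_id] = self.cpu.pop(seq_id)
        self.gpu.move_to_end(seq_id)
        self._evict()
        return self.gpu[seq_id]

    def _evict(self):
        # Spill least recently used entries until within the device budget.
        while len(self.gpu) > self.gpu_budget:
            sid, kv = self.gpu.popitem(last=False)
            self.cpu[sid] = kv
```

In practice the interesting parts are exactly what this sketch elides: choosing an eviction granularity (per sequence, per layer, or per page), and hiding the PCIe transfer latency behind ongoing decode steps.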