mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

KV cache offloading to CPU RAM #3033

Open shahizat opened 4 days ago

shahizat commented 4 days ago

Hello MLC-LLM team,

I would appreciate it if you could implement KV cache offloading to CPU RAM in the near future. Thanks in advance!
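For context, here is a minimal sketch of the behavior being requested: keep a bounded number of sequences' KV tensors resident on the device and spill the least recently used ones to host RAM, pulling them back on access. This is purely illustrative (plain Python/NumPy, with an invented `OffloadingKVCache` class); it is not MLC-LLM's API, and a real implementation would operate on device memory and overlap transfers with compute.

```python
import numpy as np
from collections import OrderedDict


class OffloadingKVCache:
    """Toy KV cache with LRU offloading to CPU RAM.

    Hypothetical sketch, not MLC-LLM's actual interface. NumPy arrays
    stand in for device tensors; `self.gpu` models device-resident
    entries and `self.cpu` models entries offloaded to host memory.
    """

    def __init__(self, gpu_budget: int):
        self.gpu_budget = gpu_budget
        self.gpu = OrderedDict()  # seq_id -> (K, V), "device"-resident
        self.cpu = {}             # seq_id -> (K, V), offloaded to host

    def put(self, seq_id, k, v):
        self.gpu[seq_id] = (k, v)
        self.gpu.move_to_end(seq_id)  # mark most recently used
        self._evict()

    def get(self, seq_id):
        if seq_id in self.cpu:
            # Bring the offloaded entry back to the device.
            self.gpu[seq_id] = self.cpu.pop(seq_id)
        self.gpu.move_to_end(seq_id)
        self._evict()
        return self.gpu[seq_id]

    def _evict(self):
        # Spill least recently used entries until within the device budget.
        while len(self.gpu) > self.gpu_budget:
            sid, kv = self.gpu.popitem(last=False)
            self.cpu[sid] = kv
```

In practice the interesting parts are exactly what this sketch elides: choosing an eviction granularity (per sequence, per layer, or per page), and hiding the PCIe transfer latency behind ongoing decode steps.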