Open wjj19950828 opened 1 month ago
No, it is not currently supported. And yes, there is a plan to support it; @KuntaiDu can add more.
@KuntaiDu Do you have any suggestions on offloading the prefix cache to CPU? Thanks~
Yes, we have put some thought into supporting CPU/disk/database KV cache offloading. I have been busy profiling vLLM's performance bottlenecks recently (#6794), but I will circle back to KV cache offloading next week.
Usage
Does Prefix Caching currently support offloading to the CPU?
If not, is there a plan to support it? Thanks~