vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Misc]: How to access the KV cache directly? #4156

Open BDHU opened 5 months ago

BDHU commented 5 months ago

Anything you want to discuss about vllm.

I'm looking to run an experiment that involves copying the contents of the KV cache between nodes. I'm not very familiar with the codebase. Is there any way to access the page table/KV cache directly? Where should I start? Any suggestions would be helpful!

duanzhaol commented 4 months ago

Curious about this topic too. I want to implement a simple request transfer (including the KV cache) between nodes. #2809 seems to have done it, but it only supports InfiniBand and depends on MSCCL++.

BDHU commented 4 months ago

Any updates on this?

tanejaaryan commented 3 months ago

Interested in this as well. Can anyone suggest a few first steps?

CSEEduanyu commented 1 month ago

Just use a CUDA IPC memory handle (cudaIpcGetMemHandle / cudaIpcOpenMemHandle) and cudaMemcpy.
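
For anyone looking for first steps, here is a rough CPU-only sketch of what the copy has to move. It assumes a paged layout where a sequence's KV blocks are scattered across a fixed pool and located through a block table, so a transfer gathers those blocks into one contiguous buffer, ships it, and scatters it into whatever blocks are free on the receiver. Everything here (PagedPool, gather, scatter, BLOCK_BYTES) is a hypothetical name for illustration and not vLLM's API, and bytearrays stand in for GPU memory. On a real GPU the per-block copies would be cudaMemcpy calls on device pointers. Note that CUDA IPC handles are only valid between processes on the same machine, so moving the gathered buffer between nodes still needs a network transport.

```python
# CPU-only sketch: bytearrays stand in for GPU memory.
# All names below are hypothetical and are NOT part of vLLM's API.

BLOCK_BYTES = 16  # bytes per KV block (tiny, for illustration only)


class PagedPool:
    """A fixed pool of equally sized blocks plus a free list."""

    def __init__(self, num_blocks: int) -> None:
        self.mem = bytearray(num_blocks * BLOCK_BYTES)
        self.free = list(range(num_blocks))

    def block(self, idx: int) -> memoryview:
        """Return a writable view of physical block `idx`."""
        start = idx * BLOCK_BYTES
        return memoryview(self.mem)[start:start + BLOCK_BYTES]


def gather(pool: PagedPool, block_table: list[int]) -> bytes:
    """Pack a sequence's scattered blocks into one contiguous buffer."""
    return b"".join(bytes(pool.block(i)) for i in block_table)


def scatter(pool: PagedPool, payload: bytes) -> list[int]:
    """Unpack a buffer into free blocks and return the new block table."""
    if len(payload) % BLOCK_BYTES != 0:
        raise ValueError("payload is not a whole number of blocks")
    n = len(payload) // BLOCK_BYTES
    if n > len(pool.free):
        raise MemoryError("not enough free blocks on the receiver")
    table = [pool.free.pop() for _ in range(n)]
    for k, idx in enumerate(table):
        pool.block(idx)[:] = payload[k * BLOCK_BYTES:(k + 1) * BLOCK_BYTES]
    return table


# Sender: one sequence occupying non-contiguous physical blocks 5, 1, 3.
src = PagedPool(num_blocks=8)
src_table = [5, 1, 3]
for logical, physical in enumerate(src_table):
    # Fill each block with a recognizable byte so the copy can be checked.
    src.block(physical)[:] = bytes([logical + 1]) * BLOCK_BYTES

payload = gather(src, src_table)  # this buffer is what crosses the wire

# Receiver: a different pool, so the physical block numbers differ.
dst = PagedPool(num_blocks=4)
dst_table = scatter(dst, payload)
```

The receiver ends up with its own block table, so only the logical order of the blocks has to be preserved across the transfer, not the physical block numbers.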