BDHU opened this issue 5 months ago.
Curious about this topic too. I want to implement a simple request transfer (including the KV cache) between nodes. #2809 seems to do this, but it only supports InfiniBand and depends on MSCCL++.
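As a rough illustration that a request transfer does not inherently require InfiniBand, here is a minimal sketch that sends one request (metadata plus raw KV bytes) over an ordinary TCP socket. The wire format and the names `send_request`/`recv_request` are hypothetical assumptions for this example, not what #2809 or vLLM actually uses, and plain TCP will be much slower than an RDMA transport for large caches.

```python
import json
import socket
import struct

# Hypothetical wire format (an assumption for illustration only):
# [4-byte header length][JSON header][8-byte payload length][raw KV bytes]

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() may return fewer than requested."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def send_request(sock: socket.socket, request_id: str,
                 prompt_token_ids: list, kv_bytes: bytes) -> None:
    """Send request metadata followed by its serialized KV cache."""
    header = json.dumps({"request_id": request_id,
                         "prompt_token_ids": prompt_token_ids}).encode("utf-8")
    sock.sendall(struct.pack("!I", len(header)) + header)
    # 8-byte length so payloads larger than 4 GiB are representable
    sock.sendall(struct.pack("!Q", len(kv_bytes)) + kv_bytes)

def recv_request(sock: socket.socket):
    """Receive one request; returns (header dict, KV bytes)."""
    (header_len,) = struct.unpack("!I", _recv_exact(sock, 4))
    header = json.loads(_recv_exact(sock, header_len).decode("utf-8"))
    (payload_len,) = struct.unpack("!Q", _recv_exact(sock, 8))
    return header, _recv_exact(sock, payload_len)

if __name__ == "__main__":
    # A connected socket pair stands in for two nodes in this local demo.
    a, b = socket.socketpair()
    send_request(a, "req-0", [1, 2, 3], b"\x00\x01\x02\x03")
    header, kv = recv_request(b)
    print(header["request_id"], header["prompt_token_ids"], kv)
    a.close()
    b.close()
```

The length prefixes let the receiver know exactly how many bytes belong to each part, which matters because a stream socket does not preserve message boundaries.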
Any updates on this?
Interested in this as well. Can anyone suggest a few first steps?
Just use CUDA IPC handles (`cudaIpcGetMemHandle`/`cudaIpcOpenMemHandle`) and `cudaMemcpy`.
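For context on that suggestion: with CUDA IPC, one process calls `cudaIpcGetMemHandle` on a device allocation and passes the handle to another process, which maps the allocation with `cudaIpcOpenMemHandle` and can then copy from it with `cudaMemcpy`. These handles are only valid between processes on the same machine, so copying between nodes still needs a network transport. The sketch below is a CPU-only analogy, not CUDA code: it uses Python's `multiprocessing.shared_memory`, where the segment name plays the role of the IPC handle. The helper names `export_buffer` and `import_and_copy` are hypothetical.

```python
from multiprocessing import shared_memory

def export_buffer(data: bytes):
    """Producer side: allocate a shared segment and copy the data in.
    The segment name stands in for a cudaIpcMemHandle_t (analogy only)."""
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    shm.buf[:len(data)] = data
    return shm, shm.name

def import_and_copy(name: str, nbytes: int) -> bytes:
    """Consumer side: open the segment by name (like cudaIpcOpenMemHandle),
    then copy it into private memory (like cudaMemcpy)."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # Slice by nbytes: the OS may round the segment size up to a page.
        return bytes(shm.buf[:nbytes])
    finally:
        shm.close()

if __name__ == "__main__":
    payload = b"fake kv block"
    owner, handle = export_buffer(payload)
    print(import_and_copy(handle, len(payload)))
    owner.close()
    owner.unlink()  # the creator is responsible for freeing the segment
```

In a real setup the producer and consumer would be separate processes on the same host, and only the name (or, in CUDA, the handle bytes) would travel between them.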
I'm looking to run an experiment that involves copying the contents of the KV cache between nodes. I'm not very familiar with the codebase: is there any way to access the page table/KV cache directly? Where should I start? Any suggestions are welcome!
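As a mental model for where to look: vLLM's PagedAttention stores KV entries in fixed-size blocks drawn from a preallocated pool and keeps, per sequence, a block table mapping logical block indices to physical block ids. Copying one request's KV cache to another node can then be thought of as gathering the physical blocks its table points to, shipping them, and scattering them into freshly allocated blocks (with a new table) on the receiver. The toy below uses Python lists; `PagedKVCache`, `gather`, and `scatter` are hypothetical names, not vLLM's API. In vLLM the real cache is per-layer GPU tensors, and where these live in the code varies between versions.

```python
class PagedKVCache:
    """Toy paged KV cache (hypothetical): a pool of fixed-size blocks plus
    a per-request block table mapping logical block index -> physical id.
    Each slot holds one token's KV entry; real caches are GPU tensors."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.blocks = [[None] * block_size for _ in range(num_blocks)]
        # Reversed so pop() hands out the lowest free block id first.
        self.free = list(range(num_blocks - 1, -1, -1))
        self.block_tables = {}

    def allocate(self, request_id, num_tokens):
        """Reserve enough physical blocks for num_tokens tokens."""
        needed = -(-num_tokens // self.block_size)  # ceiling division
        if needed > len(self.free):
            raise MemoryError("not enough free KV blocks")
        self.block_tables[request_id] = [self.free.pop() for _ in range(needed)]
        return self.block_tables[request_id]

    def write(self, request_id, pos, kv):
        """Store the KV entry for token position pos via the block table."""
        table = self.block_tables[request_id]
        self.blocks[table[pos // self.block_size]][pos % self.block_size] = kv

    def gather(self, request_id, num_tokens):
        """Sender side: follow the block table, flatten KV into a list."""
        table = self.block_tables[request_id]
        return [self.blocks[table[p // self.block_size]][p % self.block_size]
                for p in range(num_tokens)]

    def scatter(self, request_id, kv_entries):
        """Receiver side: allocate new blocks (ids may differ) and write back."""
        self.allocate(request_id, len(kv_entries))
        for pos, kv in enumerate(kv_entries):
            self.write(request_id, pos, kv)

if __name__ == "__main__":
    src = PagedKVCache(num_blocks=8, block_size=4)
    src.allocate("req-0", 6)
    for pos in range(6):
        src.write("req-0", pos, ("k%d" % pos, "v%d" % pos))
    dst = PagedKVCache(num_blocks=8, block_size=4)
    dst.allocate("other", 5)  # occupy some blocks so physical ids differ
    dst.scatter("req-0", src.gather("req-0", 6))
    print(src.block_tables["req-0"], dst.block_tables["req-0"])
```

The point of the example is that the block table, not the physical block ids, is what makes the copy meaningful: the receiver is free to place the blocks anywhere as long as it builds its own table for the request.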