Closed: pipul closed this issue 2 months ago
Hi, thank you for your comment!
Whenever the token generation worker needs a new prompt, it calls the `fetch` function. Next, how the receive happens in the token generation worker depends on whether we use MPI RDMA or Boost send-recv for communication. With MPI RDMA, the prefill worker uses `MPI_Put` to write the KV cache directly into the CPU memory of the token generation worker; in that case, the token generation worker just copies the KV cache from its CPU memory to GPU memory. I hope that answers your question!
In the prefill phase, when each layer finishes, the `copy_kv_cache_ubatch_layer()` function in `ParallelGptContextDecoder.cc` calls `stream_out` to send that layer's KV cache to the decode instance,
but I can't see any code for how the KV cache is received from the prefill instance?