Why has the _run_workers_async function of DistributedGPUExecutorAsync been removed since v0.4.3?
The new implementation starts a loop for every worker, which prevents the worker from doing other things, such as transferring the KV cache in prefill/decode disaggregation. I previously used _run_workers_async to transfer the KV cache without any problems, but now such a call only executes once the worker loops have stopped.
Sorry, I am not familiar with asyncio in Python. What are the benefits of the new implementation? And how can I allow the workers to transfer the KV cache asynchronously during generation?
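To make the question concrete, here is a minimal sketch of the behavior I am after, written with plain asyncio only. It does not use vLLM's API: worker_loop, transfer_kv, and main are hypothetical stand-ins for the per-worker execution loop and a KV cache transfer, and the sleeps only mark the points where a real implementation would await GPU or network work.

```python
import asyncio


async def worker_loop(worker_id, steps, log):
    # Hypothetical stand-in for the per-worker execution loop:
    # one model step per iteration.
    for step in range(steps):
        # Yield to the event loop so other tasks can run between steps.
        await asyncio.sleep(0)
        log.append(("step", worker_id, step))


async def transfer_kv(worker_id, log):
    # Hypothetical stand-in for an asynchronous KV cache transfer.
    await asyncio.sleep(0)
    log.append(("kv_transfer", worker_id))


async def main():
    log = []
    # Schedule both coroutines as tasks on the same event loop so the
    # transfer does not have to wait for the worker loop to stop.
    loop_task = asyncio.create_task(worker_loop(0, 3, log))
    kv_task = asyncio.create_task(transfer_kv(0, log))
    await asyncio.gather(loop_task, kv_task)
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this sketch the transfer is recorded before the worker loop finishes its last step, because the loop yields at every iteration. Whether the real worker loop can yield like this, or whether the transfer has to go through a separate channel, is exactly what I would like to understand.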