llsj14 opened 1 week ago
I found that the previous revision isn't sufficient to solve the problem. Sending a request asynchronously right after the previous one finishes requires changes in many places. I attempted to make the `get_next_ready` function asynchronous, but it depends on `get_next_unordered()` from `ray.util.ActorPool`, and converting it is challenging because its current implementation is blocking.

Here is the link to the relevant code: `ray/util/actor_pool.py`, lines 311-326.
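One common pattern for awaiting a blocking call like this is to push it onto a worker thread with `asyncio.to_thread`, so the event loop stays responsive while the call waits. This is only a sketch of the pattern: `get_next_unordered_blocking` below is a queue-backed stand-in I made up for illustration, not Ray's actual `ActorPool.get_next_unordered()`.

```python
import asyncio
import queue
import threading
import time

# Stand-in (hypothetical) for a blocking call like ActorPool.get_next_unordered():
# it blocks until the next completed result is available.
_results: "queue.Queue[int]" = queue.Queue()

def get_next_unordered_blocking() -> int:
    return _results.get()  # blocks until some request finishes

async def get_next_ready_async() -> int:
    # Run the blocking call in the default thread-pool executor so the
    # event loop can keep scheduling other coroutines while we wait.
    return await asyncio.to_thread(get_next_unordered_blocking)

async def main() -> list[int]:
    # Simulate three requests finishing at different times.
    def worker(i: int, delay: float) -> None:
        time.sleep(delay)
        _results.put(i)

    for i, d in enumerate([0.05, 0.01, 0.03]):
        threading.Thread(target=worker, args=(i, d)).start()

    # Each await returns as soon as *one* result is ready,
    # in completion order rather than submission order.
    return [await get_next_ready_async() for _ in range(3)]

out = asyncio.run(main())
print(out)
```

The catch, as noted above, is that this only hides the blocking call behind a thread; the underlying `ActorPool` implementation itself remains synchronous.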
I think there are two potential approaches for the change:

1. Converting the `get_next_ready` function into an asynchronous function.
2. Rewriting the `get_next_ready` implementation.

---

Hey @llsj14, I'm facing the same issue. Without issuing concurrent requests at a set rate, it's no longer a proper load-testing framework. Do you have plans to fix this?
Hello,

I've encountered an issue where the request launcher does not allow the next requests to be sent until all requests specified by `num_concurrent_requests` have finished.

This behavior seems counterintuitive for accurately benchmarking TTFT and throughput in continuous-batching systems, as it can block subsequent requests even when the serving system is capable of handling them.
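To illustrate the difference, here is a small stand-in using plain `concurrent.futures` (not the launcher's actual code): waiting for the whole batch delays follow-up requests until the slowest one finishes, while collecting results as they complete lets the caller react as soon as the first request is done.

```python
import concurrent.futures as cf
import time

def fake_request(delay: float) -> float:
    # Stand-in for one LLM request with a given latency.
    time.sleep(delay)
    return delay

delays = [0.05, 0.01, 0.01, 0.01]

with cf.ThreadPoolExecutor(max_workers=4) as pool:
    # Batch-wait (current behavior): nothing more can be dispatched
    # until the slowest of the in-flight requests has finished.
    start = time.monotonic()
    futures = [pool.submit(fake_request, d) for d in delays]
    cf.wait(futures, return_when=cf.ALL_COMPLETED)
    batch_elapsed = time.monotonic() - start

with cf.ThreadPoolExecutor(max_workers=4) as pool:
    # Streaming (proposed behavior): each result is handed back as soon
    # as its request completes, so replacement requests could be
    # launched immediately.
    start = time.monotonic()
    first_ready = None
    for fut in cf.as_completed([pool.submit(fake_request, d) for d in delays]):
        if first_ready is None:
            first_ready = time.monotonic() - start
        fut.result()

print(f"batch wait: {batch_elapsed:.3f}s, first streamed result: {first_ready:.3f}s")
```

The first streamed result comes back long before the batch wait returns, which is exactly the window in which the serving system sits under-loaded today.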
To address this, I believe the `get_next_ready` function should be modified so that it returns results as soon as each individual request is completed.

I am prepared to submit a pull request with this change and would appreciate your feedback.
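The shape of such a change might look like the following. This is a toy sketch, not the proposed patch: the `Launcher` class and its `ThreadPoolExecutor` backend are stand-ins I invented for illustration, whereas the real launcher wraps a `ray.util.ActorPool`.

```python
import concurrent.futures as cf
import time

class Launcher:
    """Toy stand-in (hypothetical) for the request launcher; the real
    class wraps a ray.util.ActorPool, not a ThreadPoolExecutor."""

    def __init__(self, max_workers: int = 4):
        self._pool = cf.ThreadPoolExecutor(max_workers=max_workers)
        self._pending: set = set()

    def launch(self, delay: float) -> None:
        self._pending.add(self._pool.submit(time.sleep, delay))

    def get_next_ready(self, block: bool = False) -> list:
        # Hand back every request that has already finished; with
        # block=True, wait only until at least one finishes. Crucially,
        # this never waits for the *whole* in-flight batch.
        if block and self._pending:
            done, self._pending = cf.wait(
                self._pending, return_when=cf.FIRST_COMPLETED)
            return list(done)
        done = {f for f in self._pending if f.done()}
        self._pending -= done
        return list(done)

launcher = Launcher()
for d in (0.05, 0.01):
    launcher.launch(d)
ready = launcher.get_next_ready(block=True)
# The fast (0.01s) request is returned well before the 0.05s one
# finishes, so the caller can immediately launch a replacement.
print(len(ready))
```

The key design choice is `return_when=cf.FIRST_COMPLETED`: the caller regains control after any single completion, keeping `num_concurrent_requests` requests genuinely in flight at all times.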
Thank you.