ray-project / llmperf

LLMPerf is a library for validating and benchmarking LLMs
Apache License 2.0

Subsequent requests cannot be sent until 'num_concurrent_requests' requests have all finished #56

Open llsj14 opened 1 week ago

llsj14 commented 1 week ago

Hello,

I've encountered an issue where the request launcher does not send subsequent requests until all of the requests specified by num_concurrent_requests have finished.

This behavior makes it hard to benchmark TTFT and throughput accurately on continuous batching systems, because subsequent requests are held back even when the serving system is capable of handling them.

To address this, I believe the get_next_ready function should be modified as follows, so that it returns results as soon as each individual request completes:

--- a/src/llmperf/requests_launcher.py
+++ b/src/llmperf/requests_launcher.py
@@ -40,6 +40,7 @@ class RequestsLauncher:
         if not block:
             while self._llm_client_pool.has_next():
                 results.append(self._llm_client_pool.get_next_unordered())
+                return results
         else:
             while not self._llm_client_pool.has_next():
                 pass
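
For review context, here is roughly what get_next_ready would look like with the change applied. This is a sketch reconstructed from the diff context above, so the surrounding details may not match the actual file exactly:

```python
# Sketch of RequestsLauncher.get_next_ready with the proposed change applied
# (reconstructed from the diff context, not copied verbatim from the repo).
def get_next_ready(self, block: bool = False) -> list:
    results = []
    if not block:
        while self._llm_client_pool.has_next():
            results.append(self._llm_client_pool.get_next_unordered())
            # Proposed change: hand back the first completed result immediately
            # instead of draining every outstanding request first.
            return results
    else:
        # Busy-wait until at least one request has finished, then drain the pool.
        while not self._llm_client_pool.has_next():
            pass
        while self._llm_client_pool.has_next():
            results.append(self._llm_client_pool.get_next_unordered())
    return results
```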

I am prepared to submit a pull request with this change and would appreciate your feedback.

Thank you.

llsj14 commented 1 week ago

I found that the previous revision isn't sufficient to solve the problem. To send a new request asynchronously as soon as a previous one finishes, several parts of the code need to change. I attempted to make the get_next_ready function asynchronous, but it depends on ray.util.ActorPool.get_next_unordered(), and converting it to an asynchronous function is challenging because that call is implemented as a blocking operation.

Here is the link to the relevant code: ray/util/actor_pool.py lines 311-326.
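
If I read actor_pool.py correctly, get_next_unordered() waits (via ray.wait) until the next task finishes, so even the "non-blocking" branch of get_next_ready can stall while requests are in flight. For illustration only, a non-blocking alternative over raw ObjectRefs could look something like the sketch below; poll_finished and pending are hypothetical names, not part of llmperf or the actor pool API:

```python
import ray

def poll_finished(pending):
    """Hypothetical helper: collect finished results without waiting.

    `pending` is a list of Ray ObjectRefs for in-flight requests.
    ray.wait with timeout=0 returns immediately with whatever is already done.
    """
    if not pending:
        return [], []
    ready, not_ready = ray.wait(pending, num_returns=len(pending), timeout=0)
    return [ray.get(ref) for ref in ready], not_ready
```

Something along these lines would have to replace the pool's blocking call, which is why the change touches more than get_next_ready.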

I think there are two potential approaches for change:

ashutoshsaboo commented 2 days ago

Hey @llsj14, I'm facing the same issue. Without the ability to keep issuing concurrent requests at a set rate, it's no longer a proper load-testing framework. Do you have plans to fix this?
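
For what it's worth, by "a set rate" I mean open-loop issuance: requests start on a fixed schedule regardless of whether earlier ones have finished. A minimal, hypothetical sketch of that pattern is below; send_request stands in for whatever actually issues one request, and none of these names come from llmperf:

```python
import threading
import time

def run_at_fixed_rate(send_request, requests_per_second: float, duration_s: float) -> None:
    """Start `send_request` at a fixed rate, independent of completions (open loop)."""
    interval = 1.0 / requests_per_second
    deadline = time.monotonic() + duration_s
    next_start = time.monotonic()
    while time.monotonic() < deadline:
        # Each request runs in its own thread, so a slow response never delays
        # the start of the next one.
        threading.Thread(target=send_request, daemon=True).start()
        next_start += interval
        time.sleep(max(0.0, next_start - time.monotonic()))
```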