ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[serve] `max_ongoing_requests` limited by `max_concurrency` in actor #47681

Open aRyBernAlTEglOTRO opened 1 week ago

aRyBernAlTEglOTRO commented 1 week ago

What happened + What you expected to happen

  1. The Bug: The max_ongoing_requests parameter in @serve.deployment has no effect when it is larger than 1000.
  2. Expected Behavior: max_ongoing_requests should take effect even when it is larger than 1000.

Versions / Dependencies

Reproduction script

Reproducible Script:

from ray import serve
from ray.serve.handle import DeploymentHandle
import asyncio

@serve.deployment(max_ongoing_requests=4096)
class Model:
    @serve.batch(max_batch_size=2048, batch_wait_timeout_s=2)
    async def __call__(self, ls: list[int]) -> list[int]:
        print(f"Length of input list: {len(ls)}")
        return ls

async def main() -> None:
    handle: DeploymentHandle = serve.run(Model.bind())
    await asyncio.gather(*[handle.remote(i) for i in range(2048)])

if __name__ == "__main__":
    asyncio.run(main())

Expected Output:

Length of input list: 2048

Actual Output:

Length of input list: 1000
Length of input list: 1000
Length of input list: 48

Issue Severity

Low: It annoys or frustrates me.

aRyBernAlTEglOTRO commented 1 week ago

I think the issue is caused by the actor's max_concurrency limit, which defaults to 1000. A quick fix is to add "max_concurrency" to allowed_ray_actor_options in the following script:

https://github.com/ray-project/ray/blob/1c80db59b131d3cd87ad36665105d4fa9f24e7f0/python/ray/serve/_private/config.py#L537

and modify RayActorOptionsSchema in the following script to add support for max_concurrency.

https://github.com/ray-project/ray/blob/1c80db59b131d3cd87ad36665105d4fa9f24e7f0/python/ray/serve/schema.py#L190

But I think a better approach is to align max_ongoing_requests in DeploymentConfig with max_concurrency in the Ray actor, because they seem to share the same intent, though that would require more code changes.
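
To illustrate why a concurrency cap of 1000 would produce batches of 1000, 1000 and 48, here is a minimal pure-asyncio sketch that does not use Ray at all. It is only a model of the suspected mechanism, not Ray Serve's actual implementation: `Batcher` and `simulate` are hypothetical names, the semaphore stands in for the actor's max_concurrency, and the batch timeout is shortened from 2 s to 0.05 s so the simulation finishes quickly.

```python
import asyncio


class Batcher:
    """Toy batcher (hypothetical): flushes when full or when a timer expires."""

    def __init__(self, max_batch_size, wait_timeout_s):
        self.max_batch_size = max_batch_size
        self.wait_timeout_s = wait_timeout_s
        self.pending = []       # (item, future) pairs waiting for a flush
        self.batch_sizes = []   # size of every batch flushed so far
        self._timer = None      # handle of the pending timeout, if any

    async def submit(self, item):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.pending.append((item, fut))
        if len(self.pending) >= self.max_batch_size:
            # Batch is full: flush immediately.
            self._flush()
        elif self._timer is None:
            # First item of a new batch: start the wait timeout.
            self._timer = loop.call_later(self.wait_timeout_s, self._flush)
        return await fut

    def _flush(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self.pending = self.pending, []
        if not batch:
            return
        self.batch_sizes.append(len(batch))
        for item, fut in batch:
            if not fut.done():
                fut.set_result(item)  # echo the input back, like the repro


async def simulate(num_requests, concurrency_cap, max_batch_size,
                   wait_timeout_s=0.05):
    """Send num_requests through a batcher behind a concurrency cap."""
    batcher = Batcher(max_batch_size, wait_timeout_s)
    # Assumption: the semaphore models the actor-level max_concurrency.
    slots = asyncio.Semaphore(concurrency_cap)

    async def call(i):
        async with slots:
            return await batcher.submit(i)

    await asyncio.gather(*(call(i) for i in range(num_requests)))
    return batcher.batch_sizes


if __name__ == "__main__":
    # Cap of 1000: the batch can never fill, so it flushes on timeout.
    print(asyncio.run(simulate(2048, 1000, 2048)))
    # Cap above the batch size: all requests land in a single batch.
    print(asyncio.run(simulate(2048, 4096, 2048)))
```

In this model, only 1000 requests can be inside the batched method at once, so the batch never reaches 2048 and is flushed by the timeout instead. Raising the cap above the batch size lets all 2048 requests arrive in one batch, which matches the expected output in the report.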