ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
34.13k stars 5.79k forks source link

[Data] Global `concurrency` limit #48198

Open dominikgrygiel opened 1 month ago

dominikgrygiel commented 1 month ago

Description

I would like to be able to set a "global" (within the current DataContext) limit of allowed concurrency to limit resource usage.

Some of the Ray Data APIs support the concurrency parameter (ex. map_batches, flat_map, map_groups, etc.), but others don't (ex. sort). Based on the documentation (https://docs.ray.io/en/latest/data/performance-tips.html#configuring-resources-and-locality), I was expecting that setting ctx.execution_options.resource_limits.cpu would have the same effect as providing concurrency, but this doesn't seem to be the case. Moreover, some operations call other operations (ex. I believe that groupby would call sort), without any ability to provide such concurrency limits. Therefore, Ray would end up scheduling a task for every batch at the same time, leading to resource exhaustion.

Use case

The use case is very simple - prevent resource exhaustion when running certain Ray Data operations that currently don't support the concurrency limit.

jcotant1 commented 2 weeks ago

Hey @gvspraveen @richardliaw any thoughts on this one? Thx