ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[data] Eagerly free intermediate values during shuffle #42145

Status: Open. Opened by stephanie-wang 10 months ago.

stephanie-wang commented 10 months ago

Description

Shuffle ops generate additional intermediate values between the map and reduce stages. We should eagerly free these with ray._private.internal_api.free, as we already do for other ops. This will likely improve performance by reducing the amount of data spilled out of the object store.
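To make the idea concrete, here is a minimal, self-contained sketch of eager freeing in a map/reduce shuffle. It does not use Ray and is not Ray Data's shuffle implementation. ToyObjectStore, shuffle, and the live/peak counters are hypothetical names invented for this illustration. The store's free method stands in for where a call like ray._private.internal_api.free would go. Each reducer frees the map-output pieces it consumed as soon as its own output is stored, because no other reducer reads those pieces.

```python
from typing import Any, Callable, Dict, Hashable, List


class ToyObjectStore:
    """Hypothetical in-memory stand-in for Ray's object store.

    It counts how many records are currently held ("live") and the
    high-water mark ("peak"), a rough proxy for spill pressure.
    """

    def __init__(self) -> None:
        self._objects: Dict[int, List[Any]] = {}
        self._next_id = 0
        self.live = 0
        self.peak = 0

    def put(self, value: List[Any]) -> int:
        ref = self._next_id
        self._next_id += 1
        self._objects[ref] = value
        self.live += len(value)
        self.peak = max(self.peak, self.live)
        return ref

    def get(self, ref: int) -> List[Any]:
        return self._objects[ref]

    def free(self, refs: List[int]) -> None:
        # Plays the role ray._private.internal_api.free(refs) would play in
        # Ray (assumption): release objects as soon as they are known dead.
        for ref in refs:
            self.live -= len(self._objects.pop(ref))


def shuffle(
    blocks: List[List[Any]],
    num_reducers: int,
    key: Callable[[Any], Hashable],
    store: ToyObjectStore,
    eager_free: bool,
) -> List[List[Any]]:
    # Map stage: split every input block into one piece per reducer and
    # store each piece as an intermediate object.
    map_outputs: List[List[int]] = []
    for block in blocks:
        pieces: List[List[Any]] = [[] for _ in range(num_reducers)]
        for row in block:
            pieces[hash(key(row)) % num_reducers].append(row)
        map_outputs.append([store.put(piece) for piece in pieces])

    # Reduce stage: reducer r gathers piece r from every map output.
    output_refs: List[int] = []
    for r in range(num_reducers):
        input_refs = [refs[r] for refs in map_outputs]
        # A real reducer would sort or aggregate here; we just concatenate.
        merged = [row for ref in input_refs for row in store.get(ref)]
        output_refs.append(store.put(merged))
        if eager_free:
            # These intermediates have no other consumer, so free them now
            # instead of keeping them alive until the whole shuffle ends.
            store.free(input_refs)

    return [store.get(ref) for ref in output_refs]
```

In this toy model, eager freeing caps peak usage at roughly "all intermediates plus the largest single reducer output" rather than "all intermediates plus all outputs". In real Ray, intermediates would eventually be released by reference counting once their ObjectRefs go out of scope. Freeing them eagerly only shortens the window in which they occupy the object store and may be spilled.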

Use case

No response

dgdheeraj commented 7 months ago

Hi @stephanie-wang @scottjlee, is this issue assigned to anyone? I'm interested in contributing to Ray and would like to take it up.

prithvi081099 commented 3 months ago

Hi @stephanie-wang, I'm new to Ray and would like to contribute. Could you tell me which other ops use ray._private.internal_api.free? Seeing those would help me understand how this function is used. Also, when you say "shuffle ops", could you point me to the code that implements the map and reduce stages of the shuffle? Thank you so much.

scottjlee commented 3 months ago

@prithvi081099

prithvi081099 commented 3 months ago

@scottjlee, thank you for responding so quickly. I'll take a look at this file, make the changes, and open a PR. I'll reach out if I get stuck.