Status: Open. stephanie-wang opened this issue 10 months ago.
Hi @stephanie-wang @scottjlee, is this issue assigned to anyone? I'm interested in contributing to Ray and would like to take it up.
Hi @stephanie-wang, I am new to Ray and would like to contribute. Could you point me to the other ops that use ray._private.internal_api.free? Seeing them will help me understand how this function is used. Also, when you say shuffle ops, could you point me toward the code that map/reduce uses for shuffling? Thank you so much.
@prithvi081099 ray._private.internal_api.free is used in Ray Data via the trace_deallocation() utility. You can look at data/_internal/split.py for an example. For the shuffle code, see:
- data/_internal/planner/exchange/shuffle_task_spec.py
- data/_internal/planner/exchange/pull_based_shuffle_task_scheduler.py
- data/_internal/planner/exchange/push_based_shuffle_task_scheduler.py
@scottjlee, thank you for responding so quickly. I will have a look at these files, make the changes, and open a PR. I will reach out if I get stuck.
Description
Shuffle ops generate additional intermediate values between the map and reduce stages. We should eagerly free these with ray._private.internal_api.free, as we already do for other ops. This is likely to improve performance by reducing the amount of data spilled from the object store.
Use case
No response
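The eager-free idea in the description can be illustrated with a toy, pure-Python model of a map/reduce shuffle. ToyObjectStore and shuffle() are hypothetical names made up for this sketch. The store's free() method only stands in for ray._private.internal_api.free and does not call Ray. The sketch tracks the peak number of items held in the store, with and without freeing each reducer's input partitions as soon as that reducer has consumed them.

```python
# Toy model of a map/reduce shuffle. ToyObjectStore is a hypothetical
# stand-in for Ray's object store; its free() plays the role that
# ray._private.internal_api.free would play in real Ray code.
from typing import Dict, List, Tuple


class ToyObjectStore:
    """Stores lists of ints by integer ref and tracks peak item count."""

    def __init__(self) -> None:
        self._objects: Dict[int, List[int]] = {}
        self._next_ref = 0
        self._size = 0  # total items currently stored
        self.peak_items = 0

    def put(self, value: List[int]) -> int:
        ref = self._next_ref
        self._next_ref += 1
        self._objects[ref] = value
        self._size += len(value)
        self.peak_items = max(self.peak_items, self._size)
        return ref

    def get(self, ref: int) -> List[int]:
        return self._objects[ref]

    def free(self, refs: List[int]) -> None:
        # Eagerly drop intermediate values instead of keeping them
        # until the whole shuffle finishes.
        for ref in refs:
            value = self._objects.pop(ref, None)
            if value is not None:
                self._size -= len(value)


def shuffle(blocks: List[List[int]], num_reducers: int,
            eager_free: bool) -> Tuple[List[List[int]], int]:
    store = ToyObjectStore()

    # Map stage: split each input block into one partition per reducer.
    # map_outputs[m][r] is the ref of map task m's partition for reducer r.
    map_outputs = []
    for block in blocks:
        parts = [[x for x in block if x % num_reducers == r]
                 for r in range(num_reducers)]
        map_outputs.append([store.put(p) for p in parts])

    # Reduce stage: each reducer merges its partition from every mapper.
    result_refs = []
    for r in range(num_reducers):
        refs = [row[r] for row in map_outputs]
        merged = sorted(x for ref in refs for x in store.get(ref))
        result_refs.append(store.put(merged))
        if eager_free:
            # This reducer is done with its inputs, so free them now.
            store.free(refs)

    outputs = [store.get(ref) for ref in result_refs]
    return outputs, store.peak_items
```

For example, shuffle([[0, 1, 2, 3], [4, 5, 6, 7]], 2, eager_free=True) returns the same partitions as the non-eager run, but its peak drops from 16 items to 12 because each reducer's inputs are released before the next reducer writes its output. In real Ray code, the analogous step would be calling ray._private.internal_api.free on the intermediate ObjectRefs. As an assumption worth checking against the shuffle schedulers listed above, that call should only happen after the reduce tasks that read those refs have finished (for example, after waiting on their outputs), since freeing objects that pending tasks still need can make those tasks fail. Note also that ray._private is a private module and its API may change.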