ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Core] Support ref counting for pg #34345

Open jjyao opened 1 year ago

jjyao commented 1 year ago

Description

Currently a placement group (pg) is cleaned up only when the job finishes or when remove_placement_group is called manually. That means, unlike an actor, it is not freed when its handle goes out of scope.

This has caused pg leaks when users fail to call remove_placement_group manually: for example, a task creates a pg and then crashes before removing it.
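Until ref counting exists, one possible workaround is to scope the pg explicitly so removal runs even when the creating code raises. The sketch below is not part of Ray: `scoped_placement_group` and its `create`/`remove` parameters are hypothetical names introduced here, and `run_with_ray` assumes `ray` is installed. It only covers Python exceptions; if the worker process itself is killed, the `finally` block never runs and the pg still leaks, which is the case ref counting would address.

```python
from contextlib import contextmanager


@contextmanager
def scoped_placement_group(create, remove):
    """Create a placement group and always remove it on exit.

    `create` and `remove` are hypothetical parameters, passed in so the
    cleanup logic can be exercised without a running Ray cluster.
    """
    pg = create()
    try:
        yield pg
    finally:
        # Runs on normal exit and on Python exceptions, but not if the
        # worker process itself is killed.
        remove(pg)


def run_with_ray():
    """Usage with Ray; requires `ray` to be installed."""
    import ray
    from ray.util.placement_group import placement_group, remove_placement_group

    ray.init()
    with scoped_placement_group(
        lambda: placement_group([{"CPU": 1}]),
        remove_placement_group,
    ) as pg:
        ray.get(pg.ready())  # block until the bundles are reserved
        # ... schedule tasks or actors into `pg` here ...
```

Passing the create and remove callables in keeps the helper independent of Ray, so the cleanup path can be checked with simple stand-ins.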

paa1750 commented 1 year ago

@jjyao Could I work on this?