ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Core] Support ref counting for pg #34345

Open jjyao opened 1 year ago

jjyao commented 1 year ago

Description

Currently, a placement group (pg) is cleaned up only when the job finishes or when remove_placement_group is called manually. That means that, unlike an actor, it is not freed when it goes out of scope.

This has caused pg leaks when users fail to call remove_placement_group manually: for example, a task creates a pg and then crashes before it can remove it.
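Until ref counting exists, one possible stopgap is to tie removal to a scope explicitly. The following is only a minimal sketch, not Ray's implementation: `scoped_placement_group`, `fake_create`, and `fake_remove` are hypothetical names, and the create/remove functions are injected so the example runs without a cluster. With Ray, one would presumably pass something like `lambda: ray.util.placement_group([{"CPU": 1}])` and `ray.util.remove_placement_group`.

```python
from contextlib import contextmanager


@contextmanager
def scoped_placement_group(create, remove):
    """Create a placement group and always remove it when the scope exits.

    `create` and `remove` are injected (an assumption made for this sketch)
    so the code can run without a Ray cluster.
    """
    pg = create()
    try:
        yield pg
    finally:
        # Runs on normal exit and when a Python exception propagates.
        remove(pg)


# Stand-in for a cluster: tracks which fake placement groups are alive.
live = set()


def fake_create():
    pg = object()
    live.add(pg)
    return pg


def fake_remove(pg):
    live.discard(pg)


def crashing_task():
    # The task fails after creating the pg, as in the scenario above.
    with scoped_placement_group(fake_create, fake_remove):
        raise RuntimeError("task crashed")


try:
    crashing_task()
except RuntimeError:
    pass

leaked = len(live)  # number of placement groups still alive after the crash
```

This only covers failures that surface as Python exceptions. If the worker process is killed outright, the `finally` block never runs, which is the case that ref counting in the runtime would need to handle.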

paa1750 commented 11 months ago

@jjyao Could I work on this?