Open JiahaoYao opened 2 years ago
@richardliaw could you help triage this one? I wasn't able to figure out whether it's a Core issue or a Tune issue.
Believe this is a core issue.
@rkooo567 can you add context why this is a release blocker?
We should attempt reproduction once Alex's autoscaler changes are made. That will tell us whether it's in the autoscaler or in core.
I think this is the real core leak issue, but no one has found a repro so far...
Hi, I'm a bot from the Ray team :)
To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.
If there is no further activity in the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public slack channel.
What happened + What you expected to happen
The placement group bundles created by Ray Tune do not disappear after the program shuts down. There might be something wrong with how the resources are recalculated. @matthewdeng @rkooo567 @scv119
After shutdown, the ray.status output still shows (0 used of 0.0 reserved in placement groups), and sometimes the reported values are negative. This output comes from https://github.com/ray-project/ray/blob/master/python/ray/scripts/scripts.py and shows that the bundles have not been eliminated.
Versions / Dependencies
ray nightly
Reproduction script
I was using alpa + ray tune (https://github.com/alpa-projects/alpa/issues/508) to run the code. The setup described in this alpa issue (https://github.com/alpa-projects/alpa/issues/521) launches 7 ray workers on 2 GPU nodes.
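There is no minimal repro yet, but a rough way to check for leftover bundles after the Tune run exits might look like the sketch below. It is not part of the original report. The helper names (leftover_placement_groups, leftover_group_resources, check_cluster) are hypothetical, and it assumes that ray.util.placement_group_table() returns a dict keyed by placement group ID with a "state" field, and that placement-group resources show up in ray.cluster_resources() under names containing "_group_".

```python
from typing import Any, Dict, List


def leftover_placement_groups(table: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return IDs of placement groups whose state is not REMOVED.

    `table` is assumed to have the shape returned by
    ray.util.placement_group_table(): {pg_id: {"state": ..., ...}}.
    """
    return [pg_id for pg_id, info in table.items() if info.get("state") != "REMOVED"]


def leftover_group_resources(resources: Dict[str, float]) -> Dict[str, float]:
    """Return resource entries that look like placement-group resources.

    Assumption: such entries carry a "_group_" marker in their name,
    e.g. "CPU_group_<pg_id>".
    """
    return {name: amount for name, amount in resources.items() if "_group_" in name}


def check_cluster() -> None:
    """Connect to a running cluster and print anything that looks leaked."""
    # Imported lazily so the helpers above can be used without Ray installed.
    import ray
    from ray.util import placement_group_table

    ray.init(address="auto", ignore_reinit_error=True)
    print("placement groups not removed:", leftover_placement_groups(placement_group_table()))
    print("group resources still registered:", leftover_group_resources(ray.cluster_resources()))
```

Calling check_cluster() on the head node after the Tune script has exited should print empty results if everything was cleaned up. Non-empty results would be consistent with the leak described above, though they would not by themselves show whether the cause is in Core, Tune, or the autoscaler.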
Issue Severity
Medium: It is a significant difficulty but I can work around it.