Closed czgdp1807 closed 2 years ago
Let me take a look to see if this makes sense this Sunday
It sounds like the warning about pending actors that cannot be scheduled is still printed in this case, @czgdp1807 can you post where it is printed and how it looks?
Not sure if the spilling should be invoked on a single-node cluster, but if the warning is printed and there are no other negative side effects, maybe it is OK and we can close this issue?
(scheduler +26s) Warning: The following resource request cannot be scheduled right now: {'CPU': 10.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
Starting actor 10
2021-12-21 19:11:52,494 WARNING worker.py:1257 -- It looks like you're creating a detached actor in an anonymous namespace. In order to access this actor in the future, you will need to explicitly connect to this namespace with ray.init(namespace="ec3e2076-7749-4dbd-bba6-cc9859ff32a3", ...)
@pcmoritz I get the above warning with the following code,
```python
import time
import ray, sys

print(ray.__version__)

num_cpus_actor = 10
num_cpus_ray = 50

@ray.remote(num_cpus=num_cpus_actor)  # tried different values
class Actor:
    def __init__(self):
        self.num = 0

if __name__ == '__main__':
    cluster_mode = False
    num_cpus = num_cpus_ray
    try:
        ray.init(num_cpus=num_cpus)
        time.sleep(1)
        for i in range(50):
            print(f"Starting actor {i}")
            Actor.options(name=str(i + 1), lifetime="detached").remote()
            time.sleep(1)
        time.sleep(10)
    except Exception as e:
        print("Exception {} ".format(str(e)))
    finally:
        ray.shutdown()
```
Hi, I'm a bot from the Ray team :)
To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.
If there is no further activity in the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public slack channel.
Hi again! The issue will be closed because there has been no further activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you'd still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray's public slack channel.
Thanks again for opening the issue!
Search before asking
Ray Component
Ray Clusters
What happened + What you expected to happen
AFAICT, this issue is about spilling of tasks on a single-node cluster. What happens is that when there is only one node in the cluster and it runs out of resources, an attempt is made to spill the remaining tasks to other nodes. These attempts are bound to fail because there are no other nodes.
Therefore, instead of attempting the spill, a warning should be shown to end users that the autoscaler is not running and that some tasks are being delayed or not scheduled due to lack of resources.
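To make the proposal concrete, here is a minimal user-side sketch of the suggested behavior (this is not Ray's internal scheduler code; `check_pending_demand`, its parameters, and the warning text are all hypothetical, and the resource dictionaries would come from APIs like `ray.cluster_resources()`):

```python
import warnings

def check_pending_demand(pending_requests, cluster_resources, num_nodes):
    """Hypothetical sketch of the warning proposed in this issue: on a
    single-node cluster with no autoscaler, spilling pending work to
    "other nodes" cannot succeed, so warn the user directly that the
    remaining requests are delayed for lack of resources."""
    if num_nodes > 1:
        return None  # spilling to other nodes may actually help here

    # Sum up the total pending demand per resource type.
    demand = {}
    for req in pending_requests:
        for resource, amount in req.items():
            demand[resource] = demand.get(resource, 0.0) + amount

    # Any resource whose pending demand exceeds the single node's total
    # can never be satisfied without new nodes joining the cluster.
    short = {r: a for r, a in demand.items()
             if a > cluster_resources.get(r, 0.0)}
    if short:
        msg = (f"Autoscaler is not running and pending demand {short} "
               "exceeds this single node's resources; these requests "
               "will be delayed until resources are released.")
        warnings.warn(msg)
        return msg
    return None
```

With the numbers from the repro script (50 actors at 10 CPUs each on a 50-CPU node), the pending demand of 500 CPUs exceeds the node's 50, so the sketch would warn instead of retrying a spill that cannot succeed.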
Following are some related comments from #20891:
https://github.com/ray-project/ray/issues/20891#issuecomment-996923485
https://github.com/ray-project/ray/issues/20891#issuecomment-997049442
Reference comment - https://github.com/ray-project/ray/issues/20891#issuecomment-996679034
cc: @pcmoritz @scv119 @stephanie-wang @rkooo567
Versions / Dependencies
Version - master branch (commit f5dfe6c1580443b3ce0a149b4e7f9d108cf43b6d)
Python - 3.8.11
OS - Windows 10 Pro (8 virtual processors and 16 GB RAM)
Reproduction script
Satisfy the following condition:
num_cpus_ray < num_actors * num_cpus_actor
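The condition above works out as follows with the values from the repro script posted earlier in the thread (a minimal numeric sketch; the variable names mirror the script):

```python
# Values from the repro script earlier in the thread.
num_cpus_ray = 50     # CPUs passed to ray.init(num_cpus=...)
num_cpus_actor = 10   # CPUs requested per actor
num_actors = 50       # actors created in the loop

total_demand = num_actors * num_cpus_actor  # 50 * 10 = 500 CPUs requested
print(total_demand)                # -> 500
print(num_cpus_ray < total_demand) # -> True, so the warning is triggered
```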
Only one node should be used (this is Ray's default setting anyway).

Anything else
No response
Are you willing to submit a PR?