Closed KyunghyunLee closed 2 years ago
This doesn't mean those ports are already taken. It means the ports between those numbers can be used by Ray workers. The error message indicates that the GCS server using port 12345 can conflict with the worker ports. Did you manually set 12345 for gcs-server-address?
The message has also been improved in Ray 1.10 (which will be released soon).
Yes, I manually set 12345 for the gcs-server-address. However, I had used port 12345 many times with the same command, and then the error message suddenly appeared and I could no longer use that port. I didn't change the script. Sometimes it worked, and sometimes it didn't.
Sorry for the inconvenience. It seems we changed the default worker port range from 10003-11000 to 10003-19999 (I think it was for a port-related stability reason). So, from this version on, 12345 falls inside the worker port range and conflicts with it.
You can fix this by setting a higher port for the GCS server (2xxxx) or by narrowing the worker port range (--min-worker-port and --max-worker-port): https://docs.ray.io/en/master/configure.html#all-nodes
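To make the two options above concrete, here is a minimal Python sketch of the overlap check. It is not Ray's own validation code: the helper names `conflicts_with_worker_ports` and `head_start_args` are hypothetical, the default range is simply the 10003-19999 quoted in this thread, and treating both ends of the range as inclusive is an assumption. The only Ray-specific parts are the `ray start` flags already mentioned here (`--head`, `--port`, `--min-worker-port`, `--max-worker-port`).

```python
from typing import List

# Worker port range quoted in this thread (assumption: your install uses the same defaults).
DEFAULT_MIN_WORKER_PORT = 10003
DEFAULT_MAX_WORKER_PORT = 19999


def conflicts_with_worker_ports(
    port: int,
    min_worker_port: int = DEFAULT_MIN_WORKER_PORT,
    max_worker_port: int = DEFAULT_MAX_WORKER_PORT,
) -> bool:
    """Return True if `port` lies inside the worker port range (bounds assumed inclusive)."""
    return min_worker_port <= port <= max_worker_port


def head_start_args(port: int, min_worker_port: int, max_worker_port: int) -> List[str]:
    """Hypothetical helper: build a `ray start --head` argument list.

    Raises ValueError instead of returning a command whose head port
    overlaps the worker port range.
    """
    if conflicts_with_worker_ports(port, min_worker_port, max_worker_port):
        raise ValueError(
            f"port {port} is inside the worker port range "
            f"{min_worker_port}-{max_worker_port}"
        )
    return [
        "ray", "start", "--head",
        "--port", str(port),
        "--min-worker-port", str(min_worker_port),
        "--max-worker-port", str(max_worker_port),
    ]
```

With the defaults, `conflicts_with_worker_ports(12345)` is true, which matches the conflict described above. Either keeping 12345 and narrowing the range back to 10003-11000, or keeping the wide range and moving the head port above 19999, makes the check pass. The returned list can be handed to `subprocess.run` if you want to launch the head node from a script.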
Search before asking
Ray Component
Ray Core
What happened + What you expected to happen
I am testing a new algorithm on multiple nodes, so I repeatedly start many workers and kill them. After a large number of tests, I got a message when starting the Ray head,
which seems to say that all worker ports are already assigned. Stopping Ray with `--force` and even restarting the machine does not help. Is there any way to reset all those ports?
Versions / Dependencies
Ray 1.9.0, Python 3.6.10, Ubuntu 16.04
Reproduction script
ray start --head --port 12345 --redis-password 12345
Anything else
No response
Are you willing to submit a PR?