Open ianmaddox opened 3 weeks ago
Hi @ianmaddox, are you using Ray Client? We recommend using Ray job submission instead (https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html). If you do want to use Ray Client, make sure you provide the right address (https://docs.ray.io/en/latest/cluster/running-applications/job-submission/ray-client.html). 6379 is the GCS port, not the Ray Client port.
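To illustrate the port distinction, here is a minimal sketch of building a Ray Client address. It assumes the head node runs the Ray Client server on its default port, 10001; the helper names `ray_client_address` and `connect_with_ray_client` are hypothetical and are not part of Ray.

```python
def ray_client_address(head_host: str, client_port: int = 10001) -> str:
    """Build a Ray Client address.

    Assumption: 10001 is the default Ray Client server port. 6379 is the
    GCS port and is not meant to be used with the ray:// scheme.
    """
    return f"ray://{head_host}:{client_port}"


def connect_with_ray_client(head_host: str):
    """Connect to a remote head node through Ray Client (hypothetical helper)."""
    import ray  # imported lazily so the address helper works without Ray installed

    return ray.init(address=ray_client_address(head_host))
```

Calling `connect_with_ray_client("<head-node-ip>")` from the client machine would then target port 10001 rather than 6379. If the head node was started with a non-default client server port, pass that port instead.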
What happened + What you expected to happen
I'm having significant difficulty connecting to any Ray head server. I launched a head node installed from pip at version 2.37.0 and, when that didn't work, launched another using the AWS "ray up" approach from the documentation, which installed version 2.30.0. In both cases, I get the following error when trying to connect with a client:
I found that I can navigate to the path specified and create node_ip_address.json and put the IP of the head server in there to get past this error:
However, once that is resolved, I get stuck on the following error and can't get past it:
This message is printed repeatedly for 30 seconds with no way to cancel or abort unless you kill the process.
I've tried connecting with cleanly installed clients on multiple machines at both version 2.37.0 and 2.30.0 from Ubuntu boxes with Python 3.10.12.
I've confirmed that the dashboard on both head instances I've launched works fine. Port 6379 is open. The firewall does not restrict any traffic between the local node and head machine.
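Since the report only mentions port 6379, a quick reachability check of the other ports may help narrow things down. The sketch below uses only the standard library; the port numbers are the defaults for an unmodified head node (an assumption, adjust them if the head was started with custom ports), and `port_is_open` and `check_head` are hypothetical helper names.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Assumed defaults for a head node started without custom port flags.
DEFAULT_RAY_PORTS = {
    "GCS": 6379,
    "Ray Client server": 10001,
    "dashboard / job submission": 8265,
}


def check_head(host: str) -> dict:
    """Map each assumed default port name to whether it is reachable."""
    return {name: port_is_open(host, port) for name, port in DEFAULT_RAY_PORTS.items()}
```

Running `check_head("<head-node-ip>")` from the client machine shows whether the Ray Client server port is reachable in addition to the GCS port. A successful TCP connection only confirms that something is listening; it does not confirm that the listener is Ray.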
Versions / Dependencies
Ubuntu 20.22 and 20.24
Python 3.10.12
Ray head 2.30.0 and 2.37.0
Ray client node 2.30.0 and 2.37.0
Reproduction script
This is the test script I'm using:
Issue Severity
High: It blocks me from completing my task.