Closed NavinKumarMNK closed 5 months ago
In my case, everything works fine when the container uses the bridge network,
but the problem described here appears when the container is connected
through the host network.
The reason this occurs only on nodes started by ray-cluster is that those containers use the host network by default:
ray cluster passes --net=host when creating a container.
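To confirm which network mode a given container actually got, one option is to ask Docker directly. The sketch below is only an illustration: the container name `ray_container` is a placeholder, and it assumes the `docker` CLI is available on the machine that runs the container. It reads the `HostConfig.NetworkMode` field from `docker inspect`, which reports `host` for containers started with `--net=host` and typically `bridge` or `default` otherwise.

```python
import subprocess


def build_inspect_cmd(container_name):
    """Build the `docker inspect` command that prints a container's network mode."""
    return [
        "docker", "inspect",
        "--format", "{{.HostConfig.NetworkMode}}",
        container_name,
    ]


def network_mode(container_name):
    """Return the network mode string Docker reports for the container."""
    result = subprocess.run(
        build_inspect_cmd(container_name),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if docker fails or the name is unknown
    )
    return result.stdout.strip()


# Usage (placeholder name, replace with the container name from your setup):
#   print(network_mode("ray_container"))
```

Running this once against the manually created container and once against the one created by ray-cluster should show whether the two differ in network mode.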
Hello, I have a question. When I create the container using the bridge network, the IP inside the container is wrong, so the Ray head can't connect to the Ray worker.
Could you share the command you use to start the container? Thank you!
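For the wrong-IP symptom, it may help to print the address that a process inside the container sees and compare it with the address the other node is trying to reach. This is a rough sketch using only the standard library; the target `8.8.8.8` is just an arbitrary routable address picked for illustration, and this is not necessarily how Ray itself chooses its node IP.

```python
import socket


def outbound_ip(target_host="8.8.8.8", target_port=80):
    """Return the local IP the OS would use as source address to reach the target.

    Connecting a UDP socket sends no packets; it only makes the kernel
    pick a route, so the socket's local address can be read back.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((target_host, target_port))
        return sock.getsockname()[0]


def describe_addresses(target_host="8.8.8.8"):
    """Collect the hostname and outbound IP as seen from this process."""
    return {
        "hostname": socket.gethostname(),
        "outbound_ip": outbound_ip(target_host),
    }


# Usage inside the container:
#   print(describe_addresses())
```

With bridge networking this typically prints a container-private address that other machines cannot reach directly, while with host networking it prints the host's own address.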
What happened + What you expected to happen
example.py
When I run the script, I get an error (the same error occurs without the first two lines of the terminal commands below).
System Env:
This script runs smoothly when I manually create a single container. But when the container is created as part of a ray-cluster launch, it does not work (I logged in to the head node and ran it there; there was only a head node in my config).
ray-cluster.yaml
Versions / Dependencies
ray: 2.6.3
python: 3.10.13
torch: 2.1.2
vllm: 0.3.3
os: ubuntu-22.04::ppc64le
Reproduction script
example.py
Run this script inside the container started by ray-cluster. (It works fine, without any error, when I run it manually in a container created from the same image.)
Issue Severity
High - Blocking My Project