Open mforhad opened 8 months ago
I came across the same error on Ray 2.10.0. The head and the worker both run Python 3.10, but their patch versions differ (3.10.13 vs. 3.10.14). I run the Ray cluster on Kubernetes.
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/scripts/scripts.py", line 954, in start
node.check_version_info()
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/node.py", line 396, in check_version_info
ray._private.utils.check_version_info(
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/utils.py", line 1595, in check_version_info
raise RuntimeError(error_message)
RuntimeError: Version mismatch: The cluster was started with:
Ray: 2.10.0
Python: 3.10.13
This process on node 172.16.0.31 was started with:
Ray: 2.10.0
Python: 3.10.14
@architkulkarni Can you take a look and provide guidance?
@rynewang I remember you had some context about this error. Were we going to make 3 digit (micro) version mismatches only print a warning instead of raising an error?
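To make the proposal concrete, here is a minimal sketch of a check that raises on a major/minor mismatch and only warns on a micro mismatch. It is not Ray's actual `check_version_info` implementation; the function names `parse_version` and `check_python_versions` are hypothetical and exist only for illustration.

```python
import warnings
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.micro' string into a tuple of ints."""
    major, minor, micro = (int(part) for part in version.split("."))
    return major, minor, micro


def check_python_versions(cluster: str, node: str) -> None:
    """Hypothetical policy: error on major/minor mismatch, warn on micro mismatch."""
    cluster_v = parse_version(cluster)
    node_v = parse_version(node)
    if cluster_v[:2] != node_v[:2]:
        # Different major or minor versions: treat as incompatible.
        raise RuntimeError(
            f"Version mismatch: cluster Python {cluster}, node Python {node}"
        )
    if cluster_v[2] != node_v[2]:
        # Same major.minor, different micro: warn but continue.
        warnings.warn(
            f"Python micro version differs: cluster {cluster}, node {node}"
        )


if __name__ == "__main__":
    # The versions from the traceback above: emits a warning, does not raise.
    check_python_versions("3.10.13", "3.10.14")
```

With this policy, the 3.10.13 vs. 3.10.14 case from the traceback would produce a warning, while something like 3.10 vs. 3.11 would still fail.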
As a workaround, can you specify the Ray Docker image in the cluster YAML spec? For example: https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/gcp/example-gpu-docker.yaml I think that should make it impossible to have different Python versions on different nodes.
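As a rough illustration of that workaround, the sketch below writes the cluster config as the Python dict the YAML would parse to, and checks that a Docker image with an explicit tag is set. The `docker.image` and `docker.container_name` keys follow the cluster YAML format; the image tag `rayproject/ray:2.9.3-py310`, the container name, and the helper `pinned_image` are assumptions for illustration, so pick the tag that matches your own Ray and Python versions.

```python
from typing import Any, Dict, Optional

# Hypothetical cluster config, shown as the dict the YAML file would parse to.
CLUSTER_CONFIG: Dict[str, Any] = {
    "cluster_name": "minimal",
    "provider": {
        "type": "gcp",
        "region": "europe-west4",
        "availability_zone": "europe-west4-a",
        "project_id": "project_id_xxx",
    },
    "docker": {
        # Assumed image tag; every node would run this same image.
        "image": "rayproject/ray:2.9.3-py310",
        "container_name": "ray_container",
    },
}


def pinned_image(config: Dict[str, Any]) -> Optional[str]:
    """Return the Docker image if it carries an explicit, non-'latest' tag."""
    image = config.get("docker", {}).get("image")
    if not image:
        return None
    # Look only at the last path component so a registry port is not
    # mistaken for a tag (e.g. "host:5000/ray").
    name = image.rsplit("/", 1)[-1]
    if ":" not in name:
        return None
    tag = name.split(":", 1)[1]
    if tag == "latest":
        return None
    return image


if __name__ == "__main__":
    print(pinned_image(CLUSTER_CONFIG))
```

If the head and workers all start from the same tagged image, they should share one Python interpreter, which is the point of the suggestion.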
As for the root cause, I'm not sure why the Python version isn't constant when a GCP instance is launched with default settings.
What happened + What you expected to happen
I am getting the following version mismatch error:
RuntimeError: Version mismatch: The cluster was started with:
Ray: 2.9.3
Python: 3.10.13
This process on node xx.xxx.x.x was started with:
Ray: 2.9.3
Python: 3.10.12
I have the following YAML file:
`cluster_name: minimal
provider:
  type: gcp
  region: europe-west4
  availability_zone: europe-west4-a
  project_id: project_name_xxx
min_workers: 1
max_workers: 4
min_workers_python: "3.10.12"
max_workers_python: "3.10.12"`
I am using Google Cloud to create this cluster. Despite this error, the node was added to the cluster, but I cannot submit a job to it: submitting the job fails with the same error.
I really appreciate your help.
Versions / Dependencies
RuntimeError: Version mismatch: The cluster was started with:
Ray: 2.9.3
Python: 3.10.13
This process on node xx.xxx.x.x was started with:
Ray: 2.9.3
Python: 3.10.12
Reproduction script
config.yaml
cluster_name: minimal
provider:
  type: gcp
  region: europe-west4
  availability_zone: europe-west4-a
  project_id: project_id_xxx  # Add your GCP project ID here
min_workers: 1
max_workers: 4
min_workers_python: "3.10.12"  # Specify the Python version for minimum workers
max_workers_python: "3.10.12"  # Specify the Python version for maximum workers
script.py
import ray

@ray.remote
def hello_world():
    return "hello world"

ray.init()
print(ray.get(hello_world.remote()))
Issue Severity
High: It blocks me from completing my task.