microsoft / ray-on-aml

Turning AML compute into Ray cluster

Ray and python version mismatch #32

Open ciroaceto opened 1 year ago

ciroaceto commented 1 year ago

Using the following code,

from ray_on_aml.core import Ray_On_AML

ray_on_aml = Ray_On_AML(ml_client=ml_client, compute_cluster='rl-agents-cluster')

ray = ray_on_aml.getRay(ci_is_head=True,
    num_node=2,
    pip_packages=[
        "tensorflow==2.11"
    ]
)

I am getting this error:

RuntimeError: Version mismatch: The cluster was started with:
    Ray: 2.2.0
    Python: 3.8.10
This process on node 10.2.1.4 was started with:
    Ray: 2.3.0
    Python: 3.8.5

I checked the source code, and the Python version is read from platform.python_version(). How can the Python versions fail to match?

As for Ray: if no version is specified, shouldn't the cluster use the same Ray version as the head node?

Any thoughts on why this is happening?

Thanks in advance.

james-tn commented 1 year ago

When you specify your own pip_packages, you also need to include the Ray dependencies. Please add them to pip_packages.

ciroaceto commented 1 year ago

Even after adding "ray[default]==2.3" and "python==3.8.5" to pip_packages, the same error comes up.

james-tn commented 1 year ago

@hyssh @chnldw to check

james-tn commented 1 year ago

There is no need to add python; Python is not a pip package. Are you sure you did something like this?

pip_packages=["ray[air]==2.3.0", "ray[data]==2.3.0"]
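One way to keep every Ray extra on the same version is to build the list from a single version string. This is only a sketch: `pinned_ray_packages` is a hypothetical helper, not part of ray-on-aml, and the extras and version are taken from the examples in this thread.

```python
def pinned_ray_packages(extras, ray_version):
    """Build pip requirement strings that pin every Ray extra to one version.

    Hypothetical helper for illustration; not part of ray-on-aml.
    """
    return [f"ray[{extra}]=={ray_version}" for extra in extras]


# Extras and versions taken from the examples in this thread.
packages = pinned_ray_packages(["air", "rllib"], "2.3.0") + ["tensorflow==2.11"]
print(packages)
```

The resulting list could then be passed as pip_packages to getRay. Which Ray version to pin is an assumption left to the caller; it should match whatever is installed on the head node.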

ciroaceto commented 1 year ago

Completely sure. This is the code I'm using:

ray = ray_on_aml.getRay(ci_is_head=True,
    num_node=2,
    pip_packages=[
        "ray[rllib]==2.3",
        "ray[air]==2.3",
        "tensorflow==2.11"
    ]
)

I get the same version mismatch error, for both Ray and Python. The kernel is Python 3.8 - Jupyter. That is probably not important, but I mention it for completeness.

ciroaceto commented 1 year ago

Now the Ray version matches, but the Python versions still differ: 3.8.5 and 3.8.10.

jm-nab commented 3 months ago

Not exactly related, but this was the top search result for this error.

I ran into this error when the ray-head and ray-worker versions were mismatched. I shelled into the ray head and ran python --version, and sure enough, the version on the ray head was different from the one on the workers.

I fixed it by providing the image and repository information to the spec.
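The manual check described above can be sketched in Python. This is an illustration under stated assumptions, not Ray's own version check: `local_versions` and `find_mismatches` are hypothetical helpers, and the sample values are copied from the error message at the top of this issue. Running `local_versions()` on the head and on a worker gives the two dictionaries to compare.

```python
import platform
from importlib import metadata


def local_versions():
    """Return the Python and Ray versions visible to this interpreter."""
    try:
        ray_version = metadata.version("ray")
    except metadata.PackageNotFoundError:
        ray_version = None  # Ray is not installed in this environment
    return {"python": platform.python_version(), "ray": ray_version}


def find_mismatches(head, worker):
    """Return the sorted keys whose values differ between two version dicts."""
    return sorted(key for key in head if head[key] != worker.get(key))


# Values copied from the error message in this issue.
cluster = {"ray": "2.2.0", "python": "3.8.10"}
node = {"ray": "2.3.0", "python": "3.8.5"}
print(find_mismatches(cluster, node))  # prints ['python', 'ray']
```

The comparison is an exact string match, so a patch-level difference such as 3.8.5 versus 3.8.10 is reported as a mismatch, which is consistent with the error reported in this issue.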