ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Core]: `ray start` throws `ValueError: acceleratorType should match v(generation)-(cores/chips). Got .` #40001

Closed jmakov closed 9 months ago

jmakov commented 9 months ago

What happened + What you expected to happen

After a fresh venv install, Ray fails to start. Once the cluster launcher stopped working, starting Ray manually on a local cluster was the only remaining option, and now even that fails. The same exception is thrown when running ray.init().

$ RAY_memory_monitor_refresh_ms=0 ray start --address='192.168.0.101:6379'
Local node IP: 192.168.0.108
Traceback (most recent call last):
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/ray/scripts/scripts.py", line 2490, in main
    return cli()
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/ray/autoscaler/_private/cli_logger.py", line 856, in wrapper
    return f(*args, **kwargs)
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/ray/scripts/scripts.py", line 920, in start
    node = ray._private.node.Node(
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/ray/_private/node.py", line 310, in __init__
    self.start_ray_processes()
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/ray/_private/node.py", line 1452, in start_ray_processes
    resource_spec = self.get_resource_spec()
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/ray/_private/node.py", line 540, in get_resource_spec
    self._resource_spec = ResourceSpec(
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/ray/_private/resource_spec.py", line 204, in resolve
    accelerator.update_resources_with_accelerator_type(resources)
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/ray/_private/accelerator.py", line 39, in update_resources_with_accelerator_type
    accelerator_type=_autodetect_tpu_version(),
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/ray/_private/accelerator.py", line 214, in _autodetect_tpu_version
    return accelerator_type_to_version(accelerator_type_request.text)
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/ray/_private/accelerator.py", line 197, in accelerator_type_to_version
    assert_tpu_accelerator_type(accelerator_type)
  File "/home/jernej_m/mambaforge-pypy3/envs/test_ray/lib/python3.9/site-packages/ray/_private/accelerator.py", line 239, in assert_tpu_accelerator_type
    raise ValueError(
ValueError: `acceleratorType` should match v(generation)-(cores/chips). Got .
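The trailing "Got ." in the message suggests that the string passed to the check was empty. The traceback shows it came from `accelerator_type_request.text`. The following is a minimal sketch of that failure mode, not Ray's actual implementation: the regular expression and both helper names are hypothetical, chosen only to mirror the `v(generation)-(cores/chips)` shape named in the error. It shows how a tolerant parser could return `None` for an empty or malformed value instead of raising.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not taken from Ray's source):
# "v", a generation number with an optional letter suffix, "-", then a
# core/chip count, e.g. "v4-8".
_TPU_TYPE_PATTERN = re.compile(r"^v\d+[a-zA-Z]*-\d+$")


def is_valid_tpu_accelerator_type(accelerator_type: str) -> bool:
    """Return True if the string has the v(generation)-(cores/chips) shape."""
    return bool(_TPU_TYPE_PATTERN.match(accelerator_type))


def parse_tpu_version(accelerator_type: str) -> Optional[str]:
    """Return the generation part (e.g. "v4"), or None for empty/malformed input.

    Returning None instead of raising lets a caller treat "no TPU detected"
    as a normal outcome on machines without that hardware.
    """
    if not is_valid_tpu_accelerator_type(accelerator_type):
        return None
    return accelerator_type.split("-")[0]


if __name__ == "__main__":
    for candidate in ["v4-8", "", "not-a-tpu"]:
        print(repr(candidate), "->", parse_tpu_version(candidate))
```

With this shape, an empty response is reported as "no version found" rather than aborting node startup. Whether the real fix takes this approach is not stated in this thread.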

Versions / Dependencies

Python 3.9.18
ray 2.7.0
OS: Manjaro 23.0.2

env.yaml:

name: test
channels:
  - numba
  - conda-forge
  - defaults
dependencies:
  - bokeh
  - cudatoolkit
  - datashader
  - holoviews
  - hvplot
  - ipywidgets
  - jupyter-resource-usage
  - jupyterlab
  - jupyterlab_execute_time
  - jupyterlab_widgets
  - jupyterlab-variableinspector
  - nodejs
  - numba
  - pip
  - pyarrow
  - pyyaml
  - python
  - python-dateutil
  - ruptures
  - setuptools
  - tqdm
  - pip:
    - detecta
    - numpy
    - polars[numpy,pandas,pyarrow,connectorx]
    - pycaret[full]
    - pandas[performance,parquet,feather]
    - ray[default]

Reproduction script

RAY_memory_monitor_refresh_ms=0 ray start --address='192.168.0.101:6379'

Issue Severity

High: It blocks me from completing my task.

rkooo567 commented 9 months ago

I believe it is the same issue as https://github.com/ray-project/ray/issues/39913. Can you try Ray master (nightly) and verify whether that is the case? (https://docs.ray.io/en/master/ray-overview/installation.html#daily-releases-nightlies). We are planning to include the fix in the 2.7.1 release, which is planned for 10/9.

jmakov commented 9 months ago

> I believe it is the same issue as #39913. Can you try the master Ray and verify if it is the case? (https://docs.ray.io/en/master/ray-overview/installation.html#daily-releases-nightlies). We are planning to include the fix to 2.7.1 release which is planned on 10/9

This works for manually starting ray: `pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp39-cp39-manylinux2014_x86_64.whl"`. Cluster launcher still doesn't work (workers uninitialized).

rkooo567 commented 9 months ago

> Cluster launcher still doesn't work (workers uninitialized).

Does it raise the same error?

jmakov commented 9 months ago

> > Cluster launcher still doesn't work (workers uninitialized).
>
> Does it raise the same error?

No, it starts the head node, but other nodes are left uninitialized.

architkulkarni commented 9 months ago

Duplicate of https://github.com/ray-project/ray/issues/39913