ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Failed to start the dashboard, return code 1 #32312

Open zyh3826 opened 1 year ago

zyh3826 commented 1 year ago

What happened + What you expected to happen

I used this command to start a head node, but it produced errors:

ray start --head --port=6379 --include-dashboard=true --dashboard-host=0.0.0.0 --dashboard-port=8265

The dashboard log in /tmp/ray/session_latest/logs shows:

2023-02-08 17:18:11,333 INFO head.py:128 -- Dashboard head grpc address: 0.0.0.0:35598
2023-02-08 17:18:11,339 INFO head.py:232 -- Starting dashboard metrics server on port 44227
2023-02-08 17:18:11,342 INFO utils.py:112 -- Get all modules by type: DashboardHeadModule
2023-02-08 17:18:11,831 ERROR dashboard.py:230 -- The dashboard on node f58300b3b136 failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/dashboard.py", line 219, in <module>
    loop.run_until_complete(dashboard.run())
  File "/usr/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/dashboard.py", line 66, in run
    await self.dashboard_head.run()
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/head.py", line 290, in run
    modules = self._load_modules(self._modules_to_load)
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/head.py", line 197, in _load_modules
    head_cls_list = dashboard_utils.get_all_modules(DashboardHeadModule)
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/utils.py", line 121, in get_all_modules
    importlib.import_module(name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/snapshot/snapshot_head.py", line 40, in <module>
    class RayActivityResponse(BaseModel, extra=Extra.allow):
  File "pydantic/main.py", line 309, in pydantic.main.ModelMetaclass.__new__
  File "/usr/lib/python3.8/abc.py", line 85, in __new__
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
TypeError: __init_subclass__() takes no keyword arguments

Can anyone help me fix this error? Thank you very much.

Versions / Dependencies

ray version: 2.2.0 (installed using pip install -U "ray[default]")

system info:
OS: Ubuntu 18.04.5 LTS bionic x86_64
Host: NF5468M5 00001
Kernel: 3.10.0-1160.11.1.el7.x86_64
Uptime: 123 days, 8 hours, 16 mins
Packages: 377
Shell: bash 4.4.20
CPU: Intel Xeon Silver 4116 (48) @ 3.000GHz

Reproduction script

ray start --head --port=6379 --include-dashboard=true --dashboard-host=0.0.0.0 --dashboard-port=8265

Issue Severity

None

scottsun94 commented 1 year ago

@zyh3826 Have you run into this issue before? E.g., in old ray versions or on other machines?

Do you know which option/flag causes this issue? E.g., what happens if you only run ray start --head? (By default, the dashboard should be started.)

cc: @rkooo567

rkooo567 commented 1 year ago

What's your version of pydantic? pip freeze | grep pydantic

zyh3826 commented 1 year ago

@scottsun94 Thank you for your reply. I ran ray start --head and got the same error: the Ray head starts, but the dashboard does not.

root@f58300b3b136:/# ray start --head
Usage stats collection is enabled. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.

Local node IP: 172.17.0.4
2023-02-22 09:38:33,666 ERROR services.py:1195 -- Failed to start the dashboard: Failed to start the dashboard, return code 1
 The last 10 lines of /tmp/ray/session_2023-02-22_09-38-31_027883_30348/logs/dashboard.log:
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/snapshot/snapshot_head.py", line 40, in <module>
    class RayActivityResponse(BaseModel, extra=Extra.allow):
  File "pydantic/main.py", line 309, in pydantic.main.ModelMetaclass.__new__
  File "/usr/lib/python3.8/abc.py", line 85, in __new__
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
TypeError: __init_subclass__() takes no keyword arguments
2023-02-22 09:38:33,667 ERROR services.py:1196 -- Failed to start the dashboard, return code 1
 The last 10 lines of /tmp/ray/session_2023-02-22_09-38-31_027883_30348/logs/dashboard.log:
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/snapshot/snapshot_head.py", line 40, in <module>
    class RayActivityResponse(BaseModel, extra=Extra.allow):
  File "pydantic/main.py", line 309, in pydantic.main.ModelMetaclass.__new__
  File "/usr/lib/python3.8/abc.py", line 85, in __new__
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
TypeError: __init_subclass__() takes no keyword arguments
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/ray/_private/services.py", line 1181, in start_api_server
    raise Exception(err_msg + last_log_str)
Exception: Failed to start the dashboard, return code 1
 The last 10 lines of /tmp/ray/session_2023-02-22_09-38-31_027883_30348/logs/dashboard.log:
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/snapshot/snapshot_head.py", line 40, in <module>
    class RayActivityResponse(BaseModel, extra=Extra.allow):
  File "pydantic/main.py", line 309, in pydantic.main.ModelMetaclass.__new__
  File "/usr/lib/python3.8/abc.py", line 85, in __new__
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
TypeError: __init_subclass__() takes no keyword arguments
2023-02-22 09:38:33,678 WARNING services.py:1732 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67104768 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.

--------------------
Ray runtime started.
--------------------

Next steps
  To connect to this Ray runtime from another node, run
    ray start --address='172.17.0.4:6379'

  Alternatively, use the following Python code:
    import ray
    ray.init(address='auto')

  To connect to this Ray runtime from outside of the cluster, for example to
  connect to a remote cluster from your laptop directly, use the following
  Python code:
    import ray
    ray.init(address='ray://<head_node_ip_address>:10001')

  To see the status of the cluster, use
    ray status

  If connection fails, check your firewall settings and network configuration.

  To terminate the Ray runtime, run
    ray stop

zyh3826 commented 1 year ago

@rkooo567 Thank you for your reply. My pydantic version is 1.6.1:

root@f58300b3b136:/# pip freeze | grep pydantic
pydantic==1.6.1

rkooo567 commented 1 year ago

pydantic==1.9.1

Can you try this version and see whether it works?

rkooo567 commented 1 year ago

Maybe the older pydantic is not well supported by Ray.
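
For anyone hitting the same traceback, here is a rough, pydantic-free sketch of what the error seems to mean. The traceback shows the class keyword in `class RayActivityResponse(BaseModel, extra=Extra.allow)` being forwarded through `ModelMetaclass.__new__` and `abc.py` down to `__init_subclass__`, which accepts no keywords. `PassThroughMeta`, `Base`, `Child` and `define_with_kwarg` are hypothetical names for illustration only; this is not pydantic's actual code.

```python
from abc import ABCMeta


class PassThroughMeta(ABCMeta):
    """Hypothetical metaclass that forwards class keyword arguments unchanged."""

    def __new__(mcls, name, bases, namespace, **kwargs):
        # Nothing consumes the keywords here, so they travel on to
        # object.__init_subclass__, which takes no keyword arguments.
        return super().__new__(mcls, name, bases, namespace, **kwargs)


class Base(metaclass=PassThroughMeta):
    pass


def define_with_kwarg():
    """Define a subclass with a class keyword; return the error text, if any."""
    try:
        class Child(Base, extra="allow"):  # same shape as the failing class statement
            pass
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(define_with_kwarg())
```

If that reading is right, any pydantic release that does not consume `extra=...` as a class keyword would fail at import time the same way, which would explain why the dashboard module never loads with 1.6.1.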

zyh3826 commented 1 year ago

This solved the problem, thank you.

rkooo567 commented 1 year ago

cc @alanwguo can you add a lower bound on the pydantic version? I will assign it to you for now, but feel free to unassign (we can find the owner later).
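
Until such a lower bound exists, a quick local check along these lines might help confirm whether the installed pydantic is older than the version that worked in this thread. It is only a sketch: the helper names are made up, and the default minimum of 1.9.1 is an assumption taken from the comment above (1.6.1 failed, 1.9.1 worked), not an officially documented requirement.

```python
from importlib import metadata  # available in Python 3.8+


def version_tuple(version):
    """Turn '1.9.1' into (1, 9, 1); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_pydantic(minimum="1.9.1"):
    """Report on the installed pydantic; the default minimum is an assumption."""
    try:
        installed = metadata.version("pydantic")
    except metadata.PackageNotFoundError:
        return "pydantic is not installed"
    if meets_minimum(installed, minimum):
        return f"pydantic {installed} is at least {minimum}"
    return f"pydantic {installed} is older than {minimum}"


if __name__ == "__main__":
    print(check_pydantic())
```

The comparison is deliberately simple and ignores pre-release tags; a real dependency constraint would more likely live in the package's install requirements.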