ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[core] Exception: Failed to read dashboard.err file #34504

Open shreyanssethi opened 1 year ago

shreyanssethi commented 1 year ago

What happened + What you expected to happen

Trying to run the following code for ray start:

ray start --head \
    --port $RAY_PORT \
    --dashboard-port $((RAY_PORT + 1)) \
    --include-dashboard True \
    --object-store-memory 10000000000 \
    --num-cpus 0 --num-gpus 0 \
    --temp-dir ./temp_link

And I keep getting the error:

Couldn't read dashboard.log file. Error: [Errno 2] No such file or directory: './temp_link/session_2023-04-17_18-41-09_060425_2337608/logs/dashboard.log'. It means the dashboard is broken even before it initializes the logger (mostly dependency issues). Reading the dashboard.err file which contains stdout/stderr.

and then:

Exception: Failed to read dashboard.err file: cannot mmap an empty file

I have checked that the ./temp_link/session_2023-04-17_18-41-09_060425_2337608/logs/ directory exists, but it contains no dashboard.log file. Launching the Ray cluster also fails when I set --include-dashboard to False.
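One way to confirm which of the two files is missing or empty is a small shell check like the sketch below. `check_dashboard_logs` is a hypothetical helper written for illustration, not part of Ray; the argument is assumed to be the session's logs directory from the error message. An empty dashboard.err would be consistent with the "cannot mmap an empty file" message, since memory-mapping a zero-length file fails.

```shell
# Report the state of the dashboard log files in one Ray session's logs directory.
# Usage: check_dashboard_logs <logs_dir>
# (hypothetical helper, not part of Ray)
check_dashboard_logs() {
    logs_dir="$1"

    # dashboard.log is written once the dashboard has initialized its logger.
    if [ -f "$logs_dir/dashboard.log" ]; then
        echo "dashboard.log: present"
    else
        echo "dashboard.log: missing"
    fi

    # dashboard.err captures stdout/stderr; -s is true only for non-empty files.
    if [ ! -e "$logs_dir/dashboard.err" ]; then
        echo "dashboard.err: missing"
    elif [ -s "$logs_dir/dashboard.err" ]; then
        echo "dashboard.err: non-empty"
    else
        echo "dashboard.err: empty"
    fi
}
```

For the session above this could be called as `check_dashboard_logs ./temp_link/session_2023-04-17_18-41-09_060425_2337608/logs`.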

I know that others experienced similar issues here (https://github.com/ray-project/ray/issues/26320) and I tried using the following fix:

pip install grpcio==1.49.1
pip uninstall -y ray
pip install -U "ray[default]"

However, the issue persists.

Versions / Dependencies

Linux, Python 3.10.6, ray 2.3.1, grpcio 1.49.1

Reproduction script

ray start --head \
    --port $RAY_PORT \
    --dashboard-port $((RAY_PORT + 1)) \
    --include-dashboard True \
    --object-store-memory 10000000000 \
    --num-cpus 0 --num-gpus 0 \
    --temp-dir ./temp

Issue Severity

High: It blocks me from completing my task.

rickyyx commented 1 year ago

I think it might be a path issue. I ran into problems starting Ray with your repro. It seems some parts of Ray (the Plasma store) weren't handling relative paths well.

Could you try using an absolute path for the temp dir and see if that works for you while I work on a fix for this?
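A minimal sketch of that workaround, assuming the same ./temp_link directory as in the original command: resolve the relative path to an absolute one first, then pass the result to --temp-dir. `abs_dir` is a hypothetical helper name, and whether this resolves the dashboard error is not confirmed in this thread.

```shell
# Resolve a possibly relative directory to an absolute path,
# creating the directory first if it does not exist yet.
# (hypothetical helper, not part of Ray)
abs_dir() {
    mkdir -p "$1" && (cd "$1" && pwd)
}

TEMP_DIR="$(abs_dir ./temp_link)"
echo "Resolved temp dir: $TEMP_DIR"

# The absolute path can then replace the relative one in the original command:
#   ray start --head \
#       --port $RAY_PORT \
#       --dashboard-port $((RAY_PORT + 1)) \
#       --include-dashboard True \
#       --object-store-memory 10000000000 \
#       --num-cpus 0 --num-gpus 0 \
#       --temp-dir "$TEMP_DIR"
```

The `ray start` call is left as a comment so the snippet only resolves and prints the path until the rest of the environment (such as $RAY_PORT) is in place.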

hgl2017 commented 1 year ago

I also have the same issue.

2023-06-20 18:24:41,889 ERROR services.py:1207 -- Failed to start the dashboard
2023-06-20 18:24:41,896 ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-06-20 18:24:41,899 ERROR services.py:1242 -- Couldn't read dashboard.log file. Error: [Errno 2] No such file or directory: './scratch/leuven/330/vsc33053/ray_spill/session_2023-06-20_18-24-13_924769_17774/logs/dashboard.log'. It means the dashboard is broken even before it initializes the logger (mostly dependency issues). Reading the dashboard.err file which contains stdout/stderr.
2023-06-20 18:24:41,901 ERROR services.py:1276 -- Failed to read dashboard.err file: cannot mmap an empty file. It is unexpected. Please report an issue to Ray github. https://github.com/ray-project/ray/issues

arshiya031196 commented 6 months ago

Hi, I'm also facing the same issue. I'm using only one node and don't even need Ray, only vLLM, but internally it initializes a Ray session and gets stuck indefinitely here:

2024-04-16 19:43:51,045 ERROR services.py:1330 -- Failed to start the dashboard
2024-04-16 19:43:51,045 ERROR services.py:1355 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2024-04-16 19:43:51,045 ERROR services.py:1365 -- Couldn't read dashboard.log file. Error: [Errno 2] No such file or directory: '/tmp/ray/session_2024-04-16_19-43-09_468986_3298/logs/dashboard.log'. It means the dashboard is broken even before it initializes the logger (mostly dependency issues). Reading the dashboard.err file which contains stdout/stderr.
2024-04-16 19:43:51,045 ERROR services.py:1399 -- Failed to read dashboard.err file: cannot mmap an empty file. It is unexpected. Please report an issue to Ray github. https://github.com/ray-project/ray/issues
2024-04-16 19:43:53,550 INFO worker.py:1752 -- Started a local Ray instance.

Is there a way to disable Ray in vLLM-only scripts, or otherwise mitigate this issue?

rickyyx commented 6 months ago

cc @anyscalesam

yangalan123 commented 4 months ago

What worked for me was uninstalling grpcio, ray, and vllm, then reinstalling the latest version of vllm (==0.4.3, which automatically installs ray==2.24.0). Hope that helps!
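The steps described above could be sketched as the following commands. The `run` and `reinstall_vllm` helpers and the `DRY_RUN` variable are hypothetical names added for illustration; by default the sketch only prints the pip commands, so they can be reviewed before anything is uninstalled.

```shell
# Print each command; execute it only when DRY_RUN=0.
# (hypothetical helper; the default is a dry run)
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Remove the three packages, then reinstall vllm at the version from the comment;
# per that comment, this pulls in ray==2.24.0 automatically.
reinstall_vllm() {
    run pip uninstall -y grpcio ray vllm
    run pip install "vllm==0.4.3"
}

reinstall_vllm
```

Running it with `DRY_RUN=0` set in the environment would execute the commands instead of only printing them.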