ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Core] Dashboard fails to load with immutable Python install #42387

Open pveierland opened 9 months ago

pveierland commented 9 months ago

What happened + What you expected to happen

When the Ray Python package is installed via Nix (using dream2nix), the package lives on an immutable file system (/nix/store). In that setup the dashboard crashes during initialization when it tries to create directories inside a non-writable directory.

This appears to happen because shutil.copytree is used to copy GRAFANA_CONFIG_INPUT_PATH, a non-writable directory, to a new location under the session directory. The copy keeps the read-only permissions of the source, so the later os.makedirs call fails when it tries to create directories inside the copy.
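The suspected cause can be checked outside of Ray. The sketch below is only an illustration: it uses throwaway temporary directories instead of Ray's real paths, and the helper name `copied_dir_is_writable` is made up for this example. It relies on the documented behavior that shutil.copytree copies directory permissions with copystat, and it assumes a POSIX system.

```python
import os
import shutil
import stat
import tempfile


def copied_dir_is_writable(src_mode: int) -> bool:
    """Copy a directory created with `src_mode` and report whether the
    copy has the owner-write bit set. Hypothetical helper for illustration."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        os.mkdir(src)
        os.chmod(src, src_mode)
        try:
            shutil.copytree(src, dst)
            mode = stat.S_IMODE(os.stat(dst).st_mode)
            return bool(mode & stat.S_IWUSR)
        finally:
            # Restore write access so the temporary tree can be removed.
            os.chmod(src, 0o755)
            if os.path.isdir(dst):
                os.chmod(dst, 0o755)


if __name__ == "__main__":
    print("read-only source ->", copied_dir_is_writable(0o555))
    print("writable source  ->", copied_dir_is_writable(0o755))
```

If the copy of the read-only source comes back without the write bit, any later os.makedirs inside it by a non-root user would be expected to fail with a PermissionError like the one in the log below.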

Output from session/logs/dashboard.err:

/nix/store/4v1swvkw0kashhc35iqffiqrfzw4s36d-python3.11-ray-2.9.0/lib/python3.11/site-packages/ray/dashboard/modules/reporter/reporter_agent.py:56: UserWarning: `gpustat` package is not installed. GPU monitoring is not available. To have full functionality of the dashboard please install `pip install ray[default]`.)
  warnings.warn(
Traceback (most recent call last):
  File "/nix/store/4v1swvkw0kashhc35iqffiqrfzw4s36d-python3.11-ray-2.9.0/lib/python3.11/site-packages/ray/dashboard/dashboard.py", line 260, in <module>
    raise e
  File "/nix/store/4v1swvkw0kashhc35iqffiqrfzw4s36d-python3.11-ray-2.9.0/lib/python3.11/site-packages/ray/dashboard/dashboard.py", line 248, in <module>
    loop.run_until_complete(dashboard.run())
  File "/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/nix/store/4v1swvkw0kashhc35iqffiqrfzw4s36d-python3.11-ray-2.9.0/lib/python3.11/site-packages/ray/dashboard/dashboard.py", line 75, in run
    await self.dashboard_head.run()
  File "/nix/store/4v1swvkw0kashhc35iqffiqrfzw4s36d-python3.11-ray-2.9.0/lib/python3.11/site-packages/ray/dashboard/head.py", line 370, in run
    await asyncio.gather(*concurrent_tasks, *(m.run(self.server) for m in modules))
  File "/nix/store/4v1swvkw0kashhc35iqffiqrfzw4s36d-python3.11-ray-2.9.0/lib/python3.11/site-packages/ray/dashboard/modules/metrics/metrics_head.py", line 331, in run
    self._create_default_grafana_configs()
  File "/nix/store/4v1swvkw0kashhc35iqffiqrfzw4s36d-python3.11-ray-2.9.0/lib/python3.11/site-packages/ray/dashboard/modules/metrics/metrics_head.py", line 209, in _create_default_grafana_configs
    os.makedirs(
  File "<frozen os>", line 215, in makedirs
  File "<frozen os>", line 225, in makedirs
PermissionError: [Errno 13] Permission denied: '/tmp/ray/session_2024-01-13_23-28-57_095715_642924/metrics/grafana/provisioning'

The expected behavior would be that permissions for copies made from the package source are explicitly set such that Ray is compatible with immutable package management systems.
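One way this could look is sketched below. It is not Ray's code: the helper name `copytree_writable` and the choice to add owner permissions after the copy are assumptions made for this example, and it assumes a POSIX system where the process owns the copied files.

```python
import os
import shutil
import stat


def _add_owner_bits(path: str, bits: int) -> None:
    """Add the given owner permission bits to `path`, keeping the others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | bits)


def copytree_writable(src: str, dst: str) -> str:
    """Copy `src` to `dst`, then make every directory and file in the copy
    writable by the owner, even if the source was read-only."""
    shutil.copytree(src, dst)
    _add_owner_bits(dst, stat.S_IRWXU)
    for root, dirs, files in os.walk(dst):
        for name in dirs:
            # Directories need read, write and execute to be usable.
            _add_owner_bits(os.path.join(root, name), stat.S_IRWXU)
        for name in files:
            _add_owner_bits(os.path.join(root, name), stat.S_IWUSR)
    return dst
```

With a helper like this, creating a subdirectory such as provisioning inside the copied tree should succeed regardless of the permissions on the package source.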

Versions / Dependencies

Python 3.11.6 (main, Oct  2 2023, 13:45:54) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ray
>>> ray.__version__
'2.9.0'

Reproduction script

ray start --head --include-dashboard=true

Issue Severity

Medium: It is a significant difficulty but I can work around it.

rkooo567 commented 9 months ago

We have one theory. Would you be able to try a fix in your environment if I suggest one?

pveierland commented 9 months ago

Sure! Depends a bit on the fix given the constraints of the setup, but I'll try.

rkooo567 commented 8 months ago

In metrics_head.py, there is this call:

shutil.copytree(GRAFANA_CONFIG_INPUT_PATH, grafana_config_output_path)

Can you change it to

shutil.copytree(GRAFANA_CONFIG_INPUT_PATH, grafana_config_output_path, copy_function=shutil.copy)

? (The default is copy2, which I believe requires more extensive permissions.)
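Before patching an installed package, it may be worth checking in isolation what the copy_function argument changes. The sketch below is an assumption-laden experiment, not Ray code: the helper name `copied_modes` and the placeholder file are made up, and it assumes a POSIX system. According to the Python documentation, shutil.copy copies file data and the permission mode, shutil.copy2 additionally tries to preserve other metadata, and copytree copies directory permissions with copystat in either case.

```python
import os
import shutil
import stat
import tempfile


def copied_modes(copy_function):
    """Copy a read-only tree with `copy_function` and return the
    (directory_mode, file_mode) of the copy. Hypothetical helper."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        os.mkdir(src)
        src_file = os.path.join(src, "grafana.ini")
        with open(src_file, "w") as f:
            f.write("[placeholder]\n")
        # Mimic an immutable package directory: nothing is writable.
        os.chmod(src_file, 0o444)
        os.chmod(src, 0o555)
        try:
            shutil.copytree(src, dst, copy_function=copy_function)
            dir_mode = stat.S_IMODE(os.stat(dst).st_mode)
            file_mode = stat.S_IMODE(
                os.stat(os.path.join(dst, "grafana.ini")).st_mode
            )
            return dir_mode, file_mode
        finally:
            # Restore write access so the temporary tree can be removed.
            os.chmod(src, 0o755)
            if os.path.isdir(dst):
                os.chmod(dst, 0o755)


if __name__ == "__main__":
    for fn in (shutil.copy, shutil.copy2):
        dir_mode, file_mode = copied_modes(fn)
        print(fn.__name__, oct(dir_mode), oct(file_mode))
```

If both functions report the same read-only directory mode, the switch to shutil.copy alone would probably not resolve the PermissionError, and the permissions of the copied directories would still need to be set explicitly.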