ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[<Ray component: autoscaler>] _load_kubernetes_defaults_config function is not yet made #37033

Open · DrinkingMilktea opened this issue 1 year ago

DrinkingMilktea commented 1 year ago

What happened + What you expected to happen

When I use

from ray.autoscaler import sdk
sdk.run_on_cluster(config, cmd=exec_cmd, with_output=True)

(config here is my cluster config and exec_cmd the shell command to run), it gives me the following error.

Traceback (most recent call last):
  File "/Users/user/PycharmProjects/ray-fastapi/lib/ray_manager.py", line 232, in stop_ray_exporter
    sdk.run_on_cluster(config, cmd=exec_cmd, with_output=True)
  File "/Users/user/opt/anaconda3/envs/ray_fastapi/lib/python3.10/site-packages/ray/autoscaler/sdk/sdk.py", line 109, in run_on_cluster
    return commands.exec_cluster(
  File "/Users/user/opt/anaconda3/envs/ray_fastapi/lib/python3.10/site-packages/ray/autoscaler/_private/commands.py", line 1065, in exec_cluster
    config = _bootstrap_config(config, no_config_cache=no_config_cache)
  File "/Users/user/opt/anaconda3/envs/ray_fastapi/lib/python3.10/site-packages/ray/autoscaler/_private/commands.py", line 300, in _bootstrap_config
    config = prepare_config(config)
  File "/Users/user/opt/anaconda3/envs/ray_fastapi/lib/python3.10/site-packages/ray/autoscaler/_private/util.py", line 237, in prepare_config
    with_defaults = fillout_defaults(config)
  File "/Users/user/opt/anaconda3/envs/ray_fastapi/lib/python3.10/site-packages/ray/autoscaler/_private/util.py", line 262, in fillout_defaults
    defaults = _get_default_config(config["provider"])
  File "/Users/user/opt/anaconda3/envs/ray_fastapi/lib/python3.10/site-packages/ray/autoscaler/_private/providers.py", line 255, in _get_default_config
    path_to_default = load_config()
  File "/Users/user/opt/anaconda3/envs/ray_fastapi/lib/python3.10/site-packages/ray/autoscaler/_private/providers.py", line 114, in _load_kubernetes_defaults_config
    import ray.autoscaler.kubernetes as ray_kubernetes
ModuleNotFoundError: No module named 'ray.autoscaler.kubernetes'
[ERROR/MainProcess] No module named 'ray.autoscaler.kubernetes'
def _load_kubernetes_defaults_config():
    import ray.autoscaler.kubernetes as ray_kubernetes

    return os.path.join(os.path.dirname(ray_kubernetes.__file__), "defaults.yaml")

Above is a partial copy of ray/autoscaler/_private/providers.py. I can't find a ray.autoscaler.kubernetes folder anywhere in the installed ray package.
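
For comparison, the same lookup pattern still works for a provider whose package does ship a defaults.yaml. A quick check from my environment (Ray 2.5.1):

import importlib.util
import os

# The "local" provider package still ships its defaults.yaml, so the
# os.path.join(...) pattern used in providers.py resolves to a real file.
import ray.autoscaler.local as ray_local
print(os.path.join(os.path.dirname(ray_local.__file__), "defaults.yaml"))

# The "kubernetes" package can no longer even be located:
print(importlib.util.find_spec("ray.autoscaler.kubernetes"))  # prints None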

It seems that in Ray 1.9.x, ray/autoscaler/kubernetes/defaults.yaml existed, but Ray 2.5.1 ships no defaults.yaml for the kubernetes provider. Could you check and update it? Thanks for reading.
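
As a stopgap, one workaround I'm considering is recreating the missing package inside the installed wheel by hand. This is only a sketch: the source path below is hypothetical, and a defaults.yaml copied from a Ray 1.9.x tree may not pass 2.5.1's config validation.

import os
import shutil

import ray

# Recreate the ray.autoscaler.kubernetes package that
# _load_kubernetes_defaults_config tries to import.
pkg_dir = os.path.join(os.path.dirname(ray.__file__), "autoscaler", "kubernetes")
os.makedirs(pkg_dir, exist_ok=True)

# An empty __init__.py makes the directory importable as a package.
open(os.path.join(pkg_dir, "__init__.py"), "a").close()

# Hypothetical path: defaults.yaml taken from a Ray 1.9.x source checkout.
shutil.copy(
    "/path/to/ray-1.9.x/python/ray/autoscaler/kubernetes/defaults.yaml",
    os.path.join(pkg_dir, "defaults.yaml"),
)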

Versions / Dependencies

Ray: 2.5.1
Python: 3.10.11
OS: macOS Ventura 13.2.1

pip list

Package Version
absl-py 1.4.0
aiohttp 3.8.4
aiohttp-cors 0.7.0
aiorwlock 1.3.0
aiosignal 1.3.1
anyio 3.7.0
astunparse 1.6.3
async-timeout 4.0.2
attrs 23.1.0
backoff 2.2.1
blessed 1.20.0
cachetools 5.3.1
certifi 2023.5.7
charset-normalizer 3.1.0
click 8.1.3
cloudpickle 2.2.1
colorful 0.5.5
Deprecated 1.2.14
distlib 0.3.6
dm-tree 0.1.8
dnspython 2.3.0
exceptiongroup 1.1.1
fastapi 0.97.0
filelock 3.12.2
flatbuffers 23.5.26
frozenlist 1.3.3
fsspec 2023.6.0
gast 0.4.0
google-api-core 2.11.1
google-auth 2.20.0
google-auth-oauthlib 1.0.0
google-pasta 0.2.0
googleapis-common-protos 1.59.1
gpustat 1.1
grpcio 1.49.1
Gymnasium 0.26.3
gymnasium-notices 0.0.1
h11 0.14.0
h5py 3.9.0
httptools 0.5.0
idna 3.4
imageio 2.31.1
importlib-metadata 6.0.1
Jinja2 3.1.2
jsonschema 4.17.3
keras 2.13.1
kubernetes 26.1.0
lazy_loader 0.2
libclang 16.0.0
lz4 4.3.2
Markdown 3.4.3
markdown-it-py 3.0.0
MarkupSafe 2.1.3
mdurl 0.1.2
mpmath 1.3.0
msgpack 1.0.5
multidict 6.0.4
networkx 3.1
numpy 1.25.0
nvidia-ml-py 11.525.112
oauthlib 3.2.2
opencensus 0.11.2
opencensus-context 0.1.3
opentelemetry-api 1.18.0
opentelemetry-exporter-otlp 1.18.0
opentelemetry-exporter-otlp-proto-common 1.18.0
opentelemetry-exporter-otlp-proto-grpc 1.18.0
opentelemetry-exporter-otlp-proto-http 1.18.0
opentelemetry-proto 1.18.0
opentelemetry-sdk 1.18.0
opentelemetry-semantic-conventions 0.39b0
opt-einsum 3.3.0
packaging 23.1
pandas 2.0.2
Pillow 9.5.0
pip 23.1.2
platformdirs 3.6.0
prometheus-client 0.17.0
protobuf 4.23.3
psutil 5.9.5
py-spy 0.3.14
pyarrow 12.0.1
pyasn1 0.5.0
pyasn1-modules 0.3.0
pydantic 1.10.9
Pygments 2.15.1
pymongo 4.3.3
pyrsistent 0.19.3
python-dateutil 2.8.2
python-dotenv 1.0.0
pytz 2023.3
PyWavelets 1.4.1
PyYAML 6.0
ray 2.5.1
ray-cpp 2.5.1
requests 2.31.0
requests-oauthlib 1.3.1
rich 13.4.2
rsa 4.9
scikit-image 0.21.0
scipy 1.10.1
setuptools 67.8.0
six 1.16.0
smart-open 6.3.0
sniffio 1.3.0
starlette 0.27.0
sympy 1.12
tensorboard 2.13.0
tensorboard-data-server 0.7.1
tensorboardX 2.6.1
tensorflow 2.13.0rc2
tensorflow-estimator 2.13.0
tensorflow-macos 2.13.0rc2
termcolor 2.3.0
tifffile 2023.4.12
torch 2.0.1
typer 0.9.0
typing_extensions 4.5.0
tzdata 2023.3
urllib3 1.26.16
uvicorn 0.22.0
uvloop 0.17.0
virtualenv 20.21.0
watchfiles 0.19.0
wcwidth 0.2.6
websocket-client 1.6.0
websockets 11.0.3
Werkzeug 2.3.6
wheel 0.38.4
wrapt 1.15.0
yarl 1.9.2
zipp 3.15.0

Reproduction script

import os

import ray.autoscaler.local as local
from ray.autoscaler import sdk

# Start from the local provider's defaults.yaml, modified as described
# below so that provider.type is "kubernetes".
default_yaml = os.path.join(os.path.dirname(local.__file__), "defaults.yaml")

# Any command works here; this one tails the autoscaler monitor logs.
log_cmd = "tail -n 100 /tmp/ray/session_latest/logs/monitor*"
monitor_output = sdk.run_on_cluster(default_yaml, cmd=log_cmd, with_output=True).decode()

print(monitor_output)

I tested with ray/autoscaler/local/defaults.yaml after setting the provider section, at the root indentation level, to:

provider:
  type: kubernetes
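
The error can also be reproduced without editing any YAML file, by calling the same private helper the traceback goes through (an internal API, used here only to isolate the bug; I expect it to raise the same ModuleNotFoundError, since the defaults lookup dispatches on provider.type):

from ray.autoscaler._private.util import prepare_config

# Any config whose provider type is "kubernetes" should reach the broken
# defaults lookup; the other fields don't matter for this failure.
prepare_config({"provider": {"type": "kubernetes"}})
# Expected: ModuleNotFoundError: No module named 'ray.autoscaler.kubernetes'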

Issue Severity

Medium: It is a significant difficulty but I can work around it.

architkulkarni commented 1 year ago

Hi @DrinkingMilktea, thanks for reporting this. At a high level, what are you trying to do? run_on_cluster is marked as @DeveloperAPI (i.e., not a public API), so I wonder if there's another way of accomplishing what you're trying to do.