ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Serve] Unable to upload current working directory #29354

Open ysmu opened 1 year ago

ysmu commented 1 year ago

What happened + What you expected to happen

I have a deployment file that specifies . as the working directory. Following the documentation, I was expecting serve deploy deployment.yml to upload the current directory to all workers in the cluster. However, I'm hitting some exceptions. A full repro is below. Is this a known limitation in Ray Serve?

Versions / Dependencies

Python 3.9.12 (main, Apr  5 2022, 06:56:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ray
>>> ray.__version__
'2.0.0'

Reproduction script

$ pip install ray[serve]
...

$ ray start --head
...

$ ls
app.py  deployment.yml

$ cat deployment.yml
import_path: app:app
runtime_env:
  working_dir: .

$ serve deploy deployment.yml
Traceback (most recent call last):
  File "/tmp/ray-bug/.venv/bin/serve", line 8, in <module>
    sys.exit(cli())
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/serve/scripts.py", line 176, in deploy
    ServeApplicationSchema.parse_obj(config)
  File "pydantic/main.py", line 526, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for ServeApplicationSchema
runtime_env
  Invalid protocol for runtime_env URI .. Supported protocols: ['GCS', 'CONDA', 'PIP', 'HTTPS', 'S3', 'GS', 'FILE']. Original error: '' is not a valid Protocol (type=value_error)

# Trying with file://.
$ sed -i 's#working_dir: .#working_dir: file://.#' deployment.yml

$ serve deploy deployment.yml
Traceback (most recent call last):
  File "/tmp/ray-bug/.venv/bin/serve", line 8, in <module>
    sys.exit(cli())
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/serve/scripts.py", line 177, in deploy
    ServeSubmissionClient(address).deploy_application(config)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/dashboard/modules/serve/sdk.py", line 52, in deploy_application
    self._raise_error(response)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 253, in _raise_error
    raise RuntimeError(
RuntimeError: Request failed with status code 500: Traceback (most recent call last):
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/dashboard/optional_utils.py", line 279, in decorator
    return await f(self, *args, **kwargs)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/dashboard/modules/serve/serve_agent.py", line 125, in put_all_deployments
    client.deploy_app(config)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/serve/_private/client.py", line 32, in check
    return f(self, *args, **kwargs)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/serve/_private/client.py", line 291, in deploy_app
    ray.get(self._controller.deploy_app.remote(config))
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/_private/worker.py", line 2275, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::ServeController.deploy_app() (pid=3770, ip=172.17.213.171, repr=<ray.serve.controller.ServeController object at 0x7f065c1b9310>)
  File "/tmp/ray-bug/.venv/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/tmp/ray-bug/.venv/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/serve/controller.py", line 444, in deploy_app
    self.config_deployment_request_ref = run_graph.options(
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/remote_function.py", line 209, in options
    updated_options["runtime_env"] = parse_runtime_env(
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/runtime_env/runtime_env.py", line 329, in __init__
    self[option] = option_val
  File "/tmp/ray-bug/.venv/lib/python3.9/site-packages/ray/runtime_env/runtime_env.py", line 356, in __setitem__
    res_value = OPTION_TO_VALIDATION_FN[key](jsonable_type)
ValueError: Only .zip files supported for remote URIs.

Issue Severity

Medium: It is a significant difficulty but I can work around it.

simon-mo commented 1 year ago

Hi @ysmu, can you elaborate a bit on why this is needed? Are you uploading from a local machine to the cluster, or from the head node to the worker nodes?

ysmu commented 1 year ago

I was hoping this would upload the entire directory from the local/dev machine to the entire cluster. I can't use the git+zip method because my repo has submodules, which aren't included in the zip file.
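(For reference, by the git+zip method I mean pointing working_dir at a zip archive of the repo, for example the archive URL GitHub generates; the URL below is just a placeholder.)

$ cat deployment.yml
import_path: app:app
runtime_env:
  working_dir: https://github.com/<org>/<repo>/archive/refs/heads/main.zip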

sihanwang41 commented 1 year ago

Hi @ysmu, that is a known limitation and expected behavior. A local working_dir is not allowed in the runtime_env unless it is used in ray.init(). Internally, the issue is also related to https://github.com/ray-project/ray/issues/30666.
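For example, a local working_dir is accepted when it is passed directly to ray.init(), which packages and uploads the directory for you (a minimal sketch; the address is a placeholder for your cluster, and connect.py is just an illustrative name):

$ cat connect.py
import ray

# ray.init() accepts a local working_dir and uploads the directory to the
# cluster for you; the Serve config file only accepts remote URIs.
ray.init(address="auto", runtime_env={"working_dir": "."})

$ python connect.py
...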

BartheG commented 1 year ago

Hi,

I'm facing the same issue (remote cluster + Serve app running locally), and I was wondering how a local directory can be sent to a cluster without using the git+zip method?

Thanks,

chris-aeviator commented 1 year ago

Does this mean that Ray Serve's functionality of automatically uploading the files referenced by --working-dir / --app-dir is not available for multi-app deployments?

sercanCyberVision commented 6 months ago

Hello,

Was anyone able to work around this?

I built my app as below:

serve build --app-dir "./" ray_serve_app:rent_predictor_app -o rent_predictor_app_config.yaml

Then I tried to deploy the app as below:

serve deploy --address "http://kuberay-head-svc.kuberay:8265" rent_predictor_app_config.yaml

As I am not able to pass the working dir as "./", the ray_serve_app module is missing on the Ray cluster. What is the right way to handle this?

chris-aeviator commented 6 months ago

For me, anything other than providing a zip AND using the file:// scheme failed.
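To spell that out, a rough sketch of the zip + file:// variant (paths and names are placeholders, and as far as I can tell the file:// path is resolved on the cluster nodes, so the zip has to be readable there, e.g. on a shared volume):

$ (cd .. && zip -r /mnt/shared/my_app.zip my_app)   # the zip is expected to contain a single top-level directory
...

$ cat deployment.yml
import_path: app:app
runtime_env:
  working_dir: file:///mnt/shared/my_app.zip

$ serve deploy deployment.yml
...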

sercanCyberVision commented 6 months ago

Thank you @chris-aeviator.

After your suggestion, I tried the following and it worked.

I executed ray.init with the working_dir option between serve build and serve deploy. It failed because it is deprecated and not stable, but it pushed the files to GCS and printed the package URI anyway.

I specified the mentioned URI in my config file, and now the deployment works.
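For anyone else landing here, a rough sketch of that sequence (the ray:// client address is an assumption based on the KubeRay service above, and the hash in the gcs:// URI is a placeholder taken from the upload output):

$ serve build --app-dir "./" ray_serve_app:rent_predictor_app -o rent_predictor_app_config.yaml

# upload the local directory to the cluster; the ray:// address is assumed
$ python -c "import ray; ray.init(address='ray://kuberay-head-svc.kuberay:10001', runtime_env={'working_dir': '.'})"
... the upload log prints a package URI of the form gcs://_ray_pkg_<hash>.zip ...

# point the generated config at that URI instead of a local path:
#   runtime_env:
#     working_dir: gcs://_ray_pkg_<hash>.zip

$ serve deploy --address "http://kuberay-head-svc.kuberay:8265" rent_predictor_app_config.yaml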