ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Runtime] Improve runtime environment error message when virtualenv version is too old #28232

Open robertnishihara opened 2 years ago

robertnishihara commented 2 years ago

What happened + What you expected to happen

On my laptop (MacBook), I ran

ray start --head

Then in Python

import ray
ray.init("ray://localhost:10001", runtime_env = {"pip": ["sklearn"]})

This failed with

---------------------------------------------------------------------------
ConnectionAbortedError                    Traceback (most recent call last)
<ipython-input-2-796797b6bb49> in <module>
----> 1 ray.init("ray://localhost:10001", runtime_env = {"pip": ["sklearn"]})

~/opt/anaconda3/lib/python3.7/site-packages/ray/_private/client_mode_hook.py in wrapper(*args, **kwargs)
    103             if func.__name__ != "init" or is_client_mode_enabled_by_default:
    104                 return getattr(ray, func.__name__)(*args, **kwargs)
--> 105         return func(*args, **kwargs)
    106 
    107     return wrapper

~/opt/anaconda3/lib/python3.7/site-packages/ray/_private/worker.py in init(address, num_cpus, num_gpus, resources, object_store_memory, local_mode, ignore_reinit_error, include_dashboard, dashboard_host, dashboard_port, job_config, configure_logging, logging_level, logging_format, log_to_driver, namespace, runtime_env, storage, **kwargs)
   1246         passed_kwargs.update(kwargs)
   1247         builder._init_args(**passed_kwargs)
-> 1248         ctx = builder.connect()
   1249         from ray._private.usage import usage_lib
   1250 

~/opt/anaconda3/lib/python3.7/site-packages/ray/client_builder.py in connect(self)
    181             _credentials=self._credentials,
    182             ray_init_kwargs=self._remote_init_kwargs,
--> 183             metadata=self._metadata,
    184         )
    185         get_dashboard_url = ray.remote(ray._private.worker.get_dashboard_url)

~/opt/anaconda3/lib/python3.7/site-packages/ray/util/client_connect.py in connect(conn_str, secure, metadata, connection_retries, job_config, namespace, ignore_version, _credentials, ray_init_kwargs)
     54         ignore_version=ignore_version,
     55         _credentials=_credentials,
---> 56         ray_init_kwargs=ray_init_kwargs,
     57     )
     58     return conn

~/opt/anaconda3/lib/python3.7/site-packages/ray/util/client/__init__.py in connect(self, *args, **kw_args)
    250     def connect(self, *args, **kw_args):
    251         self.get_context()._inside_client_test = self._inside_client_test
--> 252         conn = self.get_context().connect(*args, **kw_args)
    253         global _lock, _all_contexts
    254         with _lock:

~/opt/anaconda3/lib/python3.7/site-packages/ray/util/client/__init__.py in connect(self, conn_str, job_config, secure, metadata, connection_retries, namespace, ignore_version, _credentials, ray_init_kwargs)
    100             )
    101             self.api.worker = self.client_worker
--> 102             self.client_worker._server_init(job_config, ray_init_kwargs)
    103             conn_info = self.client_worker.connection_info()
    104             self._check_versions(conn_info, ignore_version)

~/opt/anaconda3/lib/python3.7/site-packages/ray/util/client/worker.py in _server_init(self, job_config, ray_init_kwargs)
    837             if not response.ok:
    838                 raise ConnectionAbortedError(
--> 839                     f"Initialization failure from server:\n{response.msg}"
    840                 )
    841 

ConnectionAbortedError: Initialization failure from server:
Traceback (most recent call last):
  File "/Users/rkn/opt/anaconda3/lib/python3.7/site-packages/ray/util/client/server/proxier.py", line 679, in Datapath
    client_id, job_config
  File "/Users/rkn/opt/anaconda3/lib/python3.7/site-packages/ray/util/client/server/proxier.py", line 307, in start_specific_server
    specific_server=specific_server,
  File "/Users/rkn/opt/anaconda3/lib/python3.7/site-packages/ray/util/client/server/proxier.py", line 254, in _create_runtime_env
    "Failed to create runtime_env for Ray client "
RuntimeError: Failed to create runtime_env for Ray client server, it is caused by:
Traceback (most recent call last):
  File "/Users/rkn/opt/anaconda3/lib/python3.7/site-packages/ray/dashboard/modules/runtime_env/runtime_env_agent.py", line 342, in _create_runtime_env_with_retry
    runtime_env_setup_task, timeout=setup_timeout_seconds
  File "/Users/rkn/opt/anaconda3/lib/python3.7/asyncio/tasks.py", line 442, in wait_for
    return fut.result()
  File "/Users/rkn/opt/anaconda3/lib/python3.7/site-packages/ray/dashboard/modules/runtime_env/runtime_env_agent.py", line 297, in _setup_runtime_env
    runtime_env, plugin, uri_cache, context, per_job_logger
  File "/Users/rkn/opt/anaconda3/lib/python3.7/site-packages/ray/_private/runtime_env/plugin.py", line 251, in create_for_plugin_if_needed
    size_bytes = await plugin.create(uri, runtime_env, context, logger=logger)
  File "/Users/rkn/opt/anaconda3/lib/python3.7/site-packages/ray/_private/runtime_env/pip.py", line 453, in create
    return await task
  File "/Users/rkn/opt/anaconda3/lib/python3.7/site-packages/ray/_private/runtime_env/pip.py", line 438, in _create_for_hash
    logger,
  File "/Users/rkn/opt/anaconda3/lib/python3.7/site-packages/ray/_private/runtime_env/pip.py", line 329, in _run
    await self._create_or_get_virtualenv(path, exec_cwd, logger)
  File "/Users/rkn/opt/anaconda3/lib/python3.7/site-packages/ray/_private/runtime_env/pip.py", line 271, in _create_or_get_virtualenv
    await check_output_cmd(create_venv_cmd, logger=logger, cwd=cwd, env=env)
  File "/Users/rkn/opt/anaconda3/lib/python3.7/site-packages/ray/_private/runtime_env/utils.py", line 102, in check_output_cmd
    proc.returncode, cmd, output=stdout, cmd_index=cmd_index
ray._private.runtime_env.utils.SubprocessCalledProcessError: Run cmd[6] failed with the following details.
Command '['/Users/rkn/opt/anaconda3/bin/python', '-m', 'virtualenv', '--app-data', '/tmp/ray/session_2022-08-31_22-22-56_596214_5880/runtime_resources/pip/f929bc20de0fd1f008f9712e04168ad17c13174d/virtualenv_app_data', '--reset-app-data', '--no-periodic-update', '--system-site-packages', '--no-download', '/tmp/ray/session_2022-08-31_22-22-56_596214_5880/runtime_resources/pip/f929bc20de0fd1f008f9712e04168ad17c13174d/virtualenv']' returned non-zero exit status 2.
Last 50 lines of stdout:
    usage: virtualenv [--version] [--with-traceback] [-v | -q] [--app-data APP_DATA] [--clear-app-data] [--discovery {builtin}] [-p py] [--creator {builtin,cpython3-posix,venv}] [--seeder {app-data,pip}] [--no-seed]
                      [--activators comma_sep_list] [--clear] [--system-site-packages] [--symlinks | --copies] [--download | --no-download] [--extra-search-dir d [d ...]] [--pip version] [--setuptools version] [--wheel version] [--no-pip]
                      [--no-setuptools] [--no-wheel] [--symlink-app-data] [--prompt prompt] [-h]
                      dest
    virtualenv: error: unrecognized arguments: --reset-app-data --no-periodic-update

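We should have a better error message with clear instructions telling the user to upgrade virtualenv. Here, the installed virtualenv rejected the --reset-app-data and --no-periodic-update flags that Ray passes to it, but nothing in the traceback says that upgrading virtualenv is the fix.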
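One way to produce such a message is to inspect the captured subprocess output for virtualenv's "unrecognized arguments" error and translate it into an upgrade instruction. This is a hypothetical sketch, not Ray's code: the helper name explain_virtualenv_failure and both constants are illustrative assumptions. The flag names come from the failing command above, and 20.16.4 is only the version the reporter confirmed works, not a verified minimum.

```python
import re

# Flags Ray passed to `python -m virtualenv` in the failing command above.
RAY_VIRTUALENV_FLAGS = ("--reset-app-data", "--no-periodic-update")

# Assumption: 20.16.4 is the version the reporter confirmed works; the true
# minimum virtualenv version that accepts these flags may be lower.
SUGGESTED_VERSION = "20.16.4"


def explain_virtualenv_failure(stdout):
    """Return an actionable message if virtualenv rejected Ray's flags, else None."""
    # argparse reports rejected flags on one line: "unrecognized arguments: ..."
    match = re.search(r"unrecognized arguments: (.+)", stdout)
    if match is None:
        return None
    rejected = match.group(1).split()
    ray_flags = [flag for flag in rejected if flag in RAY_VIRTUALENV_FLAGS]
    if not ray_flags:
        return None  # some other argument problem; leave the original error alone
    flags = ", ".join(ray_flags)
    return (
        f"The installed virtualenv does not support {flags}, which Ray's pip "
        f"runtime_env passes when creating the environment. Upgrade it with: "
        f'pip install --upgrade "virtualenv>={SUGGESTED_VERSION}"'
    )
```

A caller could run this on the captured stdout and, when it returns a message, surface that message ahead of the raw traceback.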

Versions / Dependencies

I was using an older version of virtualenv; upgrading to virtualenv 20.16.4 fixed the issue.
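A more proactive option would be to compare the installed virtualenv version against a floor before building the environment. The sketch below is hypothetical (check_virtualenv, parse_release, and MIN_VIRTUALENV are illustrative names, not Ray APIs) and treats 20.16.4 as a conservative floor only because that version fixed the issue here. It uses importlib.metadata, which requires Python 3.8+; the reporter's Python 3.7 would need the importlib_metadata backport instead.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: conservative floor taken from the version that fixed this issue.
MIN_VIRTUALENV = (20, 16, 4)
UPGRADE_HINT = 'pip install --upgrade "virtualenv>={}"'.format(
    ".".join(map(str, MIN_VIRTUALENV))
)


def parse_release(text):
    """Return the leading numeric release, e.g. '20.16.4rc1' -> (20, 16, 4).

    Pre-release suffixes are ignored, which is good enough for a floor check.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def check_virtualenv(get_version=version):
    """Return the installed virtualenv version, or raise RuntimeError with upgrade advice."""
    try:
        installed = get_version("virtualenv")
    except PackageNotFoundError:
        raise RuntimeError(f"virtualenv is not installed. Run: {UPGRADE_HINT}") from None
    if parse_release(installed) < MIN_VIRTUALENV:
        raise RuntimeError(
            f"virtualenv {installed} is too old for pip runtime_env. Run: {UPGRADE_HINT}"
        )
    return installed
```

Passing get_version as a parameter lets the check be exercised with a fake lookup instead of the real environment.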

I ran this on macOS.

Reproduction script

See above.

Issue Severity

No response

architkulkarni commented 2 years ago

This should become much rarer after https://github.com/ray-project/ray/pull/27906, though a similar issue could still occur if any of Ray's Python package dependencies is out of date. One way to further improve the error message would be to check the installed package versions against Ray's requirements.txt at runtime.
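The runtime check suggested above could look roughly like this. It is a minimal sketch assuming requirement lines of the simple form name>=X.Y.Z; the function find_outdated and the sample pins in the example are hypothetical and do not reflect the contents of Ray's actual requirements.txt. A real implementation would parse the full requirement grammar (for example with the packaging library) rather than this regex.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Matches only simple "name>=X.Y.Z" lines; comments, unpinned names, and other
# specifiers (==, ~=, markers) are deliberately skipped in this sketch.
REQ_LINE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*>=\s*([0-9][0-9.]*)")


def _release(text):
    """Leading numeric release as a tuple, e.g. '3.8.1' -> (3, 8, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def find_outdated(requirements_text, get_version=version):
    """Return (name, installed, required) for each '>=' pin the environment fails.

    `installed` is None when the package is missing entirely.
    """
    problems = []
    for line in requirements_text.splitlines():
        match = REQ_LINE.match(line)
        if match is None:
            continue
        name, required = match.groups()
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems.append((name, None, required))
            continue
        if _release(installed) < _release(required):
            problems.append((name, installed, required))
    return problems
```

For instance, with virtualenv 20.0.21 installed, a virtualenv>=20.16.4 line would be reported as ("virtualenv", "20.0.21", "20.16.4"), which an error handler could turn into a concrete upgrade instruction.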