kostrykin opened this issue 1 year ago
Are you using the conda Ray package?
@jjyao Yes, from conda-forge.
I'm trying to reproduce the issue but haven't been successful yet. Would you mind running the repro script again and posting the output of these commands?
pip freeze | grep grpcio
and
cat /tmp/ray/session_latest/logs/dashboard_agent*
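For reference, the same two pieces of information can also be gathered from Python. This is a minimal sketch, not part of Ray: installed_version and read_dashboard_agent_logs are hypothetical helper names, and the default log directory is simply the session path used in the cat command above. One small advantage is that importlib.metadata reports an actual version number, whereas pip freeze may print only a file:// URL for packages built by conda.

```python
import glob
import os
from importlib import metadata


def installed_version(dist_name):
    """Installed version of a distribution (e.g. "grpcio"), or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def read_dashboard_agent_logs(log_dir="/tmp/ray/session_latest/logs"):
    """Return {path: text} for every dashboard_agent* file in log_dir."""
    logs = {}
    for path in sorted(glob.glob(os.path.join(log_dir, "dashboard_agent*"))):
        # errors="replace" keeps a partially written log from aborting the dump
        with open(path, encoding="utf-8", errors="replace") as fh:
            logs[path] = fh.read()
    return logs


if __name__ == "__main__":
    print("grpcio", installed_version("grpcio"))
    for path, text in read_dashboard_agent_logs().items():
        print(f"==> {path} <==")
        print(text)
```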
My env:
Macbook M1
python==3.10.12 (conda)
numpy==1.25.0 (conda)
scipy==1.11.1 (pip, conda version is missing liblapack.3.dylib)
ray==2.5.1 (pip, conda does not have macos package)
With this environment, I ran the repro script without a problem.
pip freeze | grep grpcio
grpcio @ file:///home/conda/feedstock_root/build_artifacts/grpc-split_1675287624183/work
and
cat /tmp/ray/session_latest/logs/dashboard_agent*
2023-07-19 10:46:39,997 INFO agent.py:117 -- Parent pid is 2999273
2023-07-19 10:46:39,998 INFO agent.py:143 -- Dashboard agent grpc address: 0.0.0.0:59997
2023-07-19 10:46:39,999 INFO utils.py:112 -- Get all modules by type: DashboardAgentModule
2023-07-19 10:46:40,000 INFO utils.py:123 -- Module ray.dashboard.modules.actor.actor_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,051 INFO utils.py:123 -- Module ray.dashboard.modules.event.event_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,052 INFO utils.py:123 -- Module ray.dashboard.modules.healthz.healthz_agent cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,052 INFO utils.py:123 -- Module ray.dashboard.modules.healthz.healthz_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,052 INFO utils.py:123 -- Module ray.dashboard.modules.job.cli cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'pydantic'
2023-07-19 10:46:40,053 INFO utils.py:123 -- Module ray.dashboard.modules.job.job_agent cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,053 INFO utils.py:123 -- Module ray.dashboard.modules.job.job_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,054 INFO utils.py:123 -- Module ray.dashboard.modules.job.job_manager cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'pydantic'
2023-07-19 10:46:40,054 INFO utils.py:123 -- Module ray.dashboard.modules.job.pydantic_models cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'pydantic'
2023-07-19 10:46:40,055 INFO utils.py:123 -- Module ray.dashboard.modules.log.log_agent cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,055 INFO utils.py:123 -- Module ray.dashboard.modules.log.log_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,067 INFO utils.py:123 -- Module ray.dashboard.modules.log.log_manager cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'pydantic'
2023-07-19 10:46:40,070 INFO utils.py:123 -- Module ray.dashboard.modules.metrics.metrics_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,071 INFO utils.py:123 -- Module ray.dashboard.modules.node.node_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,071 INFO utils.py:123 -- Module ray.dashboard.modules.reporter.reporter_agent cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'opencensus'
2023-07-19 10:46:40,072 INFO utils.py:123 -- Module ray.dashboard.modules.reporter.reporter_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,073 INFO utils.py:123 -- Module ray.dashboard.modules.serve.serve_agent cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,073 INFO utils.py:123 -- Module ray.dashboard.modules.serve.serve_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,074 INFO utils.py:123 -- Module ray.dashboard.modules.snapshot.snapshot_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,074 INFO utils.py:123 -- Module ray.dashboard.modules.state.state_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,074 INFO utils.py:123 -- Module ray.dashboard.modules.test.test_agent cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,075 INFO utils.py:123 -- Module ray.dashboard.modules.test.test_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'aiohttp'
2023-07-19 10:46:40,075 INFO utils.py:123 -- Module ray.dashboard.modules.test.test_utils cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'async_timeout'
2023-07-19 10:46:40,076 INFO utils.py:145 -- Available modules: [<class 'ray.dashboard.modules.runtime_env.runtime_env_agent.RuntimeEnvAgent'>]
2023-07-19 10:46:40,076 INFO agent.py:172 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.runtime_env.runtime_env_agent.RuntimeEnvAgent'>
2023-07-19 10:46:40,076 INFO agent.py:177 -- Loaded 1 modules.
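The log shows that only RuntimeEnvAgent was loaded; every other dashboard module was skipped because aiohttp, pydantic, opencensus or async_timeout could not be imported, and the message itself points to pip install 'ray[default]'. The sketch below groups such a log by missing dependency and checks which of those modules are importable in the current interpreter. The regular expression is an assumption keyed to the message format in this particular log (Ray 2.5.1) and may not match other versions; missing_dependencies and importable are hypothetical helper names.

```python
import importlib.util
import re

# Keyed to the utils.py:123 messages in the log above, e.g.
# "Module ray.dashboard.modules.job.cli cannot be loaded ... No module named 'pydantic'"
_SKIPPED = re.compile(r"Module (\S+) cannot be loaded .*No module named '([^']+)'")


def missing_dependencies(log_text):
    """Map each missing dependency to the dashboard modules it prevented from loading."""
    missing = {}
    for line in log_text.splitlines():
        match = _SKIPPED.search(line)
        if match:
            dashboard_module, dependency = match.groups()
            missing.setdefault(dependency, []).append(dashboard_module)
    return missing


def importable(module_names):
    """Report, for each module name, whether it can be imported here."""
    result = {}
    for name in module_names:
        try:
            result[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:  # parent package of a dotted name is missing
            result[name] = False
    return result
```

For example, importable(missing_dependencies(log_text)) lists each dependency named in the log together with whether it can be imported now.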
Maybe it's due to the OS? As I reported in the issue, I am using Ubuntu 20.04.6. Or is it because you installed scipy and ray via pip instead of conda?
Here is the full conda environment, obtained with conda list, including the packages listed in the issue and all of their dependencies:
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge
attrs 23.1.0 pyh71513ae_1 conda-forge
brotli-python 1.0.9 py310hd8f1fbe_9 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.19.1 hd590300_0 conda-forge
ca-certificates 2023.5.7 hbcca054_0 conda-forge
certifi 2023.5.7 pyhd8ed1ab_0 conda-forge
charset-normalizer 3.2.0 pyhd8ed1ab_0 conda-forge
click 8.1.6 unix_pyh707e725_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
filelock 3.12.2 pyhd8ed1ab_0 conda-forge
frozenlist 1.4.0 py310h2372a71_0 conda-forge
grpc-cpp 1.48.1 h4fad500_3 conda-forge
grpcio 1.48.1 py310h4a5735c_3 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
importlib_resources 6.0.0 pyhd8ed1ab_1 conda-forge
jsonschema 4.18.4 pyhd8ed1ab_0 conda-forge
jsonschema-specifications 2023.7.1 pyhd8ed1ab_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
libabseil 20220623.0 cxx17_h05df665_6 conda-forge
libblas 3.9.0 17_linux64_openblas conda-forge
libcblas 3.9.0 17_linux64_openblas conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.1.0 he5830b7_0 conda-forge
libgfortran-ng 13.1.0 h69a702a_0 conda-forge
libgfortran5 13.1.0 h15d22d2_0 conda-forge
libgomp 13.1.0 he5830b7_0 conda-forge
liblapack 3.9.0 17_linux64_openblas conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libopenblas 0.3.23 pthreads_h80387f5_0 conda-forge
libprotobuf 3.21.12 h3eb15da_0 conda-forge
libsqlite 3.42.0 h2797004_0 conda-forge
libstdcxx-ng 13.1.0 hfd8a6a1_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
msgpack-python 1.0.5 py310hdf3cbec_0 conda-forge
ncurses 6.4 hcb278e6_0 conda-forge
numpy 1.25.0 py310ha4c1d20_0 conda-forge
openssl 3.1.1 hd590300_1 conda-forge
packaging 23.1 pyhd8ed1ab_0 conda-forge
pip 23.2 pyhd8ed1ab_0 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_0 conda-forge
platformdirs 3.9.1 pyhd8ed1ab_0 conda-forge
pooch 1.7.0 pyha770c72_3 conda-forge
protobuf 4.21.12 py310heca2aa9_0 conda-forge
psutil 5.9.5 py310h1fa729e_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.10.12 hd12c33a_0_cpython conda-forge
python_abi 3.10 3_cp310 conda-forge
pyyaml 6.0 py310h5764c6d_5 conda-forge
ray-core 2.5.1 py310h2ca9b2b_0 conda-forge
re2 2023.02.01 hcb278e6_0 conda-forge
readline 8.2 h8228510_1 conda-forge
referencing 0.30.0 pyhd8ed1ab_0 conda-forge
requests 2.31.0 pyhd8ed1ab_0 conda-forge
rpds-py 0.9.2 py310hcb5633a_0 conda-forge
scipy 1.11.1 py310ha4c1d20_0 conda-forge
setproctitle 1.2.2 py310h5764c6d_2 conda-forge
setuptools 68.0.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
typing-extensions 4.7.1 hd8ed1ab_0 conda-forge
typing_extensions 4.7.1 pyha770c72_0 conda-forge
tzdata 2023c h71feb2d_0 conda-forge
urllib3 2.0.3 pyhd8ed1ab_1 conda-forge
wheel 0.40.0 pyhd8ed1ab_1 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
zipp 3.16.2 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 hd590300_5 conda-forge
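Listings like the one above are easier to compare when parsed. This is a minimal sketch, assuming the whitespace-separated Name Version Build Channel layout shown here; it relies on conda typically reporting pip-installed packages with the channel pypi (build pypi_0), which is where a pip-versus-conda difference like the one asked about above would show up. parse_conda_list, packages_not_from and version_differences are hypothetical helpers, not conda APIs.

```python
def parse_conda_list(text):
    """Parse `conda list` output into {name: (version, build, channel)}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and the "# Name Version Build Channel" header
        fields = line.split()
        if len(fields) < 3:
            continue  # not a package row
        name, version, build = fields[:3]
        channel = fields[3] if len(fields) > 3 else ""  # the channel column may be empty
        packages[name] = (version, build, channel)
    return packages


def packages_not_from(packages, channel="conda-forge"):
    """Sorted names of packages whose channel is not `channel` (e.g. pip installs)."""
    return sorted(name for name, (_, _, ch) in packages.items() if ch != channel)


def version_differences(env_a, env_b):
    """{name: (version_a, version_b)} for packages that differ or exist in only one env."""
    diffs = {}
    for name in sorted(set(env_a) | set(env_b)):
        version_a = env_a[name][0] if name in env_a else None
        version_b = env_b[name][0] if name in env_b else None
        if version_a != version_b:
            diffs[name] = (version_a, version_b)
    return diffs
```

Given the listing above and a second environment's conda list output, version_differences returns every package whose version differs or that is present in only one of the two environments.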
I'm seeing this issue as well. Does anyone know if rolling back to a previous version solves this? I am using Debian 12 (bookworm).
What's interesting is that I have two containers running exactly the same image, just with different tags and names, both on the same server. The first instance that I brought up works fine; the second one exhibits the issue.
UPDATE: upon further testing, it seems it can happen in either container. These two containers are on the same server but serve as different environments: one represents QA and the other prod. If I kick off the process in both at the same time, I see the broken pipes in both:
File "/apps/data/publish/publisher.py", line 23, in write_data
log_ref = ray.put(self.log) if async_write else None
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/ray/_private/worker.py", line 2597, in put
object_ref = worker.put_object(value, owner_address=serialize_owner_address)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/ray/_private/worker.py", line 704, in put_object
self.core_worker.put_serialized_object_and_increment_local_ref(
File "python/ray/_raylet.pyx", line 2939, in ray._raylet.CoreWorker.put_serialized_object_and_increment_local_ref
File "python/ray/_raylet.pyx", line 2831, in ray._raylet.CoreWorker._create_put_buffer
File "python/ray/_raylet.pyx", line 412, in ray._raylet.check_status
I was using the latest Ray, 2.6.3. I rolled back to 2.6.1 and still see the issue.
It feels like the two containers may be sharing some Ray resources and interfering with each other. Is there a way to start Ray in each container so that it is independent and local to that container only? At the moment I'm simply using it as a replacement for Python's multiprocessing library. I'm seeing a lot of these errors in some of the Ray logs:
[2023-09-17 11:21:52,199 I 41 347] raylet_client.cc:364: Error reporting task backlog information: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
[2023-09-17 11:21:53,199 I 41 347] raylet_client.cc:364: Error reporting task backlog information: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
[2023-09-17 11:21:54,190 E 41 347] gcs_rpc_client.h:547: Failed to connect to GCS within 60 seconds. GCS may have been killed. It's either GCS is terminated by `ray stop` or is killed unexpectedly. If it is killed unexpectedly, see the log file gcs_server.out. https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure. The program will terminate.
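One way to keep each container's Ray instance to itself is to have ray.init start a fresh local instance instead of attaching to an existing one, and to give each instance its own temporary directory. According to the ray.init documentation, a call without an address first looks for an existing cluster (RAY_ADDRESS, then the most recent cluster recorded under /tmp/ray) before starting a new one, whereas address="local" always starts a new instance. This is a sketch under assumptions, not a confirmed fix for the broken pipes: it assumes the interference comes from the containers sharing /tmp/ray (for example through a bind mount); if they also share the host network, port conflicts are a separate possibility it does not address. _temp_dir is an underscore-prefixed (private) ray.init argument that may change between releases, and local_init_kwargs, start_private_ray and the /tmp/ray_instances base directory are hypothetical names.

```python
import os
import socket


def local_init_kwargs(instance_name=None, base_dir="/tmp/ray_instances"):
    """Build ray.init() keyword arguments for a Ray instance private to this container.

    instance_name defaults to the hostname; Docker sets that to the short
    container ID unless overridden, so each container gets its own directory.
    """
    name = instance_name or socket.gethostname()
    return {
        # Always start a new local instance instead of attaching to one found
        # through RAY_ADDRESS or the most recent cluster recorded under /tmp/ray.
        "address": "local",
        # Private (underscore) argument: root directory for this instance's
        # sessions, logs and sockets. Keep it short and absolute.
        "_temp_dir": os.path.join(base_dir, name),
        # Not needed when Ray only replaces multiprocessing.
        "include_dashboard": False,
    }


def start_private_ray(instance_name=None):
    """Start a container-local Ray instance and return the ray module."""
    import ray  # imported lazily so local_init_kwargs works without Ray installed

    ray.init(**local_init_kwargs(instance_name))
    return ray
```

Calling start_private_ray() (or ray.init(**local_init_kwargs())) at the start of each container's entry point, with ray.shutdown() at the end, would then replace a plain ray.init() call.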
What happened + What you expected to happen
ray.exceptions.RaySystemError: System error: Broken pipe
is raised when using ray.put. The console output when running the reproduction script:
The reproduction script reported below is a minimal example; simplifying it any further (e.g., using range(2) instead of range(3) in the code below) eliminates the error. I have also found two possible work-arounds:
1. Using python =3.9.16 and ray-core =1.6.0 instead of the versions reported below.
2. Using python =3.8.5, numpy =1.20.3, scipy =1.6.3 and ray-core =2.3.0 instead of the versions reported below.

The computer which I used for testing was equipped with 32 GiB of RAM.
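To confirm which of these combinations an environment actually matches, a small check like the following can help. It is a sketch: KNOWN_GOOD simply transcribes the two work-arounds, and it assumes the conda-forge ray-core package registers its Python metadata under the distribution name ray, as the pip package does; current_versions and matching_combos are hypothetical names.

```python
import sys
from importlib import metadata

# The two work-around combinations, keyed by importlib.metadata distribution
# name ("ray" is assumed to be how the conda-forge ray-core package registers).
KNOWN_GOOD = [
    {"python": "3.9.16", "ray": "1.6.0"},
    {"python": "3.8.5", "numpy": "1.20.3", "scipy": "1.6.3", "ray": "2.3.0"},
]


def current_versions(names=("numpy", "scipy", "ray")):
    """Versions of the running interpreter and the given distributions (None if absent)."""
    versions = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def matching_combos(versions, combos=KNOWN_GOOD):
    """Indices of the combos whose every pinned version matches `versions` exactly."""
    return [
        index
        for index, combo in enumerate(combos)
        if all(versions.get(key) == value for key, value in combo.items())
    ]
```

An empty result from matching_combos(current_versions()) means the running environment matches neither work-around.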
Versions / Dependencies
OS: Ubuntu 20.04.6
Dependencies:
Reproduction script
Issue Severity
Medium: It is a significant difficulty but I can work around it.