Open Yue-Li-atBain opened 1 year ago
Hi @Yue-Li-atBain , do you have a Dockerfile that installs these dependencies in a way that reproduces the issue?
The Dockerfile content is as follows. The environment.yaml contains the package list above.
FROM continuumio/miniconda3:4.10.3 AS main
RUN apt-get -y --allow-releaseinfo-change update && \
apt-get -y install build-essential && \
apt-get -y install dos2unix # required to execute ops\clean_files.sh on windows
RUN conda config --set ssl_verify false
RUN conda update -n base -c defaults conda
WORKDIR /opt/app
COPY --chown=1000:100 environment.yaml .
RUN conda env update -q --name base --file environment.yaml
RUN pip install flask-session==0.4.0
COPY src src
COPY setup.py .
RUN pip install -e .
Can you share environment.yaml as well? Want to make sure we can reproduce exactly what you're seeing :). Also, how is Ray installed? I don't see it in the packages listed above or this Dockerfile.
Lastly, what is pip install -e . installing?
environment.yaml file is as follows:
channels:
  - conda-forge
dependencies:
  #core
  - ray-core = 2.3.0
  - tensorflow = 2.9
  - tensorflow-probability
  - pandas=1.3.4
  - u8darts-all=0.19.0
  - jupyterlab=3.2.4
  - ipython=8.3.0
  - numpy=1.22.3
  - pillow=9.1.1
  - ujson=5.4.0
  - jinja2=3.1.2
  - jupyter_server=1.17.1
  - notebook=6.4.12
  - hydra-core=1.2
  - openpyxl=3.0.9
  - mlflow=1.26.0
  - libstdcxx-ng
  - plotly=5.8.2
  #dev
  - pytest=6.2.5
  - pytest-cov=3.0.0
  - sphinx=4.3.0
  - black=22.3.0
  - pytest-helpers-namespace=2021.12.29
  - pip
  - pip:
      - streamlit==1.11.1
      - streamlit-aggrid==0.3.4.post3
      - rsconnect-python==1.15.0
      - statsforecast==0.7.1 # 1.0.0 does not work with the current version of DARTS
      - pytest-regtest==1.5.0
So sorry, I pasted an older version of the environment.yaml file before. It's updated now. I also updated the Dockerfile. Before pip install -e . runs, a very simple setup.py file and the src folder are copied in. The setup.py file looks like this:
from setuptools import setup, find_packages
setup(
name="src",
version="0.0.0",
packages=["src"],
python_requires=">=3.9",
# install_requires=["peppercorn"], # Optional
)
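Since the question of how Ray gets installed came up, one way to confirm what actually landed in the base environment is to query the installed distribution metadata from inside the container. This is a minimal sketch, assuming the conda-forge ray-core package registers itself under the distribution name ray; the helper name installed_versions is hypothetical and not part of the project.

```python
import sys
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Also prints the interpreter version, which was asked about in this thread.
    print("python", sys.version.split()[0])
    # "ray" as the distribution name is an assumption; adjust if it differs.
    for name, version in installed_versions(["ray", "tensorflow", "numpy", "pandas"]).items():
        print(name, version or "not installed")
```

Running this with the container's python and pasting the output would show whether the pinned versions from environment.yaml were actually resolved.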
Were you able to reproduce?
Does the base environment as defined by environment.yaml work (that is, does ray.init() run without failing)? I mean, if you use only this part of the Dockerfile, does it still segfault? Also, which version of Python does this run with?
FROM continuumio/miniconda3:4.10.3 AS main
RUN apt-get -y --allow-releaseinfo-change update && \
apt-get -y install build-essential && \
apt-get -y install dos2unix # required to execute ops\clean_files.sh on windows
RUN conda config --set ssl_verify false
RUN conda update -n base -c defaults conda
WORKDIR /opt/app
COPY --chown=1000:100 environment.yaml .
RUN conda env update -q --name base --file environment.yaml
Yes, it still segfaults.
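To tell a real segfault (SIGSEGV) apart from an abort caused by an uncaught std::bad_alloc (typically SIGABRT), it can help to run ray.init() in a child interpreter with faulthandler enabled and inspect how the child exited. The following is only a sketch under that assumption; run_probe is a hypothetical helper name, and the signal decoding relies on POSIX behaviour, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def run_probe(code, timeout=120):
    """Run `code` in a child interpreter with faulthandler enabled.

    Returns (returncode, signal_name, stderr). signal_name is None when the
    child exited normally rather than being killed by a signal.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    signal_name = None
    if proc.returncode < 0:
        try:
            signal_name = signal.Signals(-proc.returncode).name
        except ValueError:
            signal_name = f"signal {-proc.returncode}"
    return proc.returncode, signal_name, proc.stderr


if __name__ == "__main__":
    rc, sig, err = run_probe("import ray; ray.init()")
    print("return code:", rc, "signal:", sig)
    # faulthandler writes the Python-level traceback to stderr on a fatal signal.
    print(err[-2000:])
```

The tail of stderr shows which Python frame was active when the process died, which is usually more useful in a bug report than the bare exit status.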
Weird. When I use that Dockerfile and environment.yaml, the conda env update -q ... line hangs for me and cannot get past solving the environment:
...
Step 5/7 : WORKDIR /opt/app
---> Running in 1db0dfd64f96
Removing intermediate container 1db0dfd64f96
---> 50458d8acf24
Step 6/7 : COPY --chown=1000:100 environment.yaml .
---> c63175c89352
Step 7/7 : RUN conda env update -q --name base --file environment.yaml
---> Running in aa5841578ef8
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working...
Whew. It built. And for me ray.init() works, although I do see a warning about /tmp being full. Perhaps running docker with a "-v" directive to map /tmp to a filesystem outside the docker image would help?
$ docker build -f Dockerfile .
...
$ docker run -it --rm 07243a46c39a /bin/bash
(base) root@9aa20e3a2307:/opt/app# python
Python 3.9.5 (default, Jun 4 2021, 12:28:51)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ray
>>> ray.init()
2023-04-28 13:16:52,511 WARNING services.py:1780 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2023-04-28 13:16:52,627 INFO worker.py:1553 -- Started a local Ray instance.
RayContext(dashboard_url='', python_version='3.9.5', ray_version='2.3.0', ray_commit='{{RAY_COMMIT_SHA}}', address_info={'node_ip_address': '172.17.0.2', 'raylet_ip_address': '172.17.0.2', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2023-04-28_13-16-50_982369_283/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-04-28_13-16-50_982369_283/sockets/raylet', 'webui_url': '', 'session_dir': '/tmp/ray/session_2023-04-28_13-16-50_982369_283', 'metrics_export_port': 50146, 'gcs_address': '172.17.0.2:55455', 'address': '172.17.0.2:55455', 'dashboard_agent_listen_port': 52365, 'node_id': '1c205d542df5dd250303c70b186db6cf23aa24ccb4acc7a3a3b58829'})
>>> (raylet) [2023-04-28 13:17:02,527 E 407 425] (raylet) file_system_monitor.cc:105: /tmp/ray/session_2023-04-28_13-16-50_982369_283 is over 95% full, available space: 9684275200; capacity: 502921633792. Object creation will fail if spilling is required.
and of course you can increase space by following the warning:
you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run...
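The 67108864 bytes reported in the warning is exactly 64 MiB, which matches Docker's default /dev/shm size. As a rough sketch of how to check this from inside the container and derive a value for --shm-size based on the "more than 30% of available RAM" guidance in the warning (the helper names below are hypothetical, and the RAM lookup assumes Linux):

```python
import os
import shutil


def recommended_shm_bytes(total_ram_bytes, percent=30):
    """Return `percent` of total RAM in bytes, using integer arithmetic."""
    return total_ram_bytes * percent // 100


def free_bytes(path):
    """Free space in bytes on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    for path in ("/dev/shm", "/tmp"):
        if os.path.exists(path):
            print(path, "free bytes:", free_bytes(path))
    # Linux-specific: total physical memory from sysconf.
    total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    print("total RAM bytes:", total_ram)
    print("30% of RAM in bytes:", recommended_shm_bytes(total_ram))
```

The last number is a lower bound; the warning asks for a value above it when passing --shm-size to docker run.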
What happened + What you expected to happen
I was trying to run some code with Ray inside a Docker image, but ray.init() throws a std::bad_alloc error. The error remains even if I set object memory or _memory to less than 1 GB.
Versions / Dependencies
The Docker image was built with the following packages: first continuumio/miniconda3:4.10.3
channels:
core
dev
Reproduction script
The error occurs with the first call of ray.init()
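For reference, a reproduction along these lines might look like the sketch below. It is not the reporter's actual script: the 200 MiB object store size and the init_kwargs helper are illustrative assumptions, chosen only to mirror the "less than 1 GB" setting described above.

```python
def init_kwargs(object_store_bytes=200 * 1024 * 1024):
    """Build keyword arguments for ray.init() with a small object store."""
    return {
        "object_store_memory": object_store_bytes,
        "include_dashboard": False,
    }


def main():
    try:
        import ray
    except ImportError:
        print("ray is not installed in this environment")
        return
    print("ray version:", ray.__version__)
    # The reported failure happens on this first call.
    ray.init(**init_kwargs())
    print("ray.init() succeeded")
    ray.shutdown()


if __name__ == "__main__":
    main()
```

If this fails inside the image but succeeds in a fresh environment, the difference between the two package sets is the first place to look.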
Issue Severity
High: It blocks me from completing my task.