(Issue opened and later closed by eero-t, 2 weeks ago.)
Huge images:
```shell
$ docker images | grep opea
opea/llm-tgi latest 139c71c9b631 About an hour ago 2.14GB
opea/reranking-tei latest 25343403b23b 2 hours ago 6.36GB
opea/retriever-redis latest 791036607726 3 hours ago 7.55GB
opea/embedding-tei latest 2b93d18d6f56 3 hours ago 7.27GB
```
This is due to (for me useless) Nvidia support:
```shell
$ docker run -it --rm --entrypoint /bin/sh --user 1000 opea/retriever-redis -c "du -ks /home/user/.local/lib/python3.11/site-packages/* | sort -nr"
2892656 /home/user/.local/lib/python3.11/site-packages/nvidia
1618908 /home/user/.local/lib/python3.11/site-packages/torch
429964 /home/user/.local/lib/python3.11/site-packages/triton
107692 /home/user/.local/lib/python3.11/site-packages/scipy
92780 /home/user/.local/lib/python3.11/site-packages/transformers
...
```
The CUDA content is spread across multiple Python modules:
```shell
$ find /home/ -name '*nvidia*' -o -name '*cuda*' | xargs du -ks | sort -nr
2892660 /home/user/.local/lib/python3.11/site-packages/nvidia
861020 /home/user/.local/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
83440 /home/user/.local/lib/python3.11/site-packages/torch/lib/libtorch_cuda_linalg.so
80160 /home/user/.local/lib/python3.11/site-packages/triton/third_party/cuda
...
```
(triton is a torch dependency, so with what I propose one also gets a single instance of that.)
The following change gets rid of the nvidia site-packages (while still keeping torch) and reduces the image size by 3GB:
```diff
$ git diff
diff --git a/comps/retrievers/langchain/docker/Dockerfile b/comps/retrievers/langchain/docker/Dockerfile
index 2db4bfc..425d3e3 100644
--- a/comps/retrievers/langchain/docker/Dockerfile
+++ b/comps/retrievers/langchain/docker/Dockerfile
@@ -30,7 +30,12 @@ RUN chmod +x /home/user/comps/retrievers/langchain/run.sh
 USER user
 RUN pip install --no-cache-dir --upgrade pip && \
-    pip install --no-cache-dir -r /home/user/comps/retrievers/requirements.txt
+    pip install --no-cache-dir -r /home/user/comps/retrievers/requirements.txt && \
+    pip uninstall -y nvidia-cublas-cu12 nvidia-cuda-cupti-cu12 \
+        nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 \
+        nvidia-cudnn-cu12 nvidia-cufft-cu12 nvidia-curand-cu12 \
+        nvidia-cusolver-cu12 nvidia-cusparse-cu12 \
+        nvidia-nccl-cu12 nvidia-nvjitlink-cu12 nvidia-nvtx-cu12
 ENV PYTHONPATH=$PYTHONPATH:/home/user
```
However, the retriever pod's import of torch then fails due to:
```
Parsing 10k filing doc for NIKE data/nke-10k-2023.pdf
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.11/site-packages/torch/__init__.py", line 176, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/local/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libcudart.so.12: cannot open shared object file: No such file or directory
```
I.e. special torch + triton versions that are built without CUDA support are needed.
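As a sketch (assuming the services only ever run inference on CPU), the CPU-only torch wheels that the PyTorch project publishes on its own package index could be installed up front, so that pip never pulls in the CUDA build or its nvidia-* dependencies; whether this works here depends on what exact version the requirements file pins:

```Dockerfile
# Sketch: install CPU-only torch from PyTorch's CPU wheel index before
# the rest of the requirements, so the default CUDA-enabled build (and
# its nvidia-* packages) is never downloaded. The requirements path
# matches the Dockerfile in the diff above.
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu && \
    pip install --no-cache-dir -r /home/user/comps/retrievers/requirements.txt
```

Note that if a later requirement pins an incompatible torch version, pip may still reinstall the CUDA build on top, so the torch version would need to be pinned consistently.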
Until then, using a shared base image with these deps will at least make sure that there's only a single instance of these huge libs.
Thank you for raising the question. We will relay this to the dev team and update if there is any progress.
As an example, here's documentation on how Kubeflow images are layered: https://www.kubeflow.org/docs/components/notebooks/container-images/
(Once this ticket has been fixed, a similar graph of OPEA image layers would be nice.)
I think this PR could solve your issue, https://github.com/opea-project/GenAIComps/pull/221
Feel free to reopen it if you have other concerns.
> I think this PR could solve your issue, #221
> Feel free to reopen it if you have other concerns.
Only project members can re-open tickets. I cannot.
=> @chensuyue Please reopen.
Because several containers use PyTorch, container size can be reduced further (significantly) by putting PyTorch (and other shared dependencies) into an intermediate image that is used as a (shared) base image for the final images needing those deps.
That way:
- Images will take significantly less bandwidth, disk space and time, all of which are cost reductions on rented clusters.
- Dockerfiles will also be more maintainable, with (almost) duplicated content being removed.
Filed https://github.com/opea-project/GenAIComps/issues/265 for the rest of the issue.
Problem
After building images using the Dockerfiles in this repository, according to the instructions here: https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker-composer/xeon/README.md
and investigating their huge download & image sizes, I found out that most of the size is due to unwanted Nvidia libraries, pulled in by the sentence-transformers => torch Python module dependencies. To make matters worse, those deps are specified in the Dockerfiles in such a way that the container manager cannot de-duplicate the layers holding that content.
Solution
It would be better to build such images in multiple steps: first building intermediate image(s) containing common components like sentence-transformers and its huge deps, and then using that as the base for all the images that need those components. That way Docker and image registries would need to transfer & store that huge (in my case unused) content only once.
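The proposed layering could be sketched roughly as follows; note that the base image name `opea/comps-base` is a hypothetical illustration, not an existing OPEA artifact:

```Dockerfile
# --- Intermediate base image, built once and tagged e.g. "opea/comps-base" ---
# (hypothetical name; holds the shared, heavyweight Python deps)
FROM python:3.11-slim AS comps-base
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir sentence-transformers

# --- Each service image would then start from the shared base, e.g.: ---
#   FROM opea/comps-base
#   COPY comps/retrievers /home/user/comps/retrievers
#   RUN pip install --no-cache-dir -r /home/user/comps/retrievers/requirements.txt
```

With this layering, the heavy base layers are stored and transferred once, and each service image adds only its own small delta on top.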
Longer term solution
In the longer term, I think the upstream Torch project should be encouraged to decouple drivers from its downloads into separate modules, and to document how its users can install what they actually need (in my case, I would prefer oneAPI Level-Zero Intel GPU drivers).