opea-project / GenAIComps

GenAI components at micro-service level; GenAI service composer to create mega-service
Apache License 2.0

Images built from Dockerfiles are 2x larger than they need/should to be #111

Closed eero-t closed 2 weeks ago

eero-t commented 1 month ago

Problem

After building images using Dockerfiles in this repository, according to instructions here: https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker-composer/xeon/README.md

Investigating their huge download & image sizes, I found out that most of the size is due to unwanted Nvidia libraries, pulled in through the sentence-transformers => torch Python dependency chain.

To make matters worse, those dependencies are installed separately in each Dockerfile, so the container manager cannot de-duplicate the layers containing that content.

Solution

It would be better to build such images in multiple stages: first build intermediate image(s) containing common components like sentence-transformers and its huge dependencies, then use that as the base for all images that need those components.

That way Docker and image registries would need to transfer & store that huge (in my case unused) content only once.
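As a sketch of the proposed layering (the base image name, tag, and file names here are hypothetical, not part of the repository):

```dockerfile
# Hypothetical Dockerfile.base, built once as e.g. opea/comps-base:latest:
#   docker build -f Dockerfile.base -t opea/comps-base .
FROM python:3.11-slim
RUN useradd -m user
USER user
# Install the heavy shared dependencies once; sentence-transformers pulls in torch
RUN pip install --no-cache-dir sentence-transformers

# Each component Dockerfile would then start from the shared base, so
# registries store and transfer the big dependency layer only once:
#
#   FROM opea/comps-base:latest
#   COPY comps/retrievers /home/user/comps/retrievers
#   RUN pip install --no-cache-dir -r /home/user/comps/retrievers/requirements.txt
```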

Longer term solution

In the longer term, I think the upstream Torch project should be encouraged to decouple drivers from its downloads into separate modules, and to document how users can install what they actually need (in my case, I would prefer the oneAPI Level Zero Intel GPU drivers).

eero-t commented 1 month ago

Huge images:

$ docker images | grep opea
opea/llm-tgi            latest     139c71c9b631   About an hour ago   2.14GB
opea/reranking-tei      latest     25343403b23b   2 hours ago         6.36GB
opea/retriever-redis    latest     791036607726   3 hours ago         7.55GB
opea/embedding-tei      latest     2b93d18d6f56   3 hours ago         7.27GB

Due to (useless to me) Nvidia support:

$ docker run -it --rm --entrypoint /bin/sh --user 1000 opea/retriever-redis -c "du -ks /home/user/.local/lib/python3.11/site-packages/* | sort -nr"
2892656  /home/user/.local/lib/python3.11/site-packages/nvidia
1618908  /home/user/.local/lib/python3.11/site-packages/torch
429964   /home/user/.local/lib/python3.11/site-packages/triton
107692   /home/user/.local/lib/python3.11/site-packages/scipy
92780    /home/user/.local/lib/python3.11/site-packages/transformers
...

In multiple Python modules:

$ find /home/ -name '*nvidia*' -o -name '*cuda*' | xargs du -ks | sort -nr
2892660  /home/user/.local/lib/python3.11/site-packages/nvidia
861020   /home/user/.local/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
83440    /home/user/.local/lib/python3.11/site-packages/torch/lib/libtorch_cuda_linalg.so
80160    /home/user/.local/lib/python3.11/site-packages/triton/third_party/cuda
...

(triton is a torch dependency, so with what I proposed one would also get a single instance of it.)

eero-t commented 1 month ago

The following change gets rid of the nvidia site-packages (while still keeping torch) and reduces the image size by 3GB:

 $ git diff
diff --git a/comps/retrievers/langchain/docker/Dockerfile b/comps/retrievers/langchain/docker/Dockerfile
index 2db4bfc..425d3e3 100644
--- a/comps/retrievers/langchain/docker/Dockerfile
+++ b/comps/retrievers/langchain/docker/Dockerfile
@@ -30,7 +30,12 @@ RUN chmod +x /home/user/comps/retrievers/langchain/run.sh
 USER user

 RUN pip install --no-cache-dir --upgrade pip && \
-    pip install --no-cache-dir -r /home/user/comps/retrievers/requirements.txt
+    pip install --no-cache-dir -r /home/user/comps/retrievers/requirements.txt && \
+    pip uninstall -y nvidia-cublas-cu12 nvidia-cuda-cupti-cu12 \
+      nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 \
+      nvidia-cudnn-cu12 nvidia-cufft-cu12 nvidia-curand-cu12 \
+      nvidia-cusolver-cu12 nvidia-cusparse-cu12 \
+      nvidia-nccl-cu12 nvidia-nvjitlink-cu12 nvidia-nvtx-cu12

 ENV PYTHONPATH=$PYTHONPATH:/home/user

However, the retriever pod then fails to import torch:

Parsing 10k filing doc for NIKE data/nke-10k-2023.pdf
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.11/site-packages/torch/__init__.py", line 176, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/local/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libcudart.so.12: cannot open shared object file: No such file or directory

I.e. special torch + triton builds without CUDA support are needed.

Until then, using a shared base image with these dependencies will at least ensure that there is only a single instance of these huge libraries.
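For reference, PyTorch publishes CPU-only wheels on a separate package index; pointing the requirements at that index (a sketch, assuming the CPU builds cover the features these components need) would avoid pulling in the nvidia-* packages entirely:

```
# requirements.txt sketch: prefer CPU-only torch wheels (no nvidia-* deps)
--extra-index-url https://download.pytorch.org/whl/cpu
torch
sentence-transformers
```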

yinghu5 commented 2 weeks ago

Thank you for raising the question. We will give feedback to the dev team and update here if there is any progress.

eero-t commented 2 weeks ago

As an example, here's documentation on how Kubeflow images are layered: https://www.kubeflow.org/docs/components/notebooks/container-images/

(Once this ticket has been fixed, a similar graph of OPEA image layers would be nice.)

chensuyue commented 2 weeks ago

I think this PR could solve your issue, https://github.com/opea-project/GenAIComps/pull/221

chensuyue commented 2 weeks ago

Feel free to reopen it if you have other concerns.

eero-t commented 1 week ago

> I think this PR could solve your issue, #221

> Feel free to reopen it if you have other concerns.

Only project members can re-open tickets. I cannot.

#221 solves only half of the container size problem I reported (while it does get rid of components under Nvidia's proprietary license, which IMHO was at least as important).

=> @chensuyue Please reopen.

Because several containers use PyTorch, container size can be reduced further (significantly) by putting PyTorch (and other shared dependencies) into an intermediate image that is used as a shared base image for the final images needing those deps.

That way, pulling the images will take significantly less bandwidth, disk space and time, all of which reduce costs on rented clusters.

Dockerfiles will also be more maintainable, with (almost) duplicated content being removed.
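As a rough illustration of the savings (hypothetical split: assume ~5 GB of the torch/nvidia content from the image listing above is common to the three largest images):

```python
# Back-of-the-envelope: bandwidth/storage saved by a shared base layer.
# Image sizes in GB are from the `docker images` listing earlier in this
# thread; the 5 GB shared-dependency estimate is an assumption.
SHARED_DEPS_GB = 5.0
images = {
    "opea/reranking-tei": 6.36,
    "opea/retriever-redis": 7.55,
    "opea/embedding-tei": 7.27,
}

# Today: every image ships its own copy of the shared dependencies.
total_now = sum(images.values())

# With a shared base image, the common layer is transferred/stored once,
# and each image only adds its own (much smaller) top layer.
total_shared = SHARED_DEPS_GB + sum(s - SHARED_DEPS_GB for s in images.values())

saved = total_now - total_shared
print(f"now: {total_now:.2f} GB, shared base: {total_shared:.2f} GB, "
      f"saved: {saved:.2f} GB")  # saves ~10 GB under these assumptions
```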

eero-t commented 1 week ago

Filed https://github.com/opea-project/GenAIComps/issues/265 for the rest of the issue.