Open eero-t opened 5 months ago
"GenAIComps" repo Dockerfiles have the same issue:
$ git grep -l -B1 -e mesa-glx -e '\bvim\b'
.github/workflows/docker/ut.dockerfile
comps/dataprep/qdrant/docker/Dockerfile
comps/dataprep/redis/docker/Dockerfile
comps/embeddings/langchain/docker/Dockerfile
comps/guardrails/langchain/docker/Dockerfile
comps/llms/summarization/tgi/Dockerfile
comps/llms/text-generation/tgi/Dockerfile
comps/reranks/langchain/docker/Dockerfile
comps/retrievers/langchain/docker/Dockerfile
The full Perl version gets added as a dependency of the git package (the minimal Python image already includes a few MB of minimal Perl, as that's a POSIX requirement).
There are so many dependencies between the "GenAIComps" and "GenAIExamples" repos that I think it would make sense to merge them. Then git, and therefore also Perl, could be dropped from (almost) all images.
Another alternative would be having a separate "fetch" phase in the Dockerfile which would install Git, do git clone (using the --depth 1 option to speed it up), and remove the .git dir afterwards, so that it's not left there when the final Dockerfile phase copies the GenAIComps dir content from the "fetch" phase.
Dropping libgl1-mesa-glx and replacing vim with nano in the Dockerfile reduces the chatqna container size by 253MB, i.e. 35%:
$ docker images|grep chatqna
opea/chatqna latest 4b71cbea8ab6 About an hour ago 727MB
opea/chatqna-test latest 9aadb869edaf 11 minutes ago 474MB
Tried doing the Git cloning in a separate step and copying just the repo content to the final image:
FROM python:3.11-slim AS base
RUN useradd -m -s /bin/bash user && mkdir -p /home/user && chown -R user /home/user/
FROM base AS fetch
RUN apt-get update && apt-get install -y --no-install-recommends git
RUN cd /home/user/ && git clone --depth 1 https://github.com/opea-project/GenAIComps.git
RUN rm -r /home/user/GenAIComps/.git
FROM base AS final
COPY --from=fetch /home/user/GenAIComps /home/user/GenAIComps
...
=> It reduced the final image size by an additional 108MB, to 366MB, which is half of the original 727MB size.
Will validate for all the examples and then incorporate this.
All common dependencies should be on a shared base layer, see: https://github.com/opea-project/GenAIComps/issues/265
That way these optimizations need to be done only once.
will improve it in the future
Once the base images have been cleaned of extra content, it's easy to generate additional, separate "devel" images where those (Vim, Perl, Git etc) tools are added back.
All it needs is:
Both of which are pretty trivial...
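A minimal sketch of such a "devel" image, layered on top of an already cleaned image. The BASE_IMAGE default reuses the opea/chatqna tag seen earlier in this thread purely for illustration, the ARG name is hypothetical, and the non-root account "user" is assumed to exist in the base image as in the Dockerfile example in this thread:

```dockerfile
# Sketch only: add the development tools back on top of a slimmed image.
# BASE_IMAGE is a hypothetical build argument; override it with --build-arg.
ARG BASE_IMAGE=opea/chatqna:latest
FROM ${BASE_IMAGE}

# Package installation needs root, in case the base image switched users.
USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends vim git perl && \
    rm -rf /var/lib/apt/lists/*

# Assumption: the base image defines a non-root account named "user".
USER user
```

Because the tools live only in this extra layer, the production image stays small and the devel variant can be rebuilt whenever the base changes.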
Instead of removing the .git dir in the fetch phase, the final image could copy just the needed pieces, for example:
ENV HOME=/home/user
COPY --from=fetch $HOME/GenAIComps/comps $HOME/GenAIComps/comps
COPY --from=fetch $HOME/GenAIComps/*.* $HOME/GenAIComps/
(git could be a better name for the intermediate container stage/image than fetch.)
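Put together, the selective copy could look roughly like the sketch below. It only combines the snippets already posted in this thread, with the intermediate stage renamed to git as suggested. The apt-get update call is added here because slim images ship without package lists, and which files the final image actually needs is an assumption that must be checked per component:

```dockerfile
# Sketch only: multi-stage build where Git exists solely in a throwaway stage.
FROM python:3.11-slim AS base
RUN useradd -m -s /bin/bash user && mkdir -p /home/user && chown -R user /home/user/

# Intermediate stage: Git and its Perl dependency are installed only here.
FROM base AS git
RUN apt-get update && apt-get install -y --no-install-recommends git
WORKDIR /home/user
# Shallow clone: fetch only the latest commit to keep the download small.
RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git

# Final stage: copy just the needed pieces, so .git never reaches the image.
FROM base AS final
ENV HOME=/home/user
COPY --from=git $HOME/GenAIComps/comps $HOME/GenAIComps/comps
COPY --from=git $HOME/GenAIComps/*.* $HOME/GenAIComps/
```

Since nothing from the git stage is inherited by the final stage, there is no need for a separate rm -r step on the .git dir.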
please submit pr
Ok, here's an example of doing that for GenAIExamples repo containers: https://github.com/opea-project/GenAIExamples/pull/1031
@kevinintel Do you want me to write an example PR for the GenAIComps repo containers as well?
Many of the Dockerfiles install Vim and/or Mesa OpenGL/X packages. Why?
They take a lot of space in the containers; Mesa's LLVM dependency alone adds >100MB, Vim adds 40MB, and I suspect they're the reason why full Perl gets installed.
If containers really need a text editor, e.g. nano would be more user-friendly and much smaller (1MB) than vim.
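As an illustration of that swap, a Dockerfile fragment might look like the following. It is a sketch under assumptions: the python:3.11-slim base is taken from the example elsewhere in this thread, and whether a given component still works without libgl1-mesa-glx has to be verified separately:

```dockerfile
# Sketch only: install nano instead of vim, and leave libgl1-mesa-glx out.
FROM python:3.11-slim

# --no-install-recommends avoids pulling in optional extra packages;
# removing the apt lists afterwards keeps the layer small.
RUN apt-get update && \
    apt-get install -y --no-install-recommends nano && \
    rm -rf /var/lib/apt/lists/*
```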