neonwatty / meme_search

Index your memes by their content and text, making them easily retrievable for your meme warfare pleasures. Find funny fast.
https://memesearch.co/
Apache License 2.0
288 stars, 9 forks

Split Dockerfile into multiple stages #17

Closed StroescuTheo closed 1 month ago

StroescuTheo commented 2 months ago

The current Docker image size is approximately 9GB, which is quite large. To address this, I propose splitting the Dockerfile into two stages:

Build stage: This stage installs all system dependencies and the Python requirements.
Runtime stage: This stage copies only the necessary Python packages and binaries from the build stage.

In my tests, this approach reduced the Docker image size by approximately 4GB.
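For readers unfamiliar with the pattern, the two stages described above could look roughly like the sketch below. This is not the Dockerfile from the actual change: the base image tag, the /opt/venv path, and the placeholder entry point are all assumptions made for illustration.

```dockerfile
# Build stage: install all Python requirements into a virtual environment.
# The base image tag and the /opt/venv path are assumptions for illustration.
FROM python:3.11-slim AS build
WORKDIR /app
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Runtime stage: copy only the installed packages and the application code.
# Build tools and pip caches from the first stage are left behind.
FROM python:3.11-slim AS runtime
WORKDIR /app
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY . .
# Placeholder entry point; replace with the project's real start command.
CMD ["python", "-c", "print('replace with the real entry point')"]
```

Using the same base image in both stages keeps the interpreter path inside the copied virtual environment valid.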

Also, are there any Python packages that don't need to be installed if I'm running without a GPU? If there are, I would also recommend removing them from the original requirements and adding them in the Dockerfile, conditioned on a build argument or environment variable. Something like:

# ENABLE_GPU must be declared as a build argument to be visible in RUN
ARG ENABLE_GPU=false
RUN if [ "$ENABLE_GPU" = "true" ]; then \
        echo "torch" >> requirements.txt ; \
    fi

Or even better, have two requirements files (requirements-cpu.txt and requirements-gpu.txt) and copy the required one, based on an environment variable.
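A minimal sketch of the two-file variant, assuming a hypothetical build argument named ENABLE_GPU and an assumed base image; only the two file names come from the suggestion above, and neither file is confirmed to exist in the repository.

```dockerfile
# Base image tag is an assumption for illustration.
FROM python:3.11-slim AS build
WORKDIR /app

# ENABLE_GPU is a hypothetical build argument, defaulting to the CPU variant.
ARG ENABLE_GPU=false

# Copy both candidate files, then install whichever one the argument selects.
COPY requirements-cpu.txt requirements-gpu.txt ./
RUN if [ "$ENABLE_GPU" = "true" ]; then \
        pip install --no-cache-dir -r requirements-gpu.txt ; \
    else \
        pip install --no-cache-dir -r requirements-cpu.txt ; \
    fi
```

The GPU variant would then be selected at build time with something like `docker build --build-arg ENABLE_GPU=true .`, while a plain `docker build .` would produce the CPU image.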

neonwatty commented 1 month ago

Great idea! I get about the same ratio of size reduction with the staged build.

One update I'll make to the Dockerfile: move the env variable declaration

ENV PYTHONPATH=.

to the runtime stage, where it's needed for internal reference by meme_search modules.
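The reason the placement matters: a runtime stage that starts from its own FROM line does not inherit ENV values set in the build stage, so the variable has to be declared in the stage that becomes the final image. A stripped-down sketch, with assumed stage names and base image:

```dockerfile
# Stage names and base image tag are assumptions for illustration.
FROM python:3.11-slim AS build
WORKDIR /app
# An ENV declared in this stage would apply to this stage only;
# the runtime stage below starts from a fresh base image.

FROM python:3.11-slim AS runtime
WORKDIR /app
COPY . .
# Declared here so it is set in the final image's environment,
# letting Python resolve imports relative to the working directory.
ENV PYTHONPATH=.
```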

While torch is required for both CPU and GPU instances, there is an opportunity to bifurcate the installs: for example, the CPU-only torch install is smaller.
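One way the smaller CPU-only install might be done is to pull torch from PyTorch's CPU wheel index rather than the default package index. The fragment below is a sketch of that idea, not the approach the project adopted; the base image is an assumption, and the version pin is omitted.

```dockerfile
# Base image tag is an assumption for illustration.
FROM python:3.11-slim AS build

# CPU-only variant: install torch from PyTorch's CPU wheel index,
# which avoids downloading the bundled CUDA libraries.
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu
```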

neonwatty commented 1 month ago

In my testing the env variable declaration

ENV PYTHONPATH=.

needs to occur in the runtime stage (at present it's in the build stage). Can you confirm and move it?

Never mind, I'll just merge and relocate this one line.