Open sberryman opened 6 years ago
@sberryman Thank you for pointing this out. You're right, using apt-get clean in a separate layer is useless. I looked into this further, and it turns out the standard Ubuntu image automatically calls clean and autoremove after every install, as mentioned in the official documentation. So I removed my explicit call.
Your second suggestion is correct as well. But instead of calling clean (which is called automatically anyway), the best practice is to delete the apt-get cache with rm -rf /var/lib/apt/lists/*. Adding that line to every RUN is ugly, and according to docker history, the apt-get cache adds about 40MB, roughly 1% of the total image size, so it's not worth reducing the readability and flexibility of the code.
I did merge related RUN blocks together, keeping a balance between optimization and readability. I'll keep this issue open for any future suggestions or to correct me if any of my conclusions are wrong. Thanks again for the tips.
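For context, the pattern being weighed against readability above would repeat the cleanup in every install step. A sketch of that rejected layout (package names are placeholders, not from this repo):

```dockerfile
# Every install RUN carries its own update and cleanup, so no layer
# ever commits the apt package lists -- small layers, cluttered file.
RUN apt-get update && \
    apt-get install -y build-essential && \
    rm -rf /var/lib/apt/lists/*
RUN apt-get update && \
    apt-get install -y python3-dev && \
    rm -rf /var/lib/apt/lists/*
```

Merging related RUN blocks, as done here, keeps most of the size benefit without repeating that boilerplate in every step.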
Looking a lot better, here are a few more things I noticed:

Pinning an exact wheel, e.g. https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.6.0-cp27-none-linux_x86_64.whl, would probably be a better idea. This will help ensure a consistent build. docker inspect will show the variables. I've seen a mix of env and checksum being set in the layer being built, or all placed near the top of the Dockerfile.

Example 1 (haven't actually tested this, more pseudo-code):
RUN export OPENCV_VERSION=3.4.1 && \
export OPENCV_CHECKSUM=f1b87684d75496a1054405ae3ee0b6573acaf3dad39eaf4f1d66fdd7e03dc852 && \
curl --retry 7 --fail -vo /tmp/opencv.tar.gz "https://codeload.github.com/opencv/opencv/tar.gz/${OPENCV_VERSION}" && \
echo "${OPENCV_CHECKSUM} /tmp/opencv.tar.gz" | sha256sum -c && \
mkdir -p /usr/local/src/opencv && \
tar -zxf /tmp/opencv.tar.gz -C /usr/local/src/opencv --strip-components=1 && \
rm /tmp/opencv.tar.gz && \
cd /usr/local/src/opencv && \
mkdir build && \
cd build && \
cmake -D CMAKE_INSTALL_PREFIX=/usr/local \
-D BUILD_TESTS=OFF \
-D BUILD_PERF_TESTS=OFF \
-D PYTHON_DEFAULT_EXECUTABLE=$(which python3) \
.. && \
make -j"$(nproc)" && \
make install
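The sha256sum -c step above can be exercised on its own outside of a build. A minimal sketch (the file name and contents are arbitrary examples):

```shell
# Create a throwaway file and verify it the same way the pseudo-code
# above verifies the OpenCV tarball.
printf 'hello' > /tmp/demo.bin

# Compute its checksum (first field of sha256sum's output).
checksum=$(sha256sum /tmp/demo.bin | awk '{print $1}')

# Feed "<checksum>  <path>" to `sha256sum -c -`; it exits 0 on a match
# and non-zero on a mismatch, which aborts the && chain in a RUN.
echo "${checksum}  /tmp/demo.bin" | sha256sum -c -
# → /tmp/demo.bin: OK
```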
One last note: not running apt-get update and cleaning up in each layer where you install packages leaves you open to inconsistent builds. Right now you are updating the sources in (basically) the first layer, so once you build the image locally, each layer is cached. If you then make a change at line 13, all subsequent layers will be rebuilt, but they will not be using updated sources, as those come from the cached layer. I haven't come across this being a problem in any of my production deployments, but it's something to be cognizant of.
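To illustrate the caching pitfall just described, a hypothetical fragment (not from this repo):

```dockerfile
# Anti-pattern: `update` lives in its own cached layer.
RUN apt-get update             # layer A: cached after the first build
# ...edit anything below and only the later layers rebuild...
RUN apt-get install -y git     # layer B: rebuilt, but resolves against
                               # the stale package lists from layer A

# Safer: update and install in the same RUN, so whenever this layer
# rebuilds it refreshes the package lists first.
RUN apt-get update && apt-get install -y git
```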
https://github.com/waleedka/modern-deep-learning-docker/blob/36ae632f5b90af34458e196aea52406799139b93/Dockerfile#L133-L134
While it would be great if this reduced the image size, it has zero effect, as it is in a different layer. The only way this would be of use is if it were combined with a RUN command where you are actually installing or updating packages.

In reality you would need to combine all apt-get update and install commands into a single RUN layer, and then clean up at the end of that single command. Considering this is for research and testing it isn't a big deal; just figured I would point it out.
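A minimal sketch of that point (hypothetical fragment): deleting files in a later layer only masks them, since the earlier layer's contents are already committed to the image.

```dockerfile
# Ineffective: the apt lists were committed in the previous layer;
# this RUN only adds a "whiteout" on top and saves no space.
RUN apt-get update && apt-get install -y git
RUN rm -rf /var/lib/apt/lists/*

# Effective: the removal happens before the layer is committed,
# so the lists never land in the image at all.
RUN apt-get update && \
    apt-get install -y git && \
    rm -rf /var/lib/apt/lists/*
```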