ugoviti / izpbx

izPBX is a Turnkey Cloud Native Telephony System powered by Asterisk Engine and FreePBX Management GUI
GNU General Public License v3.0
179 stars, 74 forks

[FR] Use a multi-stage build to reduce image size #19

Closed: eddyg closed this issue 3 years ago

eddyg commented 3 years ago

Thanks for a great project!

Have you looked into the feasibility of using a multi-stage Dockerfile so all the development/build dependencies don't need to be in the published container?

This is a great reference (somewhat geared toward Python, but still applicable) if it's something you're interested in looking into:

https://pythonspeed.com/articles/smaller-python-docker-images/
https://pythonspeed.com/articles/faster-multi-stage-builds/

ugoviti commented 3 years ago

Hi Eddyg,

True, in this project I don't use the multi-stage build solution described in the links you attached. But the final container doesn't contain the development/build dependencies, because I remove all *-devel RPM packages and the build cache within the build step itself, so the final layer doesn't contain that code.

Did I misunderstand your suggestion? Could you explain in more detail where you found the development/build dependencies? I'm interested.

Thank you for your feedback.

Kind Regards

ugoviti commented 3 years ago

One more note:

  1. the dev-18 image contains the development files because it is a development build
  2. the 18.15.x image doesn't contain those files because it is a production build

This behavior is controlled by the APP_DEBUG build argument in the Dockerfile:

```dockerfile
# build stage
RUN set -xe && \
  . /etc/os-release && \
  ASTERISK_BUILD_DEPS=' \
    dmidecode \
    autoconf \

....

  if [[ ${APP_DEBUG} -eq 0 ]]; then \
  : "---------- Clean temporary files ----------" && \
  dnf remove -y $ASTERISK_BUILD_DEPS && \
  dnf clean all && \
  rm -rf /var/cache/yum /tmp /usr/src && \
  mkdir -p /usr/src /tmp && chmod 1777 /tmp ;fi && \
  : "---------- ALL builds finished ----------"
```

eddyg commented 3 years ago

Yes. Docker image layers are "append only". I thought it was described quite well in the first link I gave:

A Docker image is built as a sequence of layers, with each layer building on top of the previous one. Among other things, layers can add files, or they can remove files.

Removing files doesn’t remove it from the previous layer, and all the layers are needed to unpack the Docker image.

  1. Layer A: Add lots of files for the compiler.
  2. Layer B(→A): Compile some code.
  3. Layer C(→B→A): Remove the files for the compiler

To download the image we need layers A, B, and C, so we have to download the compiler even though the relevant files aren’t accessible in the final image.
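The A/B/C sequence above, and the single-RUN alternative the izpbx Dockerfile relies on, can be sketched as two stages of one illustrative Dockerfile. This is not the izpbx build: the `centos:8` base (taken from this thread), the use of `gcc` as "the compiler" and the trivial `hello.c` program are all assumptions chosen only to show where each layer's bytes end up. Substitute whatever dnf-based base image you actually use.

```dockerfile
# ---- Variant 1: one RUN per step, so three layers (A, B, C) ----
FROM centos:8 AS separate-layers
# Layer A: install the compiler.
RUN dnf install -y gcc
# Layer B: compile a trivial program (stand-in for a real build).
RUN printf 'int main(void){return 0;}\n' > /tmp/hello.c \
 && gcc -o /usr/local/bin/hello /tmp/hello.c
# Layer C: "remove" the compiler and the source. This layer only records
# the deletions; layers A and B, with gcc inside, are still pulled.
RUN dnf remove -y gcc \
 && rm -f /tmp/hello.c

# ---- Variant 2: install, build and clean up in ONE RUN ----
FROM centos:8 AS single-layer
# The single layer is committed only after gcc, the dnf cache and the
# source file are gone, so none of them is ever stored in the image.
RUN dnf install -y gcc \
 && printf 'int main(void){return 0;}\n' > /tmp/hello.c \
 && gcc -o /usr/local/bin/hello /tmp/hello.c \
 && dnf remove -y gcc \
 && dnf clean all \
 && rm -f /tmp/hello.c
```

Building each stage with `docker build --target separate-layers -t layers-demo:separate .` (and likewise for `single-layer`) and inspecting the results with `docker history` or dive shows the compiler layer still listed in the first image and absent from the second.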

Using dive, it's easy to see that all the stuff you remove is still present (and fully accessible!) in your image; it's just "hidden" from the image presented to the user by default. For example, here's a screenshot showing /usr/src (167MB, downloaded with every pull) in your image before it gets "hidden":

[Screenshot (2021-04-09): dive view of the image showing /usr/src before it is hidden]

In the top left corner, you can see the size of each layer that is required with each pull of your image. That's why multi-stage builds are important, and why I opened this FR to suggest utilizing them, as described in the links I gave (and in many other places around the 'net).

Hopefully that provides some context!

ugoviti commented 3 years ago

I'm sorry, but I don't agree with that.

Of course, removing files in a later RUN command doesn't free the space at the layer level; that's basic Docker knowledge.

But if you look at the izpbx Dockerfile, the build step removes all development files (only when the build arg APP_DEBUG=0) before the layer is committed.

In your example, you are referring to the layer with the FreePBX sources, which are absolutely needed for the initial deployment; I can't remove those files if you want to deploy izpbx for the first time.

You say you see hidden stuff in the layers, but what stuff? /src/freepbx is added in layer 42 and remains visible in the final layer, so where is the hidden stuff? Sorry, I can't see what you are referring to.

Let me explain better:

When releasing the production image, I remove the development/build files at the end of every "important" layer (see the code).

You can use hub.docker.com to see the size of each layer:

Production image: 18.15.8-284 515.72 MB (compressed)

Let's analyze them together (only the larger layers; compressed sizes):

layer 1 = 71.7MB (centos8 base image size)
layer 34 = 330MB (os rpm packages and libraries in addition to the core image needed for "running" asterisk and freepbx)
layer 41 = 65.47MB (asterisk compiled binaries)
layer 42 = 47.04MB (freepbx source files needed for initial deploy)

That's all: these are the important, larger layers; all the remaining layers are very small, a few KB each.

Now, let's look at the development image:

Development image: dev-18 936.78 MB (compressed)

layer 1 = 71.7MB (centos8 base image size)
layer 34 = 384MB (os rpm packages and libraries in addition to the core image needed for "running" asterisk and freepbx; dnf cache dir not cleaned)
layer 41 = 432.2MB (asterisk compiled binaries, plus the /src/asterisk build dir and rpm devel packages, which are not removed)
layer 42 = 47.04MB (freepbx source files needed for initial deploy; same size as in production)

Anyway, I can agree about one thing: the 219MB (uncompressed) overhead reported by dive:

Image name: docker.io/izdock/izpbx-asterisk:18.15.8
Total Image size: 1.5 GB                           
Potential wasted space: 219 MB                     
Image efficiency score: 89 %                       

Using a multi-stage build I could "potentially" "limit" the 219MB overhead, which comes from the RPM database in /var/lib/ growing in every layer where rpm and dnf commands run (I run dnf upgrade against the base image, so some overhead is inevitable). You can verify this with dive by hiding the unmodified files.

Anyway, I tested the production image with dive and, if I'm not mistaken, no files from the intermediate layers are "removed/hidden":

[Screenshot (2021-04-09): dive view of the production image layers]

Do you agree?

I can only see an overhead caused by:

[Screenshot (2021-04-09): dive view of the remaining overhead]

Let me know if you disagree and think I'm wrong.

Thank you for the feedback.

eddyg commented 3 years ago

Yikes, my bad!

I didn't realize you were executing the installation of all the Asterisk build dependencies, building asterisk, and then removing the build dependencies all in one gigantic RUN command! 😮

My apologies for not reading your Dockerfile more closely. I was just surprised at the size of the layers when I pulled it and thought there was some bandwidth savings available.

Keep up the great work. 😃

ugoviti commented 3 years ago

ahahahah :) A gigantic RUN is certainly hard to maintain; a multi-stage build would make the build process easier to maintain.

I know of two methods for creating "relatively" small images:

  1. a single RUN command with chained sub-commands (&& etc.) that deletes temp/src/build/cache files before the layer is committed
  2. a multi-stage build

Multi-stage is good because it basically says: install, build and run whatever you want in the build stage, creating as many layers as you like, but copy only the data you need into the final image.

But some work must be done before switching izpbx to multi-stage:

  1. map all the "installed" files (they aren't all in the same directory) and copy them from the build stage to the final image
  2. check all the OS packages installed during the build stage and reinstall them in the final stage
  3. avoid the overhead of the dnf/rpm database dir (I need to investigate further)
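A minimal sketch of how points 1 to 3 could look for a make-based build such as Asterisk's. Several assumptions here do not come from the izpbx Dockerfile: the sources sit in a hypothetical `asterisk/` directory in the build context, the `centos:8` base is taken from this thread, the package lists are illustrative rather than the real Asterisk dependency set, and the build is assumed to honour `DESTDIR`. Staging `make install` under one directory is one common way to handle point 1, because every installed file ends up in a single tree that one `COPY --from` can move.

```dockerfile
# ---- build stage: compilers, -devel packages and sources live only here ----
FROM centos:8 AS build
# Illustrative build dependencies (hypothetical list, not the full Asterisk set).
# No cleanup is needed: this stage is discarded and never pulled.
RUN dnf install -y gcc gcc-c++ make libxml2-devel sqlite-devel
# Hypothetical source location inside the build context.
COPY asterisk/ /usr/src/asterisk/
WORKDIR /usr/src/asterisk
# Point 1: stage the install under /staging, so files targeting /usr/sbin,
# /etc, /var/lib and so on all land in one tree (assumes DESTDIR support).
RUN ./configure \
 && make \
 && make install DESTDIR=/staging

# ---- runtime stage: only runtime libraries plus the staged files ----
FROM centos:8
# Point 2: reinstall only the runtime counterparts of the -devel packages.
# Point 3: all dnf work happens in this one RUN, so the RPM database is
# rewritten in a single layer, and the dnf cache is cleaned before commit.
RUN dnf install -y libxml2 sqlite-libs \
 && dnf clean all \
 && rm -rf /var/cache/dnf
# Copy the whole staged install tree onto the root filesystem.
COPY --from=build /staging/ /
```

Both package lists would have to be extended to whatever the real `./configure` run requires, and files that are not produced by `make install` (such as the FreePBX sources needed for the initial deploy) would still need their own COPY or RUN steps in the runtime stage.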

With Alpine the overall image size would surely be smaller, but in the past I experienced many issues with musl libc, so for now I'm avoiding it.

Switching to multi-stage is certainly a good suggestion; I'll look into it in the future.

Right now I'm evaluating a switch from centos8 to ubi8 (Red Hat Universal Base Image) because Red Hat changed the CentOS 8 EOL, but it isn't an easy task: many CentOS packages (needed to use and build Asterisk) are not available for ubi8.

Thank you

eddyg commented 3 years ago

No, thank YOU! I apologize for not looking more closely at the technique you used!

And yes, I agree about the challenges of switching to Alpine. A path fraught with peril. 😬

Again, my apologies. Impressive work maintaining such complex processes in single RUN commands!