mozilla / missioncontrol

Real-time monitoring of Firefox release health
Mozilla Public License 2.0

Use smaller docker images in development #128

Open · edmorley opened this issue 7 years ago

edmorley commented 7 years ago

Looking at docker-compose.yml, I see a few images whose size could be reduced significantly by switching to slimmer variants. This would cut the time spent on the initial image pull, both locally when starting fresh and in CI (where presumably the images are not cached). Smaller images may also result in faster container startup.

For example (all sizes are compressed sizes):

Sadly, uhopper/hadoop-namenode and uhopper/hadoop-datanode (both 337MB compressed!!) don't have slimmer variants, and I can see several mistakes in the upstream uhopper/hadoop Dockerfile that are bloating the image size.
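One mitigating detail: Docker stores and pulls layers by digest, so layers shared by several images are downloaded only once. The minimal sketch below assumes both images were built from the same uhopper/hadoop base digest; the layer names are hypothetical, and all of the base image's layers are lumped into one entry for simplicity. It shows why pulling both images costs roughly one base image, and also why fixing uhopper/hadoop would shrink both at once.

```python
# Minimal sketch of layer sharing between images. Docker pulls each layer
# digest once, so shared base layers are not downloaded twice.
# Layer names are hypothetical; sizes are in MB (compressed).
from typing import Dict, List

def image_size(layers: List[str], sizes: Dict[str, int]) -> int:
    """Size reported for a single image: the sum of all of its layers."""
    return sum(sizes[layer] for layer in layers)

def pull_size(images: Dict[str, List[str]], sizes: Dict[str, int]) -> int:
    """MB actually downloaded when pulling every image into an empty cache."""
    unique_layers = {layer for layers in images.values() for layer in layers}
    return sum(sizes[layer] for layer in unique_layers)

sizes = {
    "hadoop-base": 337,   # all uhopper/hadoop layers lumped together (assumption)
    "namenode-extra": 0,  # small additions on top of the base, rounded to 0 MB
    "datanode-extra": 0,
}
images = {
    "uhopper/hadoop-namenode": ["hadoop-base", "namenode-extra"],
    "uhopper/hadoop-datanode": ["hadoop-base", "datanode-extra"],
}

naive_total = sum(image_size(layers, sizes) for layers in images.values())
print(f"sum of image sizes: {naive_total} MB")
print(f"actually pulled:    {pull_size(images, sizes)} MB")
```

If the two images were built from different base tags or rebuilds, their base layers would have different digests and nothing would be shared.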

Moving on, mozdata/docker-hive-metastore is a massive 500MB compressed, partly because it depends on the 337MB uhopper/hadoop image, but also because its Dockerfile contains similar mistakes.

Finally, mozdata/docker-presto is a painful 872MB (compressed), partly because it depends on a 254MB JDK base image, but also because its Dockerfile is missing more cleanup steps.

For the last three, a few small upstream changes would probably make a significant difference to image size.
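A rough way to see where each upstream fix would pay off is to subtract the base image's size from each image's total, using the compressed figures quoted above. This is only an estimate: it assumes an image's compressed size is the sum of its compressed layers, that the base tag matches the one measured, and the inputs are rounded.

```python
# Estimate how much of each image's compressed size comes from its own
# Dockerfile versus its base image. Figures (MB, compressed) are the ones
# quoted in this issue; the subtraction is an approximation.
IMAGES = {
    # image: (total compressed MB, base image compressed MB)
    "uhopper/hadoop-namenode": (337, 337),        # base: uhopper/hadoop
    "mozdata/docker-hive-metastore": (500, 337),  # base: uhopper/hadoop
    "mozdata/docker-presto": (872, 254),          # base: JDK image
}

def own_layer_mb(total_mb: int, base_mb: int) -> int:
    """Compressed MB added on top of the base image."""
    return total_mb - base_mb

for name, (total, base) in IMAGES.items():
    own = own_layer_mb(total, base)
    print(f"{name}: ~{own} MB from its own layers ({own / total:.0%} of {total} MB)")
```

By this estimate, most of presto's size (about 618MB) comes from its own Dockerfile, while the namenode image is almost entirely its base.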

wlach commented 7 years ago

@whd, do you have any thoughts on this? From what I remember, the docker-compose setup is only used for testing, so can we just go ahead and make these changes for some performance wins?

wlach commented 7 years ago

@edmorley, could you also give some more details on what needs to be fixed in those upstream Dockerfiles?

edmorley commented 7 years ago

could you give some more details on what needs to be fixed in those upstream dockerfiles?

In general, to produce the smallest Docker images, best practice is to:

For some nice examples of this, see:

See also: https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/

The uhopper/hadoop-datanode and uhopper/hadoop-namenode Dockerfiles only create a directory and add a run script, so their size comes entirely from the base image, uhopper/hadoop. That image has several problems:

mozdata/docker-hive-metastore also uses the uhopper/hadoop base image (so it will benefit from the fixes above), but it should also:

For mozdata/docker-presto, I would suggest:

I did have a quick look to see if there were any official Docker images for the above, or better third-party images, but didn't find much, which was quite surprising (some of the other third-party images were horrendous: 2GB+, deleting files in a later layer than the one that created them, etc.).
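The reason deleting files in a later step doesn't help is that each image layer stores the files it adds, and a deletion is only recorded as a "whiteout" marker in the new layer, which hides the file without removing it from the lower layer. The toy model below illustrates this; the paths and sizes are hypothetical, and it simplifies real layer storage to a dict per layer.

```python
# Toy model of a layered image filesystem with whiteouts.
# Each layer maps path -> size in MB, or None for a whiteout (deletion marker).
from typing import Dict, List, Optional

Layer = Dict[str, Optional[int]]

def visible_files(layers: List[Layer]) -> Dict[str, int]:
    """Files a container actually sees after applying the layers in order."""
    merged: Dict[str, int] = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                merged.pop(path, None)  # whiteout hides the file from lower layers
            else:
                merged[path] = size
    return merged

def stored_size(layers: List[Layer]) -> int:
    """Size shipped with the image: every file added by every layer."""
    return sum(size for layer in layers for size in layer.values() if size is not None)

# Download and extract in one step, then delete the archive in a later step.
separate_steps: List[Layer] = [
    {"/tmp/app.tar.gz": 100, "/opt/app": 100},
    {"/tmp/app.tar.gz": None},
]
# Download, extract and delete the archive within a single step.
single_step: List[Layer] = [{"/opt/app": 100}]

print(visible_files(separate_steps) == visible_files(single_step))
print(stored_size(separate_steps), stored_size(single_step))
```

Both variants expose the same files to the container, but the first ships the archive anyway. This is why cleanup has to happen in the same `RUN` instruction that created the files.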

For further analysis, I'd try the microbadger tool (it's a bit flaky but still helpful), e.g.: https://microbadger.com/images/mozdata/docker-presto
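A local alternative is to rank an image's layers by size from `docker history`. The sketch below assumes the output was produced with `docker history --no-trunc --human=false --format '{{.Size}}\t{{.CreatedBy}}' IMAGE`, so each line is a byte count, a tab, and the command that created the layer (one layer per line). Note these are uncompressed local sizes, not the compressed sizes quoted earlier. The sample input is made up for illustration and is not a measurement of any real image.

```python
# Rank image layers by size from `docker history` output.
# Expected line format: "<size in bytes>\t<command that created the layer>".
from typing import List, Tuple

def largest_layers(history_output: str, top: int = 5) -> List[Tuple[int, str]]:
    """Return the `top` largest layers as (size_bytes, created_by), largest first."""
    layers: List[Tuple[int, str]] = []
    for line in history_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        size, _, created_by = line.partition("\t")
        layers.append((int(size), created_by))
    layers.sort(key=lambda item: item[0], reverse=True)
    return layers[:top]

# Illustrative, made-up output.
sample = (
    "0\t/bin/sh -c #(nop)  CMD [run.sh]\n"
    "104857600\t/bin/sh -c apt-get install -y curl\n"
    "5242880\t/bin/sh -c mkdir -p /data\n"
)

for size, cmd in largest_layers(sample, top=2):
    print(f"{size / 2**20:.1f} MiB  {cmd}")
```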

edmorley commented 7 years ago

trying to reduce the duplicated file bloat in the presto archive

I've filed prestodb/presto#8904 to try and get upstream to fix this.

maurodoglio commented 7 years ago

@wlach correct, the images mentioned above are only used for local testing.

wlach commented 7 years ago

There are still some suggestions in here we could use, so I'm going to reopen for now.

gecube commented 6 years ago

Totally agree with the necessity of using slim images in testing and production environments.