metanorma / metanorma-docker

Docker container for running the Metanorma toolchain
https://www.metanorma.com

Revert using Docker Engine's squash functionality #113

Open ronaldtse opened 3 years ago

ronaldtse commented 3 years ago

We only want to squash up to a certain layer to facilitate re-use of the base images, not squash all the way to the top.

phuonghuynh commented 2 years ago

@ronaldtse according to the documentation, --squash always creates a new layer by squashing all layers in the image. For now there is no option to select which layers to squash:

https://docs.docker.com/engine/reference/commandline/build/#squash-an-images-layers---squash-experimental

Could we consider using a multi-stage build instead?

ronaldtse commented 2 years ago

@phuonghuynh yes we should use a multi-stage build. We want to base our image "off" the existing base ubuntu container, i.e. we only want to squash the layers we build on top of the base Ubuntu container.

It was previously doable with our docker-squash image (https://github.com/riboseinc/docker-squash-container) but this may no longer be current.

phuonghuynh commented 2 years ago

@ronaldtse we already use it, https://github.com/metanorma/metanorma-docker/blob/master/Dockerfile.ubuntu.in#L4

So I think we will add a new make command that lets the user build with or without the --squash option.

ronaldtse commented 2 years ago

@phuonghuynh the reason I want a two-stage squash is so that people who already have the Ubuntu image do not need to download the full container. Can we do this?

phuonghuynh commented 2 years ago

Let me check

phuonghuynh commented 2 years ago

> @phuonghuynh the reason I want to have 2 stages squash is so that people who already have ubuntu do not need to download the full container. Can we do this?

So people who have already pulled the Ubuntu stage do not need to pull it again, and the Docker client only needs to pull the cli stage?

ronaldtse commented 2 years ago

@phuonghuynh correct!
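
To illustrate the layer-sharing point above, here is a minimal sketch in Python that models an image as an ordered list of layer digests. The digests and the `layers_to_pull` helper are hypothetical and made up for illustration; this is not how the Docker client is implemented, only the set logic behind the idea.

```python
def layers_to_pull(image_layers, local_layers):
    """Return the layers of an image that are not already cached locally."""
    cached = set(local_layers)
    return [layer for layer in image_layers if layer not in cached]


# Hypothetical digests: two layers from the Ubuntu base, three added on top.
ubuntu_base = ["sha256:base-1", "sha256:base-2"]
metanorma_layers = ["sha256:mn-1", "sha256:mn-2", "sha256:mn-3"]

# Unsquashed image: base layers followed by our own layers.
unsquashed = ubuntu_base + metanorma_layers

# Squashing everything yields one new layer that shares nothing with the base.
fully_squashed = ["sha256:all-in-one"]

# Squashing only what we build on top keeps the base layers intact.
partially_squashed = ubuntu_base + ["sha256:mn-squashed"]

# A user who has already pulled the Ubuntu base image.
local_cache = ubuntu_base

print(layers_to_pull(unsquashed, local_cache))
print(layers_to_pull(fully_squashed, local_cache))
print(layers_to_pull(partially_squashed, local_cache))
```

In the fully squashed case the single layer to pull also contains the Ubuntu content, so the whole image is downloaded again. In the partially squashed case only the one layer built on top of the base is pulled.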

ronaldtse commented 2 years ago

Ping @phuonghuynh

phuonghuynh commented 2 years ago

Yes, I am on it

CAMOBAP commented 2 years ago

@ronaldtse as far as I understand this isn't valid anymore, because we don't squash images, right?

ronaldtse commented 2 years ago

The best-practice approach now is to use multi-stage builds and copy only the necessary files. However, this approach requires detailed testing.

Perhaps we should try another base image like distroless?

https://symflower.com/en/company/blog/2022/complete-guide-on-shrinking-container-images/

ronaldtse commented 2 years ago

Actually, I finished reading the article and it does not provide much information that we don't already know. I think that if we have tebako with a "pre-expanded" mode, that would be the best.

Ping @maxirmx

maxirmx commented 2 years ago

This topic confuses me a lot: https://github.com/tamatebako/tebako/issues/74#issuecomment-1241885969 We can have a pre-expanded mode, of course, but in that case we do not need DwarFS and all the pain it creates.

ronaldtse commented 2 years ago

I'm not sure if I'm suggesting the right thing.

The issue here is that we have two ways of building a "package", as we have two deliverables: a tebako executable and a docker image.

The way we build the docker image seems to contain some files we don't need.

The way we build the tebako image seems more streamlined.

There are pros and cons of both, but in any case we need both deliverables.

If it is possible to streamline the tebako build method so that it also generates a docker image, it would greatly reduce our maintenance burden.

ronaldtse commented 2 years ago

Simply put, the base case is to just add the tebako executable to the docker image. Then we only build once and can also create the docker image from that build.

But since the docker environment has fewer constraints (it has a r/w FS and is allowed to have a much larger size), we could try to optimize the space/execution time requirements for the docker environment.

maxirmx commented 2 years ago

I would suggest then that we

there will be single layer

ronaldtse commented 1 year ago

@maxirmx look forward to when we can do this... would you be able to help?

maxirmx commented 1 year ago

I believe that adding the tebako executable to the docker image will affect size or speed. As shown below, tebako will provide exactly the same bundle.

Img-1

maxirmx commented 1 year ago

Another big layer installs the JVM, Python and Inkscape. Img-2

maxirmx commented 1 year ago

Please also note that image efficiency is estimated at 98%, so squashing the layers may save 27 MB out of 1.5 GB. Download time will increase, because fewer layers means less download concurrency.
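
As a quick check of those numbers, here is a short Python sketch of the arithmetic. It assumes 1 GB = 1000 MB, which the comment does not specify, and it uses only the two figures quoted above.

```python
# Figures quoted in the comment; 1 GB is taken as 1000 MB (an assumption).
image_size_mb = 1.5 * 1000
potential_saving_mb = 27

# Share of the image that squashing could remove, and what remains.
saving_fraction = potential_saving_mb / image_size_mb
efficiency = 1 - saving_fraction

print(f"saving: {saving_fraction:.1%}")    # saving: 1.8%
print(f"efficiency: {efficiency:.1%}")     # efficiency: 98.2%
```

A saving of under 2% is consistent with the estimated 98% efficiency.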