moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0
8.01k stars 1.12k forks source link

Add an UNVOLUME or UNSET VOLUME statement to cancaled mount the default parent volume #4802

Open alexeyp0708 opened 5 months ago

alexeyp0708 commented 5 months ago

This topic has been before discussed since 2014:

Created a new topic based on recommendations https://github.com/moby/moby/pull/8177#issuecomment-2022998013

Target: Add UNVOLUME /target/directory (or UNSET VOLUME) statement in Dockerfile. Behavior: Removes volume metadata that is added by the parent image. If there is no metadata to delete, a warning is issued. Mounting and unmounting volumes must be applied sequentially according to the image inheritance order.

The problem exists and people suffer from it and ask for something to be done.

Reasons for change:

  1. The child image does not always need the volumes of the parent image.
  2. When changing daemon settings of parent image, developers sometimes prefer to specify their own volumes for worker processes (For example, specify a directory in a shared volume for containers.) . As a result, parent volumes hang like extra garbage and take up disk space (often data from the container is copied to volumes by default).
  3. The developer would prefer to have access to the folder in the container that the volume overlaps. But the default volume always overlaps this folder in the container and the user does not have access to it.

Please study the topic https://github.com/moby/moby/issues/3465 and you will understand that there are problems. It’s just that not everyone can or wants to describe the situation that the developers encountered.

I think there are enough legal grounds to consider this problem seriously and include any option for canceling the parent default volume.

Some links: https://github.com/moby/moby/issues/33842#issuecomment-1776786694 - VOLUME behavior in buildkit - VOLUME is now metadata https://github.com/moby/moby/issues/3465#issuecomment-2018567264 - unmount volume via ENTRYPOINT

https://github.com/moby/moby/issues/3465#issuecomment-313389014 - Rebuilding the image with new settings

https://github.com/moby/moby/issues/3465#issuecomment-383416201 - Changing metadata in a built image

(UPDATED) I have a request to the participants - reproduce your problems in detail. If possible with examples. It is better not to discuss unless necessary, so as not to clutter the topic.

alexeyp0708 commented 5 months ago

I can suggest a design option VOLUME ["!/volume/directory"] where the exclamation mark (" ! ") disables mounting of the specified parent volume. https://github.com/moby/moby/issues/3465#issuecomment-2014035282

tonistiigi commented 5 months ago

Maybe smth like

FROM --reset=cmd,entrypoint,volume,env,onbuild,workdir,expose,all img
felipecrs commented 5 months ago

Maybe smth like

FROM --reset=cmd,entrypoint,volume,env,onbuild,workdir,all img

I think it looks great! Maybe it can also support an extended syntax like to allow filtering which specific volume and env to reset:

FROM --reset=volume=/var/lib/docker,env=MYVAR,env=MYOTHERVAR img

On a note, CMD and ENTRYPOINT can already be reset with CMD [] and ENTRYPOINT [].

tonistiigi commented 5 months ago

like to allow filtering which specific

I'm more thinking it from the perspective of "I don't want let changes in the base image control this part of my build". If I already know a specific volume or other parameter by name that I want to keep then I can reset all and re-add missing manually.

Otoh. the extended syntax does not look too awful, but I think individually it makes more sense to for example define env that I want to keep, rather than the ones I want to unset.

alexeyp0708 commented 5 months ago

Maybe smth like

FROM --reset=cmd,entrypoint,volume,env,onbuild,workdir,all img

I think this will require a lot of refactoring. Moreover, a large-scale reset will affect the details. Let's imagine that you have 5 levels of inheritance. You will simply break everything with such a reset.

If you reset Environment Variables then you can influence the parent entry point through a script file. You can reset variables via. RUN unset ENV_VAR

(updated) or ENV ENV_VAR= or ENTRYPOINT "unset ENV_VAR"

ENTRYPOINT or CMD you always can to override WORKDIR can also be overridden . And resetting WORKDIR is redundant. https://docs.docker.com/reference/dockerfile/#workdir

If the WORKDIR doesn't exist, it will be created even if it's not used in any subsequent Dockerfile instruction.

We are now discussing VOLUME here because parent volumes create difficulties. All you offer is syntactic sugar.

felipecrs commented 5 months ago

You can reset variables via. RUN unset ENV_VAR

Really? I don't think so.

tonistiigi commented 5 months ago

You can reset variables via. RUN unset ENV_VAR

No, that is not how it works. That only resets the variable for your current running process inside a container.

(updated) or ENV ENV_VAR=

That isn't exactly equivalent. The environment variable would still be in the list of defined variables but the value would be empty.

We are now discussing VOLUME here because parent volumes create difficulties.

The problem with the older attempts with this seems to be more that there wasn't a solution that fixed the general problem (or that the general problem was not well-defined) and nobody wanted to add extra hacky one-off command, that patched one case but left many open questions to be figured out later. Adding this kind of reset functionality to one command or to all applicable commands doesn't necessarily mean a big difference in development complexity.

Two things have changed since that old discussion in earlier issues:

So from my side, I would be open to adding FROM --reset from https://github.com/moby/buildkit/issues/4802#issuecomment-2023362058 that I think provides the solution for controlling metadata set by the base image, but as we already have many examples of build flags I don't mind defining a way for resetting them individually by name as well.

In that case I think the syntax that best aligns with existing commands would be.

ENV --unset FOO
ARG --unset FOO
VOLUME --unset /foo/bar
LABEL --unset my.label

These are the commands that allow defining named keys. Other commands should not need --unset. Compared to FROM --reset we could make a case that sometimes you may want to define a ENV/ARG only for a specific RUN command and then remove it. This does not apply to VOLUME/LABEL, though, as they only have an effect on the resulting image - these are added just for consistency.

My take is that both of these could be added to the labs channel https://docs.docker.com/build/dockerfile/frontend/#labs-channel

alexeyp0708 commented 5 months ago

In that case I think the syntax that best aligns with existing commands would be. ENV --unset FOO ARG --unset FOO VOLUME --unset /foo/bar LABEL --unset my.label

This syntax would be a good idea

So from my side, I would be open to adding FROM --reset from https://github.com/moby/buildkit/issues/4802#issuecomment-2023362058

A global reset will have consequences and will therefore be used extremely rarely.

At build time, the parent image needs the environment variables that it declares, since the entrypoint startup script depends on this.

unset all env I assume that developers only need to unset variables after executing the entrypoint script.

You can reset variables via. RUN unset ENV_VAR

No, that is not how it works. That only resets the variable for your current running process inside a container.

(updated) or ENV ENV_VAR=

That isn't exactly equivalent. The environment variable would still be in the list of defined variables but the value would be empty.

I explained this earlier, but deleted some of my comments so as not to clutter up the topic.

How I do it ENV ENV_VAR= In case the value of the variable is not important and scripts can process empty values

RUN unset ENV_VAR ; my_scripts - If during the build this variable bothers me

ENTRYPOINT 'entrypoint_parent.sh&; unset var; jobs;' - I f I need to remove a variable after the entry point. This is shown as an example only, This solution is only in theory. In my case, I usually write my entry point script. But all this will not globally unset the variables because I will have access to them via docker exec

If people care about the need to reset variables after the entry point, then I think you can implement this:

entrypoint --after_clear_env

or

entrypoint --reset  env, volume ....

In this case, when resetting the environment variables It is recommended to reset them after installing them docker run -e MY_VAR and execution of the entrypoint script, To reset volumes, it is recommended to only reset the default volumes (those installed previously by the Dockerfile)

But unfortunately in this case you will have to redefine the parent ENTRYPOINT. The solution is a separate directive, for example CLEAR or RESET which will be affter applied when ENTRYPOINT is executed - CLEAR env, value or shift the responsibility to CMD CMD --reset 'env,volume ' optinons_entrypoint . But in this case, the CMD override should not reset the parent ENTRYPOINT. (As far as I know, if you try to override CMD, the parent’s ENTRYPOINT will be reset)

alexeyp0708 commented 4 months ago

By the way . Sometimes there is a lack of a directive that would be executed after the entrypoint is runed. But these are my wishes.

Example : Running the script every time after the script entrypoint is runned CMD --after_entrypoint 'myscripts' CMD --after_entrypoint --security=insecure 'myscripts' The equivalent of the command is docker exec id my_script

As an addition, you can implement this Running the script every time before the script entry point is running CMD --before_entrypoint 'myscripts' CMD --before_entrypoint --security=insecure 'myscripts'

Today I am forced to write my entry point and override the parent entry point

Why is the command CMD --before_entrypoint and not RUN ? - because when running RUN, the volumes are not mounted (docker run --mount), the network is not connected(docker run --network my_net), command variables are not declared (docker run -e my_env)

Rewriting the entrypoint that is declared by the parent is a crutch that interferes with life. It’s easier to declare/extension scripts that will be executed before the entrypoint and after the entrypoint. Then in most cases there is no need to override the parent entrypoint. (Open/Closed concept - SOLID) Extension through inheritance implementing the stack structure (Last to come, first to runned) for --before_entrypoint and implementation of the queue structure (Last to come, last to runned) for --after_entrypoint Such scripts need a separate variable environment through which they can communicate (A script before an entry point can affect entry point variables, and a script after an entry point can change entry point variables.).

Previously I expressed my thoughts about privileges https://github.com/moby/moby/issues/47632 As part of the development concept, I can duplicate it in mody/buildkit

Such ideas would allow developers to remove restrictions and inconveniences.

But I don’t want to once again flood with new topics, so I expressed my ideas here.

alexeyp0708 commented 4 months ago

As a future strategy, I would recommend idea defining the principles of Covariance and contravariance in inheritance. Covariance - This is when image A can be replaced by image B (extends A), where the behavior of image A is preserved. Contravariance is when image B (extends A), preserved the behavior of image A . It can be overridden by image A.

For example, when implementing this behavior, the parent image has the right to frozen some directives (ENTRYPOINT --frozen ENV --frozen VAR .... COPY --frozen) which guarantees that everything that is frozen cannot be overridden or unset by the child image. In this case, the developer will spend less effort replacing containers in the future, without additional compatibility checks. In the same concept, you can implement the concept of abstraction (ENV --required VAR COPY --required /path/file ENTRYPOINT --required ... , RUN --required script_check or RUN --defer_after_event event_name='description child run' script_this_run_for_event For child RUN(or COPY) --event=event_name script_child). Where the parent image (or running a container ) will require a certain child implementation. Abstraction will allow you to write your own interfaces. And to determine compatibility, then declare a additional directive INTERFACE imageA, subImageB, abstractImage - Which declares that this image is compatible for replacing images imageA and subImageB and implements the interface abstractImage.
I think for this will have to introduce a strict mode (USE STRICT or USE STRICT INHERIT MODE) in which there will be strict expansion rules.

I understand that there are many factors against this. But this is my idea that you can take as perspective.

thorfi commented 2 months ago

I have a workaround for this, as an example, the official postgres image. Originally posted by me here: https://stackoverflow.com/a/78645904/2545063

And also here: https://github.com/moby/moby/issues/3465#issuecomment-2179958180

You can use COPY --from to copy the files from the image you want to work with but don't want its properties (like VOLUME) to an image based on the same base image. As an example, the postgres image and its source Debian image.

FROM postgres:15-bookworm as postgres-docker

FROM debian:bookworm-slim as postgres-build

# We do this because the postgres-docker container has a VOLUME on /var/lib/postgresql/data
# This copies the filesystem without bringing the VOLUME with it.
COPY --from=postgres-docker / /

# DO THE REST OF YOUR STUFF

# https://hub.docker.com/_/postgres - see 15-bookworm for the stuff we are setting here
# Note: We need to set these because we are using COPY --from=postgres-docker - see above
ENTRYPOINT ["docker-entrypoint.sh"]
ENV LANG en_US.utf8
ENV PG_MAJOR 15
ENV PATH $PATH:/usr/lib/postgresql/${PG_MAJOR}/bin
ENV PGDATA /var/lib/postgresql/data
ENV SLEEP_ON_EXIT=0
STOPSIGNAL SIGINT
CMD ["postgres"]

EXPOSE 5432

That said, I would MUCH prefer to be able to do:

UNVOLUME /var/lib/postgresql/data
VOLUME /var/lib/postgresql

instead of having to do COPY --from and then manually reset all the other build instructions.

Someone wrote a script to do this sort of thing via image dump -> patch -> load https://stackoverflow.com/a/52030840/2545063 but frankly that's also a terrible workaround (it works, but ugh).

Excpt0r commented 1 month ago

I really appreciate the suggestion from @tonistiigi (also the syntax to use KEYWORD --unset). This is the only clean solution I see.

To provide another real-world example: The apache-kafka image defines VOLUME ["/etc/kafka/secrets", "/var/lib/kafka/data", "/mnt/shared/config"], but it is not mandatory to provide secrets or config (can be done by environment vars too). So I provide only a data volume, which leads to the situation that docker creates anonymous volumes for secrets and config. Deleting and recreating the kafka container (keeping the "data" volume) will lead to newly created anonymous volumes, so they pile up.

Regarding the workaround provided by @thorfi, be aware that this approach increases the size of your resulting image. In this case by the size of "debian:bookworm-slim", in my case of kafka it was even more. Of course you can use "FROM scratch" as base to avoid that, but then have to re-create the meta data (like LABEL and ENV) from all upstream images too. In my case this would be alpine linux, eclipse temurin and kafka - not maintainable.