moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0
8.26k stars 1.17k forks source link

Dockerfile: Mark ARG as required #3208

Open blowfishpro opened 2 years ago

blowfishpro commented 2 years ago

There doesn't seem to be a built-in way to mark an ARG as required (i.e. needing to be explicitly passed to the build every time). I've seen code like this a lot but it has some undesirable properties:

ARG SOME_ARG
RUN : "${SOME_ARG:?The build arg SOME_ARG must be specified}"

If it's run early in the build, the build cache is invidated for following layers if SOME_ARG changes, even if they don't depend on it.

If it's run late in the build (or run in a separate stage), then you can end up waiting a long time to get to the failure (alhough at least then the steps you waited for will be cached next time).

If there was some way to mark an ARG as required, that would make this scenario much easier. Something like:

ARG --required SOME_ARG

In that case the --required flag would have the following properties:

thaJeztah commented 2 years ago

I think this is a bug; support for required values (${word:?error}) was added in https://github.com/moby/buildkit/pull/1180, but it looks to only have effect on ENV, and while ARG doesn't complain about it, it seems to simply ignore it (perhaps it considers an ARG to be "defined" at that moment, but even in that case, it would (should) be considered null, which should error; https://pubs.opengroup.org/onlinepubs/009604499/utilities/xcu_chap02.html#tag_02_06_02

Taking the following Dockerfile;

# syntax=docker/dockerfile:1

ARG VAR_ONE="${VAR_ONE:?The build arg VAR_ONE must be specified}"
ARG VAR_TWO="${VAR_TWO:?The build arg VAR_TWO must be specified}"

FROM alpine
ARG VAR_ONE
ARG VAR_TWO

ENV ENV_ONE="${VAR_ONE:?The build arg VAR_ONE must be specified}"
ENV ENV_TWO="${VAR_TWO:?The build arg VAR_TWO must be specified}"

RUN echo hello $VAR_ONE $VAR_TWO

The ARG cases are ignored, but the ENV correctly produces an error;

docker build --no-cache --progress=plain .
#6 [internal] load metadata for docker.io/library/alpine:latest
#6 DONE 1.6s
Dockerfile:10
--------------------
   8 |     ARG VAR_TWO
   9 |
  10 | >>> ENV ENV_ONE="${VAR_ONE:?The build arg VAR_ONE must be specified}"
  11 |     ENV ENV_TWO="${VAR_TWO:?The build arg VAR_TWO must be specified}"
  12 |
--------------------
ERROR: failed to solve: failed to process "\"${VAR_ONE:?The build arg VAR_ONE must be specified}\"": VAR_ONE: The build arg VAR_ONE must be specified
docker build --no-cache --progress=plain --build-arg VAR_ONE=world .

...
Dockerfile:11
--------------------
   9 |
  10 |     ENV ENV_ONE="${VAR_ONE:?The build arg VAR_ONE must be specified}"
  11 | >>> ENV ENV_TWO="${VAR_TWO:?The build arg VAR_TWO must be specified}"
  12 |
  13 |     RUN echo hello $VAR_ONE $VAR_TWO
--------------------
ERROR: failed to solve: failed to process "\"${VAR_TWO:?The build arg VAR_TWO must be specified}\"": VAR_TWO: The build arg VAR_TWO must be specified

Comparing with (ba)sh:

unset SOME_ARG

SOME_ARG="${SOME_ARG:?}"
bash: SOME_ARG: parameter null or not set

SOME_ARG="${SOME_ARG:?The build arg SOME_ARG must be specified}"
bash: SOME_ARG: The build arg SOME_ARG must be specified
SOME_ARG="hello"
SOME_ARG="${SOME_ARG:?The build arg SOME_ARG must be specified}"
thaJeztah commented 2 years ago

Right; so it looks to be a due to the quotes. With the quotes, it doesn't detect the variable being unset, but without quotes, parsing fails (incorrectly);

Taking this dockerfile:

# syntax=docker/dockerfile:1

ARG VAR_FIVE=default

FROM alpine AS base
ARG VAR_ONE
ARG VAR_TWO
ARG VAR_THREE
ARG VAR_FOUR
ARG VAR_FIVE
ARG CHECK=${VAR_ONE:?}
ARG CHECK=${VAR_TWO:?}
ARG VAR_THREE=${VAR_THREE:?}
ARG VAR_FOUR=${VAR_FOUR:?Missing}
ARG VAR_FIVE=${VAR_FIVE:?}
ARG VAR_SIX=${VAR_SIX:?Multi-word error without quotes}

FROM base AS foo

FROM base AS bar

Things mostly work, until VAR_SIX, which breaks on the error message (spaces)

docker build --no-cache --progress=plain .
docker build --no-cache --progress=plain --build-arg=VAR_ONE=y .
docker build --no-cache --progress=plain --build-arg=VAR_ONE=y --build-arg=VAR_TWO=y .
docker build --no-cache --progress=plain --build-arg=VAR_ONE=y --build-arg=VAR_TWO=y --build-arg=VAR_THREE=y .
docker build --no-cache --progress=plain --build-arg=VAR_ONE=y --build-arg=VAR_TWO=y --build-arg=VAR_THREE=y --build-arg=VAR_FOUR=y .

The Dockerfile requires quotes (for the error), but probably shouldn't;

When using a Dockerfile:

ARG VAR_SIX=${VAR_SIX:?Multi-word error without quotes}
ERROR: failed to solve: failed to process "${VAR_SIX:?Multi-word": syntax error: missing '}'

But in a shell:

VAR_SIX=${VAR_SIX:?Multi-word error without quotes}
bash: VAR_SIX: Multi-word error without quotes

Adding quotes around it makes the error work;

ARG VAR_SIX="${VAR_SIX:?Multi-word error with quotes}"
docker build --no-cache --progress=plain --build-arg=VAR_ONE=y --build-arg=VAR_TWO=y --build-arg=VAR_THREE=y --build-arg=VAR_FOUR=y .

Dockerfile:16
--------------------
  14 |     ARG VAR_FOUR=${VAR_FOUR:?Missing}
  15 |     ARG VAR_FIVE=${VAR_FIVE:?}
  16 | >>> ARG VAR_SIX="${VAR_SIX:?Multi-word error with quotes}"
  17 |
  18 |     FROM base AS foo
--------------------
ERROR: failed to solve: failed to process "\"${VAR_SIX:?Multi-word error without quotes}\"": VAR_SIX: Multi-word error without quotes

But ALSO fails do detect when it's set:

docker build --no-cache --progress=plain --build-arg=VAR_ONE=y --build-arg=VAR_TWO=y --build-arg=VAR_THREE=y --build-arg=VAR_FOUR=y --build-arg=VAR_SIX=y .

Dockerfile:16
--------------------
  14 |     ARG VAR_FOUR=${VAR_FOUR:?Missing}
  15 |     ARG VAR_FIVE=${VAR_FIVE:?}
  16 | >>> ARG VAR_SIX="${VAR_SIX:?Multi-word error with quotes}"
  17 |
  18 |     FROM base AS foo
--------------------
ERROR: failed to solve: failed to process "\"${VAR_SIX:?Multi-word error without quotes}\"": VAR_SIX: Multi-word error without quotes

Trying to put the quotes around the error message itself, also doesn't solve it;

ARG VAR_SIX=${VAR_SIX:?"Multi-word error with quotes2"}
ERROR: failed to solve: failed to process "${VAR_SIX:?\"Multi-word error with quotes2\"}": VAR_SIX: Multi-word error with quotes2
thaJeztah commented 2 years ago

So.. mistake in my example, but interesting; it's not the quotes; it's if there's no ARG NAME_OF_VAR before using the "required arguments", then it's failing.

See this Dockerfile:

# syntax=docker/dockerfile:1

FROM scratch AS base
ARG VAR_SIX
ARG VAR_SEVEN
ARG VAR_EIGHT
ARG VAR_NINE
ARG VAR_TEN
# no VAR_ELEVEN arg here

ARG CHECK=${VAR_SIX:?"Multi-word error with quotes2"}
ARG CHECK=${VAR_SEVEN:?"Multi-word error with quotes2"}
ARG CHECK=${VAR_EIGHT:?"Multi-word error with quotes2"}
ARG CHECK="${VAR_NINE:?Multi-word error with quotes2}"
ARG CHECK="${VAR_TEN:?Multi-word error with quotes2}"
ARG CHECK="${VAR_ELEVEN:?Multi-word error with quotes2}"

Above fails on VAR_ELEVEN:

docker build --no-cache --progress=plain --build-arg=VAR_SIX=y --build-arg=VAR_SEVEN=y --build-arg=VAR_EIGHT=y  --build-arg=VAR_NINE=y  --build-arg=VAR_TEN=y  --build-arg=VAR_ELEVEN=y .

But all intermediate steps work (when omitting the --build-arg).

tonistiigi commented 2 years ago

@thaJeztah The initial report was not about variable interpolation of defining ARG

thaJeztah commented 2 years ago

I see the report describes;

There doesn't seem to be a built-in way to mark an ARG as required (i.e. needing to be explicitly passed to the build every time).

And I think there is but (see above) it's broken.

If it would work, I could start with;

# syntax=docker/dockerfile:1

ARG REQUIRED_ARG1=${REQUIRED_ARG1:?}
ARG REQUIRED_ARG2=${REQUIRED_ARG2:?}

FROM ubuntu AS foo

FROM alpine AS bar

And the Dockerfile would error out before any changes were made (and I assume no build-cache affected)?

blowfishpro commented 2 years ago

Interesting, I couldn't find any reference to that syntax in the Dockerfile reference. But if it's broken anyway then maybe that and the documentation are separate issues?

blowfishpro commented 2 years ago

So messing around this has some of the right behavior:

ARG EXAMPLE

FROM busybox:latest

ARG EXAMPLE
ARG _=${EXAMPLE:?"Must provide EXAMPLE"}

RUN true
RUN echo ${EXAMPLE}

It correctly fails when EXAMPLE is not provided as a build arg, and succeeds when it is. But from changing the value of EXAMPLE it looks like that invalidates the cache for the RUN true step, even though there's no way for that to have an effect yet.

thaJeztah commented 2 years ago

Well, it's partially broken; I think the original PR focused on using this for "ENV", and so within a stage (not in the global scope). The parsing of non-quoted errors should also be addressed (but can be worked around, and is merely an issue I discovered while trying out under what conditions it failed (or passed).

I think what should be looked at to make this a viable option;

The cache invalidation in your example surprises me as well, but I must confess that I'm not very familiar with the "internals" of BuildKit, so perhaps can be explained somehow.

What I was hoping was that being able to use this feature in the global scope ( before the first stage) to allow for an early fail (without affecting the cache), but perhaps someone with more knowledge about the internals can tell if that would indeed be possible.

blowfishpro commented 7 months ago

Well I discovered a hacky way to do this that doesn't invalidate build caches:

ARG EXAMPLE

FROM scratch AS validate-example

ARG EXAMPLE
ARG _=${EXAMPLE:?"Must provide EXAMPLE"}

FROM busybox:latest

RUN true

ARG EXAMPLE

RUN --mount=type=bind,from=validate-example,target=/run/validated-example echo ${EXAMPLE}

The separate validation build stage ensures that the validations are run early in the build, while the bind mount puts those validations into the dependency tree at the right place.

So far I haven't found any way to validate ARGs that are used in a FROM instruction (e.g. FROM busybox:${VERSION} ... it errors on an invalid reference format)