Dockerfile: unterminated heredoc could provide a hint

Relates to https://github.com/moby/buildkit/issues/5240#issuecomment-2297287242

While writing the comment above, I noticed that the Dockerfile parser correctly detects unterminated here-documents, but doesn't provide a hint when the here-document is unterminated because its end marker is indented. For example:

FROM alpine
RUN <<'EOT'
    env
    EOT

Building the above produces an error indicating that the here-document is not terminated:

[+] Building 0.1s (1/1) FINISHED                         docker:desktop-linux
 => [internal] load build definition from Dockerfile                     0.0s
 => => transferring dockerfile: 77B                                      0.0s
Dockerfile:2
--------------------
   1 |     FROM alpine
   2 | >>> RUN <<'EOT'
   3 | >>>     env
   4 | >>>     EOT
   5 |
--------------------
ERROR: failed to solve: unterminated heredoc

Not all users may be aware that the end-marker must start at position 0, so this is likely a common mistake. In the example above the indentation is quite visible, but it can be much harder to spot if (e.g.) it's only a single space, and/or when the output is printed as part of CI logs (which tend to indent output).

Initially, I thought the parser was "smart" enough to detect where the here-document SHOULD be terminated, but from a quick test, it looks like it just treats "everything after" the start as part of the here-document. For example, adding more instructions after it includes all of them in the error message:

FROM alpine
RUN <<'EOT'
    env
    EOT
RUN echo hello
RUN echo hello
RUN echo world
RUN echo foobar

[+] Building 0.1s (1/1) FINISHED                         docker:desktop-linux
 => [internal] load build definition from Dockerfile                     0.0s
 => => transferring dockerfile: 138B                                     0.0s
Dockerfile:2
--------------------
   1 |     FROM alpine
   2 | >>> RUN <<'EOT'
   3 | >>>     env
   4 | >>>     EOT
   5 | >>> RUN echo hello
   6 | >>> RUN echo hello
   7 | >>> RUN echo world
   8 | >>> RUN echo foobar
   9 |
--------------------
ERROR: failed to solve: unterminated heredoc

☝️ While the above is correct, I wonder if we could handle this case more smartly.
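
To illustrate why every following instruction ends up in the error, here is a minimal Python sketch of the behavior as I observed it. This is a simplified model, not BuildKit's actual (Go) parser: `read_heredoc_body` and its parameters are hypothetical names, and it only handles the plain `<<` form with an exact-match terminator.

```python
def read_heredoc_body(lines, start, marker):
    """Collect heredoc body lines beginning at index `start` (0-based).

    Returns (body, end_index). end_index is None when no line is exactly
    equal to `marker`, in which case the body runs to the end of the file.
    """
    body = []
    for i in range(start, len(lines)):
        if lines[i] == marker:  # terminator must match with no indentation
            return body, i
        body.append(lines[i])
    return body, None


dockerfile = [
    "FROM alpine",
    "RUN <<'EOT'",
    "    env",
    "    EOT",  # indented, so it is treated as ordinary body text
    "RUN echo hello",
]
body, end = read_heredoc_body(dockerfile, 2, "EOT")
print(end)   # None: the heredoc was never terminated
print(body)  # everything after line 2, including the later RUN instruction
```

Because the indented `EOT` never matches, the model has no way to tell where the here-document was meant to end, which matches the "everything after" highlighting in the output above.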

Suggested improvements

If possible, it would be great if we could detect a potentially indented end-marker. I'm using "potentially" here because the here-document could itself contain a here-document, e.g.:

FROM alpine
RUN <<'EOT'
    cat > hello.txt <<EOT
        hello world
    EOT
    EOT
RUN echo hello
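
One possible way to collect such candidates, sketched in Python purely for illustration (not a proposal for the actual Go implementation; `find_indented_markers` is a hypothetical helper and it only considers leading spaces and tabs), would be to run a second pass once parsing has already failed:

```python
def find_indented_markers(lines, start, marker):
    """Return 1-based line numbers that would be the end marker if unindented.

    `start` is the 0-based index of the first heredoc body line. Intended to
    be called only after the parser has found no valid terminator.
    """
    candidates = []
    for i in range(start, len(lines)):
        line = lines[i]
        # Not a valid terminator as written, but equal to it once the
        # leading whitespace is removed.
        if line != marker and line.lstrip(" \t") == marker:
            candidates.append(i + 1)
    return candidates


dockerfile = [
    "FROM alpine",
    "RUN <<'EOT'",
    "    cat > hello.txt <<EOT",
    "        hello world",
    "    EOT",
    "    EOT",
    "RUN echo hello",
]
print(find_indented_markers(dockerfile, 2, "EOT"))  # [5, 6]
```

For the nested example this yields two candidates (lines 5 and 6), which is exactly why the hint can only say "potentially": the sketch cannot tell which of them was meant to close the outer here-document.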

When parsing fails because no end-marker was found, the above information could be used when printing the error, to provide a more targeted hint:

Dockerfile:2
--------------------
   1 |     FROM alpine
   2 | >>> RUN <<'EOT'
   3 | >>>     cat > hello.txt <<EOT
   4 | >>>         hello world
   5 | >>>     EOT
   6 | >>>     EOT
   7 | RUN echo hello
   8 |
--------------------
ERROR: failed to solve: unterminated heredoc: end-marker at line 6 is indented.

Perhaps we could even omit the intermediate lines (here-docs can be long!) and only point out the start and the (expected) end:

Dockerfile:2
--------------------
   1  |     FROM alpine
   2  | >>> RUN <<'EOT'
   ...
   97 | >>>     EOT
   98 | RUN echo hello
   99 |
--------------------
ERROR: failed to solve: unterminated heredoc: heredoc starts at line 2 but the end-marker (EOT) at line 97 is indented.
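
As a rough sketch of how the shortened output could be assembled (Python for illustration only; `render_heredoc_error`, `max_lines`, and the exact layout are assumptions on my side, not existing BuildKit behavior):

```python
def render_heredoc_error(lines, start, end, marker, max_lines=4):
    """Build the highlighted rows and the error text for an indented end marker.

    `start` and `end` are 1-based line numbers of the heredoc start and of the
    suspected (indented) end marker. Ranges longer than `max_lines` are
    reduced to the first row, an ellipsis, and the last row.
    """
    width = len(str(end))  # right-align line numbers to the widest one
    rows = [f"{n:>{width}} | >>> {lines[n - 1]}" for n in range(start, end + 1)]
    if len(rows) > max_lines:
        rows = [rows[0], "...", rows[-1]]
    message = (
        f"unterminated heredoc: heredoc starts at line {start} "
        f"but the end-marker ({marker}) at line {end} is indented."
    )
    return rows, message


# A long heredoc: start on line 2, indented end marker on line 97.
dockerfile = ["FROM alpine", "RUN <<'EOT'"] + ["    env"] * 94 + ["    EOT", "RUN echo hello"]
rows, message = render_heredoc_error(dockerfile, 2, 97, "EOT")
print("\n".join(rows))
print("ERROR: failed to solve: " + message)
```

Short here-documents stay fully visible, and only long ones are collapsed, so the common "four-line Dockerfile" case keeps its current level of detail.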

moby / buildkit

Dockerfile: unterminated heredoc could provide a hint #5265
