squidfunk / mkdocs-material

Documentation that simply works
https://squidfunk.github.io/mkdocs-material/
MIT License

Docker lexer only rendering the first comment in a code block #7696

Closed joe-eklund closed 2 weeks ago

joe-eklund commented 2 weeks ago

Context

No response

Bug description

I am trying to render comments in a Dockerfile and want to use the docker lexer in my code block. (Note: I use apostrophes below to represent backticks, because I had trouble getting backticks to render inside a GitHub code block. My real code block uses actual backticks.) The following code block:

''' docker title="Dockerfile"
FROM python # (1) hello there
COPY . /app # (2)
RUN make /app  # Execute a make command.
CMD python /app/app.py # Define the default command to run when the container runs.
'''

1. Annotation 1!
2. Annotation 2!

results in the following rendered code block:

Screenshot 2024-11-11 at 7 01 37 PM

You can see that the text is colored correctly and that the comment on the first line is rendered, but none of the later comments are rendered as comments. This also prevents any additional annotations from working.

When I change to the yaml lexer like this:

''' yaml title="Dockerfile"
FROM python # (1) hello there
COPY . /app # (2)
RUN make /app  # Execute a make command.
CMD python /app/app.py # Define the default command to run when the container runs.
'''

1. Annotation 1!
2. Annotation 2!

I get a correctly rendered code block with both colored text and comments (with annotations):

Screenshot 2024-11-11 at 7 01 49 PM

I suspect the underlying issue is how comments are rendered. Annotations depend on comments being detected, so they break as well once the later comments are missed. That is how I noticed the problem in the first place, so I included it for completeness.

Reproduction

9.5.44-docker-lexer-comment.zip

Steps to reproduce

  1. Run reproduction.
  2. Navigate to http://localhost:8080/
  3. Visually inspect code blocks to see the bug described above.

Browser

No response

squidfunk commented 2 weeks ago

Thanks for reporting! This is not a problem with Material for MkDocs, but with the Docker lexer provided by Pygments. Also, it's not a problem with the line index: it seems that the lexer only emits comment tokens for inline comments after FROM directives, not after other directives. In any case, please take this upstream.

Screenshot 2024-11-12 at 09 22 49

joe-eklund commented 2 weeks ago

Hey, thanks for pointing me in the right direction. As a follow-up, in case anyone else runs into this: I looked through some issues in that project and was eventually led to the official Docker docs, which state:

BuildKit treats lines that begin with # as a comment, unless the line is a valid parser directive. A # marker anywhere else in a line is treated as an argument.

So I think the expected behavior is actually to ignore any # that doesn't start a line, and the real bug seems to be that the lexer detects the inline comment on the first line (the FROM one), unless for some reason FROM is not considered a "valid parser directive." Either way, it seems that what I want to do is not supported. The only workaround with the docker lexer seems to be putting comments on their own line.
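
For anyone who wants to check a Dockerfile against the rule quoted above, here is a minimal stdlib-only sketch. It is not BuildKit's parser: the function name `classify` is made up for illustration, parser directives are ignored, and I am assuming that leading whitespace before `#` still counts as the start of a comment line.

```python
def classify(line: str) -> str:
    """Classify one Dockerfile line under the simplified rule quoted above."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    # Assumption: a line whose first non-blank character is '#' is a comment.
    if stripped.startswith("#"):
        return "comment"
    # A '#' anywhere else belongs to the instruction's arguments.
    if "#" in stripped:
        return "instruction (# is part of the arguments)"
    return "instruction"


DOCKERFILE = """\
# Base image
FROM python
COPY . /app # (2)
"""

for line in DOCKERFILE.splitlines():
    print(f"{classify(line):45} {line}")
```

Under this reading, only the first line of the sample is a comment, which matches the workaround of keeping comments on their own line.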

squidfunk commented 2 weeks ago

Thanks for investigating! That sounds logical. Some languages, like JSON, do not support comments, so using annotations there is not straightforward. We also have the custom selectors for annotations feature in Insiders, which adds some support, but that will not work for Docker, because (, 2 and ) are three distinct tokens. So sadly, this is a dead end.

kamilkrzyskow commented 2 weeks ago

Hello @joe-eklund, you could try modifying the DockerLexer with a hook, monkey-patching it, or creating a CustomDockerLexer class. Note that there could still be performance issues with that due to caching. In the past, I monkey-patched the internals to fix the caching issue and made a custom lexer.

I asked ChatGPT for changes, and it added a (#.*)? comment group everywhere. I haven't tested the solution, but it could be used as a base :v:

import re

from pygments.lexer import RegexLexer, bygroups, using
from pygments.lexers.data import JsonLexer
from pygments.lexers.shell import BashLexer
from pygments.token import Comment, Keyword, String, Whitespace


class DockerLexer(RegexLexer):
    """
    Lexer for Docker configuration files, with an optional trailing
    (#.*)? comment group added to the rules. Untested sketch.
    """
    name = 'Docker'
    url = 'http://docker.io'
    aliases = ['docker', 'dockerfile']
    filenames = ['Dockerfile', '*.docker']
    mimetypes = ['text/x-dockerfile-config']
    version_added = '2.0'

    _keywords = (r'(?:MAINTAINER|EXPOSE|WORKDIR|USER|STOPSIGNAL)')
    _bash_keywords = (r'(?:RUN|CMD|ENTRYPOINT|ENV|ARG|LABEL|ADD|COPY)')
    _lb = r'(?:\s*\\?\s*)'  # dockerfile line break regex
    flags = re.IGNORECASE | re.MULTILINE

    tokens = {
        'root': [
            # Full-line comments, or a comment left over after another rule
            (r'#.*', Comment.Single),
            (r'(FROM)([ \t]*)(\S*)([ \t]*)(?:(AS)([ \t]*)(\S*))?(#.*)?',
             bygroups(Keyword, Whitespace, String, Whitespace, Keyword, Whitespace, String, Comment.Single)),
            (rf'(ONBUILD)(\s+)({_lb})(#.*)?', bygroups(Keyword, Whitespace, using(BashLexer), Comment.Single)),
            # The inner repeated group is non-capturing so that the group
            # count matches the number of bygroups() actions.
            (rf'(HEALTHCHECK)(\s+)((?:{_lb}--\w+=\w+{_lb})*)(#.*)?',
                bygroups(Keyword, Whitespace, using(BashLexer), Comment.Single)),
            (rf'(VOLUME|ENTRYPOINT|CMD|SHELL)(\s+)({_lb})(\[.*?\])(#.*)?',
                bygroups(Keyword, Whitespace, using(BashLexer), using(JsonLexer), Comment.Single)),
            (rf'(LABEL|ENV|ARG)(\s+)((?:{_lb}\w+=\w+{_lb})*)(#.*)?',
                bygroups(Keyword, Whitespace, using(BashLexer), Comment.Single)),
            # The trailing $ forces the lazy (.*?) to run up to the comment or
            # the end of the line instead of matching nothing.
            (rf'({_keywords}|VOLUME)\b(\s+)(.*?)(#.*)?$', bygroups(Keyword, Whitespace, String, Comment.Single)),
            # Arguments of these instructions are handled by the fallback rule below.
            (rf'({_bash_keywords})(\s+)', bygroups(Keyword, Whitespace)),
            # Bash command (with line continuations) and an optional inline comment.
            # Caveat: a # inside a quoted string is also treated as a comment start.
            (r'((?:.*\\\n)*.+?)(#.*)?$', bygroups(using(BashLexer), Comment.Single)),
        ]
    }
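
One detail worth checking before building on a pattern like this: a lazy `(.*?)` followed by an optional `(#.*)?` can match nothing at all unless something forces it to reach the end of the line. The stdlib-only sketch below illustrates that with plain `re`, outside of Pygments. The pattern names and the `split_line` helper are made up for the demonstration, and the patterns are simplified to a single RUN keyword.

```python
import re

LINE = "RUN make /app  # Execute a make command."
FLAGS = re.IGNORECASE | re.MULTILINE

# Without an anchor, the lazy group stops as early as it can.
unanchored = re.compile(r'(RUN)(\s+)(.*?)(#.*)?', FLAGS)
# With a trailing $, the lazy group must extend to the comment or end of line.
anchored = re.compile(r'(RUN)(\s+)(.*?)(#.*)?$', FLAGS)


def split_line(pattern, line):
    """Return the captured groups for a line, or None if there is no match."""
    match = pattern.match(line)
    return match.groups() if match else None


print(split_line(unanchored, LINE))
print(split_line(anchored, LINE))
```

In the unanchored case the command and the comment are both left for later rules to pick up, so the comment group never fires. The anchored variant separates the command from the trailing comment.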