universal-ctags / ctags

A maintained ctags implementation
https://ctags.io
GNU General Public License v2.0

Containerfile/Dockerfile parser #3970

Open westurner opened 3 months ago

westurner commented 3 months ago

STORY: Users can parse Dockerfile and Containerfile with universal-ctags in order to navigate and review with tool support.

masatake commented 3 months ago

I have not read this issue closely yet. Here is a .ctags file I wrote a while ago:

#
# containerfile.ctags --- regex parser for Containerfile and Dockerfile
#
#  Copyright (c) 2023, Red Hat, Inc.
#  Copyright (c) 2023, Masatake YAMATO
#
#  Author: Masatake YAMATO <yamato@redhat.com>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
# USA.
#
# Reference: https://docs.docker.com/engine/reference/builder/
# 
--langdef=Containerfile
--map-Containerfile=+(Containerfile)
--map-Containerfile=+(Dockerfile)

--kinddef-Containerfile=a,arg,arguments
--kinddef-Containerfile=e,env,environment variables
--kinddef-Containerfile=i,image,images
--_roledef-Containerfile.{image}=from,specified in FROM

--regex-Containerfile=/^ARG[[:space:]]+([^[:space:]=]+)/\1/a/{exclusive}
--regex-Containerfile=/^ENV[[:space:]]+([^[:space:]=]+)/\1/e/{exclusive}
--regex-Containerfile=/^FROM[[:space:]]+(--[^[:space:]]*[[:space:]]+)?([^[:space:]]+)([[:space:]]+(as|AS)[[:space:]]+([^[:space:]]+))?//{exclusive}{{
    \2 /image /from _reftag _commit pop
    \5 false ne{
        \5 /image _tag _commit \2 inherits:
    } if
}}

This can be used as a starting point.
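
To make the behaviour of the line-oriented patterns easier to see, here is a minimal Python sketch that approximates them. It is not how ctags runs them: `[[:space:]]` is translated to `\s`, the optional `--flag` group is written so that it carries its own trailing whitespace, and the function name `extract_tags` and the tuple layout are assumptions made for illustration.

```python
import re

# Python approximations of the three line-oriented patterns.
ARG_RE = re.compile(r"^ARG\s+([^\s=]+)")
ENV_RE = re.compile(r"^ENV\s+([^\s=]+)")
FROM_RE = re.compile(r"^FROM\s+(?:--\S*\s+)?(\S+)(?:\s+(?:as|AS)\s+(\S+))?")

def extract_tags(text):
    """Return (name, kind, role) tuples; role is None for definition tags."""
    tags = []
    for line in text.splitlines():
        m = ARG_RE.match(line)
        if m:
            tags.append((m.group(1), "arg", None))
            continue
        m = ENV_RE.match(line)
        if m:
            tags.append((m.group(1), "env", None))
            continue
        m = FROM_RE.match(line)
        if m:
            # The base image is only referenced, so it gets the "from" role.
            tags.append((m.group(1), "image", "from"))
            if m.group(2):
                # The name after AS is newly introduced: a definition tag.
                tags.append((m.group(2), "image", None))
    return tags

sample = """FROM golang:1.22 AS builder
ARG VERSION=1.0
ENV GOFLAGS=-mod=vendor
FROM alpine
"""

for tag in extract_tags(sample):
    print(tag)
```

Only `builder`, `VERSION`, and `GOFLAGS` come out as definitions; the two base images are reference tags.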

masatake commented 3 months ago

The main task of ctags is to extract names newly introduced in a target file. ctags extracts such names as definition tags. Though we have extended the task to extract names referenced or used, extracting definition tags is a higher priority.

I don't think RUN introduces a new name.

As far as I can tell from reading https://www.tohoho-web.com/docker/dockerfile.html (Japanese), LABEL introduces names. So, the .ctags file should support it.

The critical issue is that the .ctags file doesn't support a command that spans multiple lines, like:

ENV DB_HOST="192.168.2.201" \
    DB_PORT="3306" \
    DB_USER="myapp" \
    DB_PASSWD="ZbGc7#adG87GBfVC" \
    DB_DATABASE="sample"

To extract DB_PORT, DB_USER, ..., we must switch from the line-oriented meta parser to the multi-table meta parser (https://docs.ctags.io/en/latest/optlib.html#advanced-pattern-matching-with-multiple-regex-tables).
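
The gap can be reproduced outside ctags. The Python sketch below is only an illustration: it shows that a per-line pattern sees just the first variable, and that folding backslash-newline continuations into one logical line recovers the rest. Folding lines is a different technique from the multi-table parser mentioned above, and the naive `name=` search assumes values contain no spaces or `=` characters. The function names are hypothetical.

```python
import re

ENV_RE = re.compile(r"^ENV\s+([^\s=]+)")

dockerfile = r'''ENV DB_HOST="192.168.2.201" \
    DB_PORT="3306" \
    DB_USER="myapp"
'''

def line_oriented_names(text):
    """One pattern per physical line: continuation lines never match."""
    return [m.group(1) for line in text.splitlines() if (m := ENV_RE.match(line))]

def joined_names(text):
    """Fold backslash-newline continuations first, then look for name= pairs."""
    logical = re.sub(r"\\\n", " ", text)
    names = []
    for line in logical.splitlines():
        if re.match(r"ENV\s", line):
            names += re.findall(r"([^\s=]+)=", line[3:])
    return names

print(line_oriented_names(dockerfile))  # ['DB_HOST']
print(joined_names(dockerfile))         # ['DB_HOST', 'DB_PORT', 'DB_USER']
```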

In my experience, a Containerfile is usually not very large. The performance of the parser may not be important, so a regex-based optlib parser is enough for the purpose.

Do you want to implement such a parser yourself? I don't want to take away your joyful hacking time :-P

westurner commented 3 months ago

Thanks. I can't commit to owning a parser like this, but here is the list of instructions, pulled from https://docs.docker.com/reference/dockerfile/ with this browser-console one-liner:

$$('article table:first-of-type tr code').map((el) => el.innerText).reduce((a,b) => a + "\n" + b)
"ADD
ARG
CMD
COPY
ENTRYPOINT
ENV
EXPOSE
FROM
HEALTHCHECK
LABEL
MAINTAINER
ONBUILD
RUN
SHELL
STOPSIGNAL
USER
VOLUME
WORKDIR"

masatake commented 3 months ago

I don't understand why you want to show all the commands. Ctags is not a general navigation tool. It focuses on definitions. We need a list of all commands that define names or introduce NEW names.

masatake commented 3 months ago

#
# containerfile.ctags --- regex parser for Containerfile and Dockerfile
#
#  Copyright (c) 2023, 2024, Red Hat, Inc.
#  Copyright (c) 2023, 2024, Masatake YAMATO
#
#  Author: Masatake YAMATO <yamato@redhat.com>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
# USA.
#
# Reference: https://docs.docker.com/engine/reference/builder/
# 
--langdef=Containerfile
--map-Containerfile=+(Containerfile)
--map-Containerfile=+(Dockerfile)

--kinddef-Containerfile=a,arg,arguments
--kinddef-Containerfile=e,env,environment variables
--kinddef-Containerfile=i,image,images
--_roledef-Containerfile.{image}=from,specified in FROM
--kinddef-Containerfile=l,label,labels

--_tabledef-Containerfile=main
--_tabledef-Containerfile=skipComment
--_tabledef-Containerfile=next
--_tabledef-Containerfile=arg
--_tabledef-Containerfile=env
--_tabledef-Containerfile=label

--_mtable-regex-Containerfile=skipComment/#[^\n]*//

--_mtable-regex-Containerfile=next/\\\n//{tleave}
--_mtable-regex-Containerfile=next/\n//{tleave}{_advanceTo=0start}
--_mtable-regex-Containerfile=next/[^\\\n]+//

--_mtable-extend-Containerfile=main+skipComment
--_mtable-regex-Containerfile=main/(ARG[ \t]+(\\\n)?|ARG\\\n)//{tenter=arg}
--_mtable-regex-Containerfile=main/(ENV[ \t]+(\\\n)?|ENV\\\n)//{tenter=env}
--_mtable-regex-Containerfile=main/(LABEL[ \t]+(\\\n)?|LABEL\\\n)//{tenter=label}
--_mtable-regex-Containerfile=main/FROM[ \t]+(--[^ \t]*[ \t]+)?([^ \t\n]+)([ \t]+(as|AS)[ \t]+([^ \t\n]+))?//{{
     \2 /image /from @2 _reftag _commit pop
     \5 false ne{
         \5 /image @5 _tag _commit \2 inherits:
     } if
}}
--_mtable-regex-Containerfile=main/[^\n]+//
--_mtable-regex-Containerfile=main/.//

--_mtable-regex-Containerfile=arg/[ \t]+//
--_mtable-regex-Containerfile=arg/([^[:space:]=]+)/\1/a/{tenter=next}
--_mtable-regex-Containerfile=arg/\n//{tleave}
--_mtable-regex-Containerfile=env/[ \t]+//
--_mtable-regex-Containerfile=env/([^[:space:]=]+)/\1/e/{tenter=next}
--_mtable-regex-Containerfile=env/\n//{tleave}
--_mtable-regex-Containerfile=label/[ \t]+//
--_mtable-regex-Containerfile=label/([^[:space:]=]+)/\1/l/{tenter=next}
--_mtable-regex-Containerfile=label/\n//{tleave}
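
For readers unfamiliar with the table mechanism, the following Python sketch emulates the idea in a much simplified form: a stack of tables, patterns tried in order at the current position, and enter/leave actions standing in for `{tenter}` and `{tleave}`. It handles ENV only, uses a zero-width lookahead where the option file uses `{_advanceTo=0start}`, and is an approximation for illustration, not a description of how ctags is implemented. The names `TABLES` and `scan` are hypothetical.

```python
import re

# Each entry: (pattern, kind or None, action). An action is ("enter", table),
# ("leave",), or None. Patterns are matched at the current scan position.
TABLES = {
    "main": [
        (r"#[^\n]*", None, None),                       # skip comments
        (r"ENV[ \t]+(?:\\\n)?", None, ("enter", "env")),
        (r"[^\n]+", None, None),                        # skip any other line
        (r"\n", None, None),
    ],
    "env": [
        (r"[ \t]+", None, None),
        (r"([^\s=]+)", "env", ("enter", "next")),       # a variable name
        (r"\n", None, ("leave",)),                      # end of the instruction
    ],
    "next": [
        (r"\\\n", None, ("leave",)),                    # continuation: more names follow
        (r"(?=\n)", None, ("leave",)),                  # leave without consuming the newline
        (r"[^\\\n]+", None, None),                      # skip the value
        (r"\\", None, None),
    ],
}

def scan(text, tables=TABLES):
    tags, stack, pos = [], ["main"], 0
    while pos < len(text):
        for pattern, kind, action in tables[stack[-1]]:
            m = re.compile(pattern).match(text, pos)
            if not m:
                continue
            if kind:
                tags.append((m.group(1), kind))
            pos = m.end()
            if action and action[0] == "enter":
                stack.append(action[1])
            elif action and action[0] == "leave":
                stack.pop()
            break
        else:
            pos += 1  # nothing matched in this table: skip one character
    return tags

sample = r'''# database settings
ENV DB_HOST="192.168.2.201" \
    DB_PORT="3306" \
    DB_USER="myapp"
RUN echo done
'''

print(scan(sample))  # [('DB_HOST', 'env'), ('DB_PORT', 'env'), ('DB_USER', 'env')]
```

The `next` table is what makes continuation lines work: after a name is tagged, it skips the value and returns to the `env` table only when it sees a backslash-newline or the end of the line.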

westurner commented 3 months ago

My use case for [universal-]ctags (#354) is vim-tagbar [1], which is described like this:

Tagbar is a Vim plugin that provides an easy way to browse the tags of the current file and get an overview of its structure. It does this by creating a sidebar that displays the ctags-generated tags of the current file, ordered by their scope. This means that for example methods in C++ are displayed under the class they are defined in.

(FWIW where tagbar doesn't get it, vim-voom [2] has Markdown and RST outline editing. I still have a custom config, but e.g. SpaceVim [3] has TagBar installed too)

[1] https://github.com/preservim/tagbar
[2] https://github.com/vim-voom/VOoM/blob/master/doc/voom.txt
[3] https://spacevim.org/use-vim-as-ide/

So IDK whether all of the tokens are worth indexing for Containerfile. RUN, ENTRYPOINT, and HEALTHCHECK are probably significant enough tokens in the file to be useful for navigation with tagbar, and similarly for e.g. vscode.

westurner commented 3 months ago

Buildah (Apache 2.0) has many Containerfile test cases:

westurner commented 3 months ago

jupyter-docker-stacks Dockerfiles aren't that long because they extend FROM other Dockerfiles, but as far as demonstrating the utility of tagbar+ctags with a useful Dockerfile, there's docker-stacks-foundation/Dockerfile, which specifies e.g. the NB_USER arg and so on: https://jupyter-docker-stacks.readthedocs.io/en/latest/ https://github.com/jupyter/docker-stacks/blob/main/images/docker-stacks-foundation/Dockerfile

What's a better example of a gnarly Dockerfile where this functionality will be helpful?

masatake commented 3 months ago

For documentation languages, we do violate the principle of "making a tag for a definition."

However, for Containerfile/Dockerfile, I want to uphold the principle. If I introduce a parser for these languages, it will extract only definitions. If you want to make tags for objects other than definitions, extend the built-in parser with --regex-Containerfile=... options in your .ctags.

https://github.com/containers/buildah/blob/main/tests/bud/multi-stage-builds/Dockerfile.extended

This Dockerfile is quite a good example. Thank you.

I am surprised at ENV "BUILD_LOGLEVEL"="5". The left-side variable is surrounded by double-quote characters.
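
One way to see what the quoted form implies for name extraction is to tokenize the instruction with shell-like quoting rules. The Python sketch below uses `shlex.split` for that. Treating Dockerfile quoting as POSIX shell quoting is an assumption made for illustration, the helper name `env_names` is hypothetical, and the input is expected to be one logical instruction with continuations already joined.

```python
import shlex

def env_names(instruction):
    """Return the variable names defined by one logical ENV instruction."""
    tokens = shlex.split(instruction)  # removes quotes, joins adjacent quoted parts
    if not tokens or tokens[0] != "ENV":
        return []
    args = tokens[1:]
    if args and "=" not in args[0]:
        return [args[0]]  # space-separated form: ENV KEY value
    return [tok.partition("=")[0] for tok in args if "=" in tok]

print(env_names('ENV "BUILD_LOGLEVEL"="5"'))                     # ['BUILD_LOGLEVEL']
print(env_names('ENV DB_HOST="192.168.2.201" DB_PORT="3306"'))   # ['DB_HOST', 'DB_PORT']
```

A pattern such as `([^[:space:]=]+)` would instead capture the name together with its surrounding double quotes.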

FROM is used more than once. An image name specified at FROM/AS is a scope for ENV, ARG, and LABEL. If only FROM is used, the parser must generate an image name to fill the scope fields of ENV, ARG, and LABEL.
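
The scoping rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the eventual parser: it works line by line, ignores continuations, and the `stage<N>` naming for anonymous stages is a hypothetical scheme chosen only to show that some generated name is needed.

```python
import re

FROM_RE = re.compile(r"^FROM\s+(?:--\S*\s+)?(\S+)(?:\s+(?:as|AS)\s+(\S+))?")
DEF_RE = re.compile(r'^(ARG|ENV|LABEL)\s+"?([^\s="]+)')

def scoped_tags(text):
    """Return (name, kind, scope) tuples; scope is the enclosing build stage."""
    tags, scope, index = [], None, 0
    for line in text.splitlines():
        m = FROM_RE.match(line)
        if m:
            # Use the AS name if present, otherwise generate one (hypothetical scheme).
            scope = m.group(2) or f"stage{index}"
            index += 1
            continue
        m = DEF_RE.match(line)
        if m:
            tags.append((m.group(2), m.group(1).lower(), scope))
    return tags

sample = """ARG BASE=alpine
FROM golang AS builder
ENV CGO_ENABLED=0
FROM alpine
LABEL version="1.0"
"""

for tag in scoped_tags(sample):
    print(tag)
```

`BASE` appears before any FROM, so it has no stage scope; `CGO_ENABLED` is scoped to `builder`, and `version` to the generated name for the second, anonymous stage.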

https://docs.podman.io/en/stable/markdown/podman-build.1.html

podman-build runs CPP. Therefore, #define DEF may appear in a Containerfile. ctags should extract DEF as a CPP macro.
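
The complication is that `#` also starts an ordinary comment. A rough Python sketch of telling the two apart is shown below; the pattern and the helper name `macro_names` are assumptions for illustration, and a real parser would more likely delegate this to the existing CPP support.

```python
import re

# A line counts as a macro definition only if "define" and an identifier follow "#".
DEFINE_RE = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE
)

def macro_names(text):
    """Return the names introduced by #define lines; plain comments are ignored."""
    return DEFINE_RE.findall(text)

sample = """#define DEF 1
# just a comment
FROM alpine
"""

print(macro_names(sample))  # ['DEF']
```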

Can we satisfy these requirements with a .ctags file? To answer this question, I will implement the parser myself.

westurner commented 3 months ago

Isn't it possible to distill such a grammar from a number of examples, such as the already-referenced buildah and docker container builder test cases? Podman builds containers with Buildah. Nerdctl and Docker > 23.0 build containers with BuildKit.

BuildKit; where are the Dockerfile syntax examples tested by BuildKit?:

Buildah's test Dockerfiles appear to be the most complete set of test Dockerfiles / Containerfiles I'm aware of.

westurner commented 2 months ago

Weeks later, FWIW, there's probably already regex-based syntax highlighting for Dockerfile

hholst80 commented 1 month ago

It focuses on definitions. We need a list of all commands that define names or introduce NEW names.

FROM X AS Y is what you want to consider. There are implicit numerical names given.

I do not see a great value in trying to add references from commands like COPY, ADD, MOUNT, RUN to other layers. Having a way to navigate between FROM is more than enough. Tags support for Dockerfile seems fairly useless imho and just adds complexity to the tooling with few real use cases.

Example


FROM alpine:latest

# RUN ..

FROM debian:latest

# Here we copy from stage 0 (no symbolic name was given, so the automatic name is 0)
COPY --from=0 /build/artifact /usr/local/bin
ENTRYPOINT ["/usr/local/bin/artifact"]
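
To show what resolving such a reference involves, here is a small Python sketch that maps `COPY --from=` values back to the stage they name, either by implicit index or by AS alias. It is a line-by-line illustration only; the function name `resolve_copy_sources` and the return format are hypothetical.

```python
import re

FROM_RE = re.compile(r"^FROM\s+(?:--\S*\s+)?(\S+)(?:\s+(?:as|AS)\s+(\S+))?")
COPY_FROM_RE = re.compile(r"^COPY\s+.*?--from=(\S+)")

def resolve_copy_sources(text):
    """Return (reference, base image of the referenced stage or None) pairs."""
    stages = []    # (alias or None, image); list index is the implicit stage number
    resolved = []
    for line in text.splitlines():
        m = FROM_RE.match(line)
        if m:
            stages.append((m.group(2), m.group(1)))
            continue
        m = COPY_FROM_RE.match(line)
        if m:
            ref = m.group(1)
            if ref.isdigit() and int(ref) < len(stages):
                resolved.append((ref, stages[int(ref)][1]))
            else:
                by_alias = {alias: image for alias, image in stages if alias}
                # None means the reference is not a stage defined so far.
                resolved.append((ref, by_alias.get(ref)))
    return resolved

sample = """FROM alpine:latest
FROM debian:latest
COPY --from=0 /build/artifact /usr/local/bin
ENTRYPOINT ["/usr/local/bin/artifact"]
"""

print(resolve_copy_sources(sample))  # [('0', 'alpine:latest')]
```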