martinvonz / jj

A Git-compatible VCS that is both simple and powerful
https://martinvonz.github.io/jj/
Apache License 2.0
8.34k stars 285 forks source link

FR: Support configuration of multiple tools for `jj fix` #3801

Open hooper opened 4 months ago

hooper commented 4 months ago

Problem:

As of https://github.com/martinvonz/jj/pull/3698, jj fix can only be configured to run different code formatters on different file types if a wrapper script is used to make such decisions. That is an unreasonable requirement for anyone who didn't already have such a wrapper script, and it means that jj fix is unable to make any useful decisions about which files even need to be processed at all. Even users of a wrapper script might want to avoid processing some files.

Solution:

Support a repeatable configuration syntax for defining the properties of each tool:

As @PhilipMetzger mentioned, TOML tables seem to be a good approach: https://github.com/martinvonz/jj/pull/3698#issuecomment-2118092548

Alternatives:

You could imagine configuring some other source of information, from which jj fix would fetch the tool properties. I don't think that would be useful, although we could look around to see if there is some existing standard for declaring this kind of thing.

You could imagine deferring formatting to something like LSP, but that would require more setup, would be less generic, would encounter more complicated failure modes, and would share some of the limitations of wrapper scripts as described above: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_formatting

Context:

Prior art: https://www.mercurial-scm.org/doc/hg.1.html#fix

I specifically want to avoid re-implementing the :metadata subconfig from hg fix, because it's weird and probably unnecessary. I say this as the person who implemented it and oversaw its use at Google; I don't know if others have found it useful.

arxanas commented 4 months ago

jj fix can only be configured to run different code formatters on different file types if a wrapper script is used to make such decisions

If one is not willing to implement a wrapper script, is it really a problem to just run multiple serial jj fix invocations with the different tools?

I don't know if the listed "solution" has a lot to do with the listed "problem", anyways. The listed items largely seem like general configuration options.

hooper commented 3 months ago

My expectation is that users will generally run jj fix without arguments, relying on a configuration of multiple tools that they rarely interact with, and that multiple invocations for different tools would be cumbersome. That's mostly based on good feedback about hg fix working that way.

I think the key word in the solution was "repeatable". I described the main configuration options just to make sure those ideas are documented. They're generally borrowed from hg fix, but not set in stone. The "affected file pattern" is the one most related to the idea of having multiple instances in the configuration. Imagine a config with several formatters relevant to a repo:

hooper commented 3 months ago

Here's a concrete proposal for the configuration syntax.

A simple example for a single code formatter:

[[fix.clang-format]]
command = ["/usr/bin/clang-format", "-", "--assume-filename=$path"]
patterns = ["**.cc", "**.h"]

This would result in argv like the following, which would vary per ToolInput due to the $path substitution:

/usr/bin/clang-format - --assume-filename=src/lib/foo.cc

The patterns will be handled by parse_file_patterns() to determine what we do with each element of unique_tool_inputs during fix_file_ids().

Multiple tools can be defined by repeating the section with unique identifiers, and their priority is undefined. Until we implement the priority option, we can operate under the assumption that the tools are run in the order they are specified.

[[fix.cpp]]
command = ["/usr/bin/clang-format", "-", "--assume-filename=$path"]
patterns = ["**.cc", "**.h"]

[[fix.python]]
command = ["/usr/bin/black", "--stdin-filename=$path"]
patterns = ["**.py"]

We would like to implement that basic syntax this week if there are no objections. Some other functionality can be added incrementally.

When https://github.com/martinvonz/jj/issues/3802 is implemented, we will need to support communicating the boundaries of the diff hunks using repeated flags. We see two main approaches. The first one puts the repeated flags in the same list with the other arguments, and implicitly repeats them at the same position in the argument list. The presence of substitutions for the line range variables determines which arguments are handled this way. For example:

[[fix.clang-format]]
command = ["/usr/bin/clang-format", "-", "--lines=$first:$last", "--assume-filename=$path"]
patterns = ["**.cc", "**.h"]

This would result in an argv like the following:

/usr/bin/clang-format - \
--lines=12:34 --lines=56:78 \
--assume-filename=src/lib/foo.cc

Another approach, which is used by Mercurial, is to list the line range flags separately so that the difference in how they are handled is more explicit:

[[fix.clang-format]]
command = ["/usr/bin/clang-format", "-", "--assume-filename=$path"]
linerange = "--lines=$first:$last"
patterns = ["**.cc", "**.h"]

Without any extra features to specify where the line range flags go, they would just be appended to the end of the command list:

/usr/bin/clang-format - \
--assume-filename=src/lib/foo.cc \
--lines=12:34 --lines=56:78

Including the other options alluded to in https://github.com/martinvonz/jj/issues/3801#issue-2328795810, a more complete configuration could look like this:

[[fix.clang-format]]
command = ["/usr/bin/clang-format", "-", "--sort-includes",
           "--lines=$first:$last", "--assume-filename=$path"]
patterns = ["**.cc", "**.h"]
priority = 123     # Run this tool after tools with lower priority numbers (on the same file)
skipclean = false  # Run this tool even if there are no line ranges, for --sort-includes
enabled = false    # Meant to be used if a tool needs to be disabled temporarily