martinvonz / jj

A Git-compatible VCS that is both simple and powerful
https://martinvonz.github.io/jj/
Apache License 2.0
7.51k stars 252 forks source link

FR: Support passing changed line ranges to tools from `jj fix` #3802

Open hooper opened 1 month ago

hooper commented 1 month ago

Problem:

Some users find that reformatting entire files introduces unwanted changes to their commits, due to formatter behavior changing over time, or just because they're working on code that wasn't previously auto-formatted. The jj fix command is likely to be objectionable to those users if it always formats the entire file.

Solution:

In much the same way as we compute diffs to determine the files to be fixed at each revision, we should compute diffs to determine the lines to be fixed in those files at each revision.

Alternatives:

You could imagine inserting an ancestor commit that re-formats the entire file, so that the descendants are unlikely to show any formatting-only diffs. That might actually be more objectionable, especially for users who don't want to use stacked change workflows.

You could imagine that the version control system would let the tool reformat an entire file, then try to reject diff chunks that seem unrelated to the previously existing changes. This runs a high risk of making semantic changes to code, which I think is unacceptable. Avoiding that would either require new interfaces between the VCS and the code formatter, or require that the VCS implement language-aware features that probably belong in the formatter anyway (all of which is redundant with the already existing --lines feature of many formatters).

Context:

Many code formatters (e.g. clang-format, black, rustfmt) support repeatable flags like --lines=123-456 that mostly limit the formatting changes to those areas of the file. The formatting changes often expand beyond the specified ranges due to indentation changes for surrounding syntax elements.

Some code formatters (e.g. swift-format, gofmt, buildifier) don't support line ranges, so it needs to be optional in the jj fix configuration. Users of those formatters probably care less about this FR.

hg fix computes line ranges by diffing each revision against a set of ancestors. It's also possible to do those diffs only at the roots of the revset being fixed, and propagate that information forward by updating it at each revision based on the changes made by the tools for the previous revisions. One of these approaches is also necessary to support passing line ranges to multiple tools affecting the same file at the same revision. So, it's probably best to use the same approach for both aspects. Diffing against ancestors is more precise, while propagating line ranges forward based on formatter diffs removes the need to know anything about those ancestors.

martinvonz commented 1 month ago

We could possibly add support for formatting only changed ranges without passing the line ranges to the formatting tool. We could instead format the whole file before and after the commit and then discard hunks that don't overlap with the existing hunk. That way it would not require support from each formatter.

arxanas commented 1 month ago

We could instead format the whole file before and after the commit and then discard hunks that don't overlap with the existing hunk.

I guess it would have some similarities with a hunk-based implementation of an absorb command? 👀

martinvonz commented 1 month ago

We could instead format the whole file before and after the commit and then discard hunks that don't overlap with the existing hunk.

I guess it would have some similarities with a hunk-based implementation of an absorb command? 👀

Yes, probably. I was thinking that we'd use the multi-way diff support we have here: https://github.com/martinvonz/jj/blob/65a988e3d2d6b3ed201ed8c2b4952ef08344e55f/lib/src/diff.rs#L338-L341

hooper commented 1 month ago

I tried to describe that in the "alternatives" above. I don't think it's practical, but I would love to see any evidence to the contrary. I think the main pitfall is that formatters provide no guarantee that applying a subset of formatting hunks will preserve the semantics of the file as a user of jj fix would expect. Keeping the API between jj and formatters simple and ubiquitous is key to the success of the feature, and I think that using the formatters' existing line range flags is appropriate in that sense.

My experience is that configuring line range flags for each formatter is a minor issue, and the formatters that don't natively support it don't seem to generate much user feedback about it. Automating it would definitely help if jj fix were to be used in a more ad-hoc way, which makes me think maybe it is more of a generic jj run kind of feature that could also be useful with tools that have broader context than the content of one file.

A mostly separate thing I want to make a note of is that it will be important to make the diff algorithm pluggable. At Google, we have had issues with this feature in hg fix due to disagreements about which parts of a file are changed. For example, if a lint check blocks a workflow because it thinks line 10 is changed and needs to be reformatted, but the formatter instead thinks line 20 is changed, the user can get stuck. That can be addressed by making the lint check actionable in ways other than running jj fix, but users might run jj fix in that scenario anyway, expecting it to help.

There's a broader problem that formatting only changed lines raises new issues like the one above, but it's an important capability for some teams, and it's not something that we have to force upon all jj fix users. If we did implement magic generic support inside jj, we would want it to be opt-in (a config bool that defaults to off).

martinvonz commented 1 month ago

I think the main pitfall is that formatters provide no guarantee that applying a subset of formatting hunks will preserve the semantics of the file as a user of jj fix would expect.

Good point. I'm having trouble thinking of a case where it would be a problem. I'm guessing Python is most problematic (among common languages). Can you think of an example?