FR: Support `jj fix -r X` to fix `X` with minimal effects on `X..`

hooper commented 5 months ago

Problem:

Some users find that hg fix -s X formats more files/lines than they wanted to be formatted in X...

Solution:

If the user specifies -r X instead of -s X, don't include the diffs of X.. in the files/lines to be fixed. The details will depend a lot on how we implement line ranges. As for the set of files to be fixed, it basically means we would skip the parent_tree.diff_stream() section of jj fix when picking ToolInputs for X...

Alternatives:

We fix files in X.., instead of rebasing them, to avoid merge conflicts between the user's changes and the code formatter's changes. I consider rebasing to be a worse alternative, because those conflicts are both confusing to deal with (especially if a diff tool ignores whitespace changes) and not very useful. Rebasing is potentially faster than some code formatters, but not enough to justify exposing users to unnecessary conflicts.

Context:

Anecdotally, Google has had many users of hg fix over many years, and they don't ask for the rebasing behavior. They do occasionally ask for something like this FR.

arxanas commented 5 months ago

We fix files in X.., instead of rebasing them,

Anecdotally, Google has had many users of hg fix over many years, and they don't ask for the rebasing behavior. They do occasionally ask for something like this FR.

I don't really understand the FR as written. Is it not essentially proposing rebasing the descendant commits instead of fixing them? Can you provide a specification of the behavior not in terms of the current implementation?

Rebasing is potentially faster than some code formatters, but not enough to justify exposing users to unnecessary conflicts

Rebasing instead of fixing or reparenting descendants becomes important/useful if the fixes are expensive to compute (just opened https://github.com/martinvonz/jj/issues/3808 in relation). It's annoying when I have to wait for cargo clippy --fix to fix the same lint issues in all of the commits repeatedly.

matts1 commented 3 months ago

IMO, We should always reparent instead of rebasing (to avoid conflicts), but we should do so while solving the problem that @arxanas mentioned.

def fix(revisions):
    fixes_required = {rev: changed_files(rev) for rev in revisions}

    for rev in revisions:
        new_files = rev.files
        for file in changed_files(rev):
            if file in fixes_required:
                file.set_content(run_fix_command(file.path))
            else:
                # There was originally no change, but the fix command changed our parent
                file.set_content(rev.parent.files[file.path])
        reparent_children(rev)

Suppose that commit A changed files foo and bar, and commit B (a child of A) changed file foo, and commit C (a child of A) is empty. If the user runs jj fix -r 'mutable()::B', and their fix command was fix, then we should:

Run fix on foo and bar in A, producing A'
Reparent B and C onto A'
Fix B
- Determine that foo and bar have been changed, but that only foo is in fixes_required
- Run fix on foo
- jj restore --from A' --to B bar because it isn't in fixes_required
C is now non-empty, containing the un-fixed foo and bar.

arxanas commented 3 months ago

[...] but we should do so while solving the problem that @arxanas mentioned. [...]

Determine that foo and bar have been changed, but that only foo is in fixes_required

Run fix on foo

Unfortunately, this approach doesn't solve the problem I have. It supposes that a fix can be run on individual files, but (as far as I can tell) cargo clippy --fix can't run on individual files. Build tools are particularly likely to be 1) expensive and 2) unable to run on individual files. See also https://github.com/martinvonz/jj/issues/3808.

The file-level-granularity "redoing" approach you suggest makes sense for many other classes of formatter-style fix. It's also similar to an approach I recall at a previous company where people did automatic codebase-wide refactorings and opted to drop changes in conflicting files altogether, although the redoing would be done manually at a later time.

joyously commented 3 months ago

Why is there a fix command? It seems beyond the scope of a version control system.

arxanas commented 3 months ago

@joyously It needs to operate on and modify the commit graph. How could it exist without the version control system? It might be best to start a separate discussion regarding the suitability/scope of the fix command, if you have something in mind.

joyously commented 3 months ago

This discussion seems to be about code formatting. That would be a normal change in files, not anything to do with the commit graph.

emilazy commented 3 months ago

It’s for applying tools like formatters to a range of commits in batch.

matts1 commented 3 months ago

Unfortunately, this approach doesn't solve the problem I have. It supposes that a fix can be run on individual files, but (as far as I can tell) cargo clippy --fix can't run on individual files. Build tools are particularly likely to be 1) expensive and 2) unable to run on individual files. See also #3808.

The file-level-granularity "redoing" approach you suggest makes sense for many other classes of formatter-style fix. It's also similar to an approach I recall at a previous company where people did automatic codebase-wide refactorings and opted to drop changes in conflicting files altogether, although the redoing would be done manually at a later time.

By design, fix is not suitable for all use cases - like you said, fix is designed to run formatters which take only a single file as input. Your use case is reasonable, but it should be solved by jj run, so I don't see a problem with it not being solved by jj fix.

joyously commented 3 months ago

I still don't see why a VCS would have a command that is for affecting the content in batch when that changes history. If it's for doing a batch chmod or resetting the commit date or author or branch(?), that sort of makes sense, but not for doing what is basically a file operation on commits (although I know I mentioned chmod...).

matts1 commented 3 months ago

@joyously I'm not sure what the problem is with changing history, though maybe I'm misunderstanding your point. Literally everything injj changes history, by design, and I don't see any issues with changing history if you're only changing mutable commits (which is the only thing you can do by default).

jj squash changes history for all children of the commit you amend
jj squash my_file --to <commit id> changes history of some arbitrary commit and its descendants

My use case is:

I write commit A, then run jj commit
I write commit B on top of commit A, then run jj commit
I want to upload the commit stack, so I run jj fix -r mutable()..B, then jj git push --change B

If I were to not use jj fix, I'd instead do:

jj edit A
<list files in A> | xargs clang-format
jj edit B -> possibly resolve merge conflicts since, unlike jj fix, this rebases instead of reparents
<list files in B> | xargs clang-format
jj git push --change B

It's a massive pain in the ass, and still ends up rewriting history, so I'm confused about your opposition to jj fix rewriting history. Mind you, I really don't like the way it's currently implemented, with a -s flag (I'd much prefer -r to specify the set of revisions you want precisely). Maybe that'd appease your concerns somewhat, given that the history rewrite is limited to the revisions that you specify precisely.

Also, how do you feel about jj run then (#1869)? That can run more complex commands, and seems extremely useful to me for updating goldens, for example.

joyously commented 3 months ago

@matts1 Your use case can be solved other ways: simply run your formatter and squash that into whatever revision you want (or leave as a separate commit with its own message since it is a separate change).

I guess my opposition comes from a viewpoint of the VCS being invoked only when the code is ready to share, and that the historical states are those already shared.

I just don't see why a VCS has anything to do with batch processing of file contents.

arxanas commented 3 months ago

Your use case is reasonable, but it should be solved by jj run, so I don't see a problem with it not being solved by jj fix.

I contest the design and argue that jj fix should also handle it as per https://github.com/martinvonz/jj/issues/3808 (and specifically not jj run). That being said, I am generally favorable towards some kind of solution that offers redoing/throwing away on a file-level granularity even for module-level fixes because I've seen it work well in practice, so there might not be too much conflict regardless of whether we make jj fix support working-copy-level fixes.

Your use case can be solved other ways: simply run your formatter and squash that into whatever revision you want [...]

One would hope so, but I've found that it doesn't work that well in practice due to meaningless conflicts:

You can't just format the top of the stack and try to squash changes elsewhere, because some of the formatting changes may apply to the contents of the top commit, and previous commits would remain unformatted in certain places.
- There's an implicit assumption that you have multiple commits in your stack that you want to format and send for review individually.
If you try to format each commit individually, then you induce unnecessary conflicts in descendant commits.
- Even though jj handles conflicts much better, there's a lot of semantically-uninteresting conflicts that would still have to be resolved, especially since it doesn't word-granularity/whitespace-insensitive diffing.
- This issue is about strategies for avoiding conflicts in descendant commits.
There's a related discussion here: https://github.com/arxanas/git-branchless/wiki/Command:-git-test#preventing-merge-conflicts.

I guess my opposition comes from a viewpoint of the VCS being invoked only when the code is ready to share [...] I just don't see why a VCS has anything to do with batch processing of file contents.

Generally speaking, there are two cohorts of VCS user when it comes to this kind of thing:

one who uses VCS extensively as part of the development process, especially while code is still being prepared;
one who uses VCS primarily for sharing work with others, attribution, etc.

jj run/jj fix are mainly useful to the first cohort. Both of these commands are based on commands from other VCSes, so many think that both are generally useful — but it also means that the design hasn't been widely discussed and motivated, so it's often not obvious to the unfamiliar user what the advantages might be.

@PhilipMetzger wrote an initial design document for jj run that could help motivate it.
The git-branchless test docs detail one interface, which isn't directly motivating, but could shed light on potential use-cases.

@joyously I'm happy to discuss more but I think we should move to a separate discussion of the general suitability of jj fix/jj run command for a VCS, if you want to start one. (I am personally a big proponent, having implemented the variant for git-branchless 😁)

PhilipMetzger commented 3 months ago

Your use case is reasonable, but it should be solved by jj run, so I don't see a problem with it not being solved by jj fix.

I contest the design and argue that jj fix should also handle it as per #3808 (and specifically not jj run).

I did not follow this discussion closely but can you explain what the semantic difference here should be? I heavily agree with @matts1 that this functionality is covered by jj run.

matts1 commented 3 months ago

@joyously My take is that jj should be fully supportive of rewriting history as much as you want. There's nothing wrong with your approach, but there's also nothing inherently wrong with rewriting history.

The reason there's nothing wrong with rewriting history is that if rewriting history for a commit was truly bad, then it should go into your immutable heads. For example, if you, for whatever reason, didn't want to support amending a commit that had been sent to github for review, just put remote_branches(remote="github") in your immutable heads.

FWIW, I think this may also be related to the code review system people use. Your principles you mention are probably a very good set of principles for working with a code review system where you review PRs, and especially so if they don't allow rewriting history. However, rewriting history is the recommended approach for both gerrit and piper, because both of these systems have you review individual commits, where each commit has a stable change-id. So I can send out commits A and B, get a review requesting changes on A, jj edit A, then jj piper upload B to upload both the amended A and the auto-rebased B.

I do agree with @arxanas that there are definitely two classes of VCS users. During the development process, I'll be doing rebases, amends, octopus merges, squashes, editing intermediate commits, etc. After code review, I continue doing all that, rewriting history on existing commits already sent for review. I only stop rewriting history once a piece of code has been submitted.

I think the metrics we should be marking against are:

Is it useful (yes, we've shown several use cases where it's important)
Is it going to create problems?
- You've come up with a very principled argument to this, what I'd appreciate from you is a concrete "this is an example set of steps that will result in state x, which is bad because y".

joyously commented 3 months ago

Is it going to create problems?

I think what bothers me about a fix command is

It sounds like "fixing a bug" in that some tools have keywords to close the corresponding bug ticket.
Running an arbitrary tool in batch doesn't update the descriptions (or does it?)
It is not clear that there are any boundaries such as immutable() or mine(). How would you know if you affected something you didn't intend to, when you enter the wrong revset?
Is the whole file affected, or is it filtered to only the lines that are in the diff?
At what point is the snapshot taken, or are there more than one? Is this compatible with undo and a concurrent usage?
Aren't there already commands to propagate edits to descendants?

ilyagr commented 3 months ago

I haven't followed the whole discussion, but I wonder whether this could be addressed by some version of --verbatim-rebase from #1027, similarly to jj amend --verbatim-rebase.

Conceptually (and perhaps for real), I wonder if the version of jj fix that creates a child commit for each commit it processes should be considered as the "most basic" version of the command. Then, the question of how and whether to squash those commits into parents becomes a separate question.

matts1 commented 3 months ago

Is it going to create problems?

I think what bothers me about a fix command is

It sounds like "fixing a bug" in that some tools have keywords to close the corresponding bug ticket.

I don't have any particular qualms with the name, but I wouldn't be opposed to renaming it to something like jj format.

Running an arbitrary tool in batch doesn't update the descriptions (or does it?)

It does not (and it shouldn't, considering the use case). This isn't for arbitrary tools, but specifically for formatters.

It is not clear that there are any boundaries such as immutable() or mine(). How would you know if you affected something you didn't intend to, when you enter the wrong revset?

I don't see how this is any different from any other command that affects multiple revisions (eg. rebase). And the boundary on immutable is simply that jj won't let you edit an immutable commit. That being said, re "how would you know if you affected something you didn't intend to", the only time it should ever be a problem to change history by running a formatter on a change and changing history is if that commit should not be changed at all, and those commits should just be marked as immutable. If I accidentally ran a formatter on a commit I didn't intend to, my code looks prettier but still runs the same. As proof of my earlier statement:

$ jj fix -s main
Error: Commit 63b808e40000 is immutable
Hint: Pass `--ignore-immutable` or configure the set of immutable commits via `revset-aliases.immutable_heads()`.

Is the whole file affected, or is it filtered to only the lines that are in the diff?

The whole file - formatters generally work on a file level. I don't think there's anything wrong with this. If I ran clang-format, it's not going to run on only the changed part of the file. This command should be no different from running it manually.

At what point is the snapshot taken, or are there more than one? Is this compatible with undo and a concurrent usage?

I'm no expert on this, but I'd imagine once before the command is run, and once after the command completes. I don't see how it's any different from any other command that rewrites history (eg. jj squash or jj rebase) and a concurrent usage though?

Aren't there already commands to propagate edits to descendants?

This is an extremely common use case, and to the best of my knowledge, there's no easy way to achieve it.

Conceptually (and perhaps for real), I wonder if the version of jj fix that creates a child commit for each commit it processes should be considered as the "most basic" version of the command. Then, the question of how and whether to squash those commits into parents becomes a separate question.

I strongly disagree with this one. While I do like the exisence of --verbatim-rebase, I think it doesn't make sense here:

It creates a very complex history
- It turns 3 linear commits into 6 nonlinear commits, for example
I cannot think of anything that a user could possibly do with it other than just immediately squash them in to create the linear history that jj currently creates
- It doesn't appear to solve a problem
Formatters inherently are not something that a human reviews and says "are the changes good". You just blindly apply a formatter and don't need to verify them.
You say the version of jj fix that creates a child for each commit it processes, but that currently doesn't even exist, and to the best of my knowledge, there have been no requests for that feature. Doesn't that mean that it's not useful?

ilyagr commented 3 months ago

I strongly disagree with this one. While I do like the exisence of --verbatim-rebase, I think it doesn't make sense here:

I am not suggesting this should be the default behavior. We can also leave it at "conceptually" and not do it in practice at all.

My main point is that we can think of this command as doing two relatively orthogonal things. We can think separately about the options for how to "fix" the commit, and for options of how to then squash or integrate the new patch into the commit graph.

This also suggests to me that solving this in parallel to squash --verbatim-rebase (or whatever we decide to call it) might be reasonable.

I cannot think of anything that a user could possibly do with it other than just immediately squash them in to create the linear history that jj currently creates

I can think of a few possible things: squash normally, squash with --verbatim-rebase (this is effectively what has to happen if the child commit is also fixed), rebase the other descendants on top of the new commit, or rebase the other descendants on top of the new commit with --verbatim.

It bothers me a little (unrelated to your comment) that whether or not using --verbatim is natural seems to depend a lot on whether the descendants are also being fixed or not.

Formatters inherently are not something that a human reviews and says "are the changes good". You just blindly apply a formatter and don't need to verify them.

This is true, I'm perhaps polluting this discussion by thinking about jj run, e.g. using LSP to rename a variable as the "fix". TBH, I'm not sure where the boundary lies and which command that usecase belongs to.

matts1 commented 3 months ago

I can think of a few possible things: squash normally, squash with --verbatim-rebase (this is effectively what has to happen if the child commit is also fixed), rebase the other descendants on top of the new commit, or rebase the other descendants on top of the new commit with --verbatim.

I do agree with two of those three use cases. I don't currently like that only the first option is currently supported - I personally like the idea of a -r option (which is what this issue was originally about), because it would allow your third option to happen. I can't see a world where option 2 is useful, but if someone did come up with a reason I wouldn't be opposed to it.

With all that said, I'd much prefer jj fix -r or jj fix --regular-rebase over jj fix --new, then jj squash manually. I'd prefer for the complexity to be in the command implementation, rather than what the user has to run.

ilyagr commented 3 months ago

I personally like the idea of a -r option (which is what this issue was originally about), because it would allow your third option to happen. I can't see a world where option 2 is useful, but if someone did come up with a reason I wouldn't be opposed to it.

I may have misunderstood what this issue is about. My first guess at what the -r option for fix meant would be to do option 2 for the children of the heads of the revset being fixed. So, we could call it jj fix X --verbatim-rebase or something along those lines (though I'm not sure about it, it just seemed like an option nobody mentioned). I didn't think my options 3 or 4 were being considered here. I'm now a little confused, but perhaps that's because I jumped in mid-conversation. Perhaps things will be clearer if I reread the thread more carefully.

With all that said, I'd much prefer jj fix -r or jj fix --regular-rebase over jj fix --new, then jj squash manually.

Yes, I never meant the user to be forced to squash things manually. If we implemented manual squashing at all, the users would have to opt into it.

matts1 commented 3 months ago

@hooper What was your intention here? I had assumed that we would never want to rebase when we run jj fix because that would create conflicts, and thus my assumption was that jj fix -r <revset> would mean "fix the revset, then reparent all children of that revset", but it appears that @ilyagr had thought otherwise.

ilyagr commented 3 months ago

Yes, I think this was my confusion.

I assumed that jj fix X would rebase descendants of X onto the new version of the commit, as every other command in jj would, and that the request of this issue would be to have an option to do a "verbatim rebase" (which a lot of people seem to call reparenting).

In reality (as I'm sure most people reading are well aware), jj fix X doesn't exist at the moment, there is only jj fix -s X.

It is true that "reparenting" behavior seems generally more useful for a jj fix command, so I can see why one might assume we'd do that by default.

I'm unsure what's best. I think reparenting by default might be reasonable, but it has the problem that it's confusingly different from how all other jj commands behave. I'd be happier with it if there was some way it was communicated to the user (via the name of the option, say), but I'm not insisting on this either.

joyously commented 3 months ago

It occurs to me that this sounds an awful lot like a pre-commit hook.

matts1 commented 3 months ago

It does have similarities, but pre-commit hooks don't make sense in jj (see my section on pre-commit hooks in https://github.com/martinvonz/jj/issues/3577#issuecomment-2087816381), since jj can't tell when you intend to commit. In that thread, we basically determined that since rewriting history is so easy in jj, all pre-commit hooks can easily just be done as pre-upload hooks instead, and there's no disadvantage to doing so.

So the workflow for most users will probably be:

Code commit stack containing A, B, and C
jj fix
jj lint? This might be done via jj fix or jj run, depending on whether the linter does things a single file at a time.
jj git push / jj piper upload / jj gerrit send

martinvonz / jj

FR: Support `jj fix -r X` to fix `X` with minimal effects on `X..` #3805