FR: Implement JJ `run` - Githubissues

PhilipMetzger commented 1 year ago

This is the tracking issue for run, mostly for myself.

The design doc is available here (Google Docs) and here (docs/design).

Related to #405

[x] Add a barebones run command, which is hidden.
[x] Move the design doc from Google Docs to docs/design.
[ ] Add the VFS integration point, for EdenFS and CitC.
[ ] Implement the worker interface for the Git backend.
[ ] Add a beautiful TUI with superconsole, based on @arxanas suggestion in #1486.

this should establish the base features of run.

And to add, here's a simple wishlist based on Discord conversations:

Add run --bisect, to support a git bisect -x like workflow based on git-branchless's git test.[^1]
Supporting bayesian based bisection [^2]
Partially exposing the underlying repo, for the default backend with something like --files [PATHS]

[^1]: My suggestion in Discord [^2]: A suggestion from @thoughtpolice in Discord

arxanas commented 1 year ago

The scm-bisect crate contains bisection logic already. For git run --bisect, it would be nice if we could centralize logic into scm-bisect so that other version control systems could potentially use it.

thoughtpolice commented 1 year ago

As an update to the entry for bayesian bisection support, this just came out of Google pretty recently: Flake Aware Culprit Finding. https://research.google/pubs/pub52048/ This is certainly a good overview of what such a tool would try to achieve and look like, and what assumptions it would make.

Algorithms 2 and 3 in Section 3 effectively summarize the whole algorithm, operationally.

PhilipMetzger commented 5 months ago

@matts1 Moving this conversation here, as this is the correct place for it.

Comments on jj run @PhilipMetzger, a few suggestions on your design doc for jj run (is there a good way to give feedback on design docs on > github - comments on a google doc work pretty well, but there doesn't seem to be anything like that for github):

If I understand correctly, exactly one of --dry-run, --rebase, --reparent, and --readonly must be chosen Probably worth calling out on there. It's very unclear, especially because --clean is stuck between them.

Just --reparent and --rebase are exclusive, as they operate on the repos history. --readonly only exists for specific workflows, while --clean is there for manual garbage collection until there's a better way. And the section is only there to provide an overview of all requested features.

Based on the fact that you say that you can write by default, and that it's parallel by default, I assume that the default is -- >reparent?

No, the default is to rebase the "new"/"modified" commits, as its jj's default. While you could argue that run should be minimally intrusive into your existing tree, it just isn't. I mean should act as if you've run <command> on all ancestors manually and it just is the obvious behavior.

You write that by default, jj run works on @. I'd personally like jj run to work by default on mutable()::@ (If I wanted to run jj run -r @ 'command', I'd instead just run > command directly). I think it'd be valuable to instead do the same thing we do for log, where we add revsets.edit (which could default to @)

Yeah, this totally makes sense, but as the Design Doc was written when neither trunk() nor immutable_heads() existed, it was the correct choice to just use @.

No mention was made about the working directory, so I'm curious how relative paths would work IMO, we should set the $JJ_REPO environment variable so that a relative path to the root of the repo can be $JJ_REPO/foo Maintaining relative paths has value A unix mount namespace could solve this by literally just overlaying our new virtual file system on top of the real one But unclear how this'd work on non-unix systems

This is another great point, which never was brought up. I agree that the $JJ_REPO env var should be included from the beginning.

matts1 commented 5 months ago

Could you clarify what --readonly does in more detail. The documentation says " ignore changes across multiple run invocations", and that seems like it could mean what I understand --readonly to mean, but it feels like a wierd way to phrase it. My understanding of --readonly would either be:

If the run command attempts to modify the tree, throw an error and don't update the commit with those changes
If the run command attempts to modify the tree, don't update the commit with those changes

If my understanding is correct, then surely readonly / reparent / rebase should be mutually exclusive - it doesn't make much sense to rebase something if it's readonly

--readonly only exists for specific workflows

This doesn't make much sense to me given my understanding of how --readonly would work. Could you clarify what you mean by this

No, the default is to rebase the "new"/"modified" commits, as its jj's default. While you could argue that run should be minimally intrusive into your existing tree, it just isn't. I mean should act as if you've run <command> on all ancestors manually and it just is the obvious behavior.

The reason I said that I assumed reparent would be the default is that IIRC (might be misremembering this) you said somewhere that by default it would run in parallel, and I thought that would be incompatible with rebase. I figured that that reparent and readonly can run edit in parallel on all specified revisions at once, because they have no dependency on the parent's post-run tree. On the other hand, rebase may have changed the tree by the time you got to a revision, so you can't do it in parallel, and it can create conflicts.

Maybe I'm misunderstanding how you're implementing rebase (maybe you run commands then do rebasing or something like that - if so, probably worth clarifying in the docs), but here was my understanding of the difference between rebase and reparent.

Suppose I have two changes A and B stacked on top of each other, with commit IDs a1 and b1 respectively.

Now I run jj run --rebase <command>: 1) <command> runs on a1, changing A to CommitID a2 2) B is rebased onto a2, changing B to CommitId b2 3) <command> runs on b2, changing B to CommitId b3

You cannot run steps 1 and 3 at the same time because step 3 is dependent on the tree b2, which is dependent on the rebase and thus on the tree a2, which requires running step 1 to determine.

On the other hand, jj run --reparent <command> would:

<command> runs on a1, changing A to CommitId a2
<command> runs on b1, changing B to CommitId b2

Note that this would also mean that if conflicts were generated with --rebase, jj run would either have to back out, or attempt to run on a conflict.

arxanas commented 5 months ago

@PhilipMetzger do you or someone else have immediate plans to implement jj run, and will the first implementation be exactly according to the scope set out in the design document?

It doesn't make sense for jj run to be making changes to the working copy/commit graph as part of its default mode of operation —

if it's a good change to make, then it's a "fix" and belongs in jj fix;
if it's a bad change to make, then it shouldn't be processed anyways.

An initial implementation of jj run should probably be restricted to running a command on each of a set of commits and collecting the output/exit codes and presenting that to the user somehow. Then jj fix, etc., can be modified to make use of the jj run infrastructure.

I think the relevant issue for rebasing/reparenting behavior is now https://github.com/martinvonz/jj/issues/3805.

No mention was made about the working directory, so I'm curious how relative paths would work IMO, we should set the $JJ_REPO environment variable so that a relative path to the root of the repo can be $JJ_REPO/foo Maintaining relative paths has value This is another great point, which never was brought up. I agree that the $JJ_REPO env var should be included from the beginning.

I don't disagree with the idea of preserving relative paths, but I don't think it needs to go into an initial implementation?

The reason I said that I assumed reparent would be the default is that IIRC (might be misremembering this) you said somewhere that by default it would run in parallel, and I thought that would be incompatible with rebase. I figured that that reparent and readonly can run edit in parallel on all specified revisions at once, because they have no dependency on the parent's post-run tree. On the other hand, rebase may have changed the tree by the time you got to a revision, so you can't do it in parallel, and it can create conflicts.

I don't know what specifically was being referred to, but there is a useful rebase approach that does have the potential to induce conflicts, similar to git rebase -x. It's most useful for commands that operate on the diff of the working copy with its parent, rather than the whole repository at once (for which reparenting might be more appropriate if the operation is "coherent"" in a mathematical sense).

PhilipMetzger commented 5 months ago

I still plan to implement run to answer that question, but it will be clearly best effort not immediately matching the design doc, even with all the amendments required. The goal is to get incrementally there and probably adding some required features which get requested along the way. And since fix now exists, this hopefully will be easier.

PhilipMetzger commented 5 months ago

The VFS can arrive at a later point, as I definitely have no experience writing such a thing, as Googlers will have a their requirements and experience from already building one.

matts1 commented 5 months ago

if it's a good change to make, then it's a "fix" and belongs in jj fix

This isn't necessarily true. IIUC, jj fix is optimized to run a single tool on many different files. On the other hand, jj run could be used for something like REGENERATE_GOLDENS=1 <run tests> to regenerate golden testdata. Whether that's enough of a justification to make the default mode not readonly, that's another question.

arxanas commented 5 months ago

This isn't necessarily true. IIUC, jj fix is optimized to run a single tool on many different files. On the other hand, jj run could be used for something like REGENERATE_GOLDENS=1 to regenerate golden testdata. Whether that's enough of a justification to make the default mode not readonly, that's another question.

Ah, to be clear, I meant to describe the ideal UI: things which are logically "fixes", which includes both formatting changes and updating snapshot tests, should go into jj fix. I commented at https://github.com/martinvonz/jj/issues/3808#issuecomment-2170747160 to expand on why jj fix shouldn't be limited to only files.

I still plan to implement run to answer that question, but it will be clearly best effort not immediately matching the design doc, even with all the amendments required. The goal is to get incrementally there and probably adding some required features which get requested along the way. And since fix now exists, this hopefully will be easier.

What I'm suggesting is cutting scope rather than adding to it (i.e. don't make jj run able to modify commits to begin with), so it should make your job easier 😉.

dbarnett commented 2 months ago

Heya @PhilipMetzger @arxanas, think there's anything here or in related issues that could use help? Is #4021 still a good place to look for the latest?

I got some coding time on my hands and am still itching to get jj to support precommit hooks/checks.

PhilipMetzger commented 2 months ago

Yes. it is the latest. See also the discussions in these issues #3575 and #3577 for some more infos.

martinvonz / jj

FR: Implement JJ `run` #1869