martinvonz / jj

A Git-compatible VCS that is both simple and powerful
https://martinvonz.github.io/jj/
Apache License 2.0
8.97k stars 312 forks source link

FR: Generalized hook support #3577

Open matts1 opened 6 months ago

matts1 commented 6 months ago

Is your feature request related to a problem? Please describe. I was looking at #2845 and realized that it didn't meet my needs because it didn't support pre-upload hooks. I investigated further into hooks, and the closest I found was #405, which rather than generalized hooks, specifically is trying to solve pre-commit hooks.

Describe the solution you'd like I was thinking of treating hooks like a pub-sub system. jj would publish events, which would consist of some metadata. Your hooks would simply be subscriptions to some subset of the events, matching specific metadata. For example:

message CommitDescription {
   string commit = 1;
   string description = 2;
}

message HookEvent {
    string workspace = 1;
    oneof hook_type {
        CommitDescriptionHook commit_description = 2;
        PreUploadHook pre_upload = 3;
    }
    ...
}

We would then, similar to my #3575, run all hook binaries that have subscribed to that event in the global config.toml. Similar to #3575, these hook binaries would be passed a file descriptor with a connection to the jj grpc server. For example, my global config.toml might look like:

[hooks]
subscribe_to_every_event = { binary = "/path/to/foo" }
pre_upload = { binary = "/path/to/bar", type = "pre_upload" }
specific_workspace = { binary = "/path/to/baz", workspace = "/path/to/workspace" }

We can also implement per-repo hooks in the same way that git has them by allowing additional subscribers from a hooks/jj.toml or similar, though I'd hold off on this in the short term due to security concerns.:

For example, a pre-upload hook that forces your commit to have an associated bug might look like:

conn = grpc.connect(file_descriptor(os.environ["JJ_SERVER"]))
main = importlib.import("/path/to/foo_ext.py", "main")
event = conn.get_event()

if "BUG=" not in event.commit_description.description:
    bug = input()
    description = f"{event.commit_description.description}\nBUG={bug}"
    conn.set_description(commit_id=event.commit_description.commit_id, description=description)

Though similar to my suggestion in #3575, we could provide preludes for common languages, and simply make the hook:

def main(conn, event):
    if "BUG=" not in event.commit_description.description:
        bug = input()
        description = f"{event.commit_description.description}\nBUG={bug}"
        conn.set_description(commit_id=event.commit_description.commit_id, description=description)

We would thus provide a method in jj along the lines of:

fn publish_event(event: &HookEvent) -> anyhow::Result {
    Vec<&Hook> hooks = filter_hooks(event);
    // Just a regular grpc server, but get_event() will always return the hook object.
    GrpcServer server = hook_grpc_server(event);
    for (hook in hooks) {
        let return_code = hook.run(grpc_server);
        if (return_code != 0) {
          bail!("Hook failed");
        }
    }
    Ok(())
}

Describe alternatives you've considered Didn't really consider much else, this is the first idea that came to mind. I thought it was a nice starting point. Other ideas are welcome.

Additional context Add any other context or screenshots about the feature request here.

PhilipMetzger commented 6 months ago

cc @arxanas for his opinion on hooks, as we discussed them quite heavily for jj run (#405/#1869).

Im very much on board on providing client integrations and having a pub-sub like event system in the daemon but I don't want to have it in the cli. Having external languages and binaries would fix some re-occurring ideas in relation to the template language #3262 and this recent Discord discussion.

matts1 commented 6 months ago

Could you elaborate on what you mean by "don't want to have it in the CLI"?

My line of thought, as I said above, was that:

This would mean that when you run a CLI command such as jj describe, it would run a pre-describe hook. I can't see any way of doing this cleanly other than that when you run the jj command you get hooks, so I'm rather confused by your statement of "don't want to have it in the CLI", both in terms of what you mean and why you don't want it.

What was your vision of how hooks would work?

PhilipMetzger commented 6 months ago

My line of thought, as I said above, was that:

  • The hooks would be external binaries
  • The config file would specify the hooks
  • The pub/sub aspect of them would be done within jj-lib

That's all fine, if you can take the performance penalty and latency from the external binary.

Could you elaborate on what you mean by "don't want to have it in the CLI"?

It has no place there, as it severely overlaps with run and forcing a pre-upload or presubmit hook in there makes the user experience rather unpleasant. If we can make the CLI emit just a proto message through IPC/vsock and the daemon handles the whole hook invocation, the user experience is kept smooth.

What was your vision of how hooks would work?

I'd rather just offload the whole hook system to the daemon and it is responsible for everything, the CLI only emits events which you can react upon and nothing more.

matts1 commented 6 months ago

That's all fine, if you can take the performance penalty and latency from the external binary.

In order to be sufficiently generic, any hook is necessarily going to have to be an external binary. I'm not sure I get what you're saying. Otherwise, rather than a generic hook, I'd have a fixed set of hooks that come packaged with jj that I could use.

I'd rather just offload the whole hook system to the daemon and it is responsible for everything, the CLI only emits events which you can react upon and nothing more.

Let's use a more concrete example. Suppose I wanted to make a post-describe hook that validated that the commit had a bug associated with it in the commit description. What would the workflow look like? My imagination was: 1) jj describe 2) Type a commit message without a bug 3) Start up an API server 3) Hook runs, connecting to that API server 4) Commit message validated via the hook, fails validation 5) jj describe fails with an exit status

I think necessarily any time a hook runs, the CLI is going to have to wait for that hook to finish so it can check whether the hook succeeded or failed. That being said, step 3 could be skipped if we just have a permanent API server running in the daemon.

It has no place there, as it severely overlaps with run and forcing a pre-upload or presubmit hook in there makes the user experience rather unpleasant. If we can make the CLI emit just a proto message through IPC/vsock and the daemon handles the whole hook invocation, the user experience is kept smooth.

I think the workflow I'd imagined is impossible with what you're proposing. What user workflow do you imagine? Could you take me through what happens both from a user perspective and a code perspective with your idea.

martinvonz commented 6 months ago

One reason I've avoided hooks so far is that we sometimes want solutions that can validate the commits after the operation is done anyway. One example of that is when you rewrite commits on the server (e.g. via some GitHub UI).

To better understand the use cases, what does your pre-upload Gerrit hook do?

matts1 commented 6 months ago

One reason I've avoided hooks so far is that we sometimes want solutions that can validate the commits after the operation is done anyway

I think that both pre- and post- hooks are reasonable, and complementary. I don't think, however, that we can say that the existence of some way to do post-validation means that we shouldn't do pre-validation.

To better understand the use cases, what does your pre-upload Gerrit hook do?

TLDR: It first finds the binary for the pre-upload hook, then it runs that binary, passing in a list of git commit shas.

ChromeOS is a multi-repo setup. You can think of it like git submodules except we don't use submodules - we use a tool called repo instead. We have one repo called repohooks which contains all of the hooks.

When we run repo upload, it first runs ${CHROMEOS_ROOT}/src/repohooks/pre-upload.py. For example, I usually work in the repository ~/chromiumos/src/bazel, so I have a script that finds the root directory based on the existence of a .repo directory (similar to how jj does it for .jj), and runs pre-upload.py <commit1> <commit2> before doing a git push.

Pre-upload.py does various things such as:

Note that it does this on every commit in the stack, not just the top commit.

Unfortunately, just running repo upload isn't an option, because it works on the current branch, and doesn't work with detached head.

martinvonz commented 6 months ago

I think that both pre- and post- hooks are reasonable, and complementary. I don't think, however, that we can say that the existence of some way to do post-validation means that we shouldn't do pre-validation.

If we are going to have hooks, then I think upload is a good case where they would be useful, but it might be too late to do it after uploading (you might have already accidentally uploaded your password file then).

I think the hooks you've mentioned above only require readonly access to the data. That can mostly be done after committing the transaction instead. That's not true if we also want to support hooks that can rewrite content, descriptions, branches, etc. It might be good to compile a list of use cases for that too.

matts1 commented 6 months ago

Upload hooks

Pre-upload hooks

Not too much to say. Pre-upload hooks should be able to mutate the commits in the stack (eg. running formatters). I imagine that they would get access to the jj API, and could use that to run something roughly equivalent to jj run 'immutable_heads()..commit_to_be_pushed' clang-format "$(jj files @)", or do whatever else they want.

Post-upload hooks

I can't really think of a use case for this. Maybe you could have a hook to trigger some kind of github action, for example, but it seems like that should be set up on the server side? I'd love to hear other people's opinion on whether this was useful.

Describe hooks

I think these are relatively un-contentious.

Commit hooks

I've saved this for last because these are by far the hardest.

Pre-commit hooks are hard & confusing because every time you run any jj command, it technically does a commit. It seems to me that pre-commit hooks should run when the user has the intention to say "I'm happy with this piece of work, let's commit", but that's extremely difficult to detect, because:

I don't think post-commit hooks make a lot of sense. A post-commit hook can only really do validation, and you can just as easily run that validation in pre-commit.

Because of all the difficulties described above in detecting when a "commit" intention occurs, I'm honestly not a big fan of pre-commit hooks. Instead, I'd personally prefer pre-upload hooks. I can see some small justifications for why pre-commit is better than pre-upload, but I think that being able to manually run my pre-upload hooks would be the way to go, since it's just so hard to detect when you actually "commit" in jj.

Other Git hooks

joyously commented 6 months ago

I don't think post-commit hooks make a lot of sense. A post-commit hook can only really do validation

In the context of a GUI, would it make sense that post-commit could be a trigger to update the display?

PhilipMetzger commented 6 months ago

It has no place there, as it severely overlaps with run and forcing a pre-upload or presubmit hook in there makes the user experience rather unpleasant. If we can make the CLI emit just a proto message through IPC/vsock and the daemon handles the whole hook invocation, the user experience is kept smooth.

I think the workflow I'd imagined is impossible with what you're proposing. What user workflow do you imagine? Could you take me through what happens both from a user perspective and a code perspective with your idea.

Since transactions are cheap (atleast for the Git backend, don't know if that counts for Piper), we can just rewrite the actual objects with a new transaction instead of providing hooks, which would fail in the CLI. This would match the behavior of a hook and since all of this happens in the daemon, it is something you can easily opt out.

Describe hooks

I think these are relatively un-contentious.

  • A pre-describe hook would pre-fill the commit message

    • A hook failure would be a warning, and would leave the message unchanged
  • A post-describe message would both validate the commit message and perform potential fixups on it

    • Failure of this would cause the editor to rerun
fn describe(commit: Commit) ->  {
  let mut msg = match pre_describe_hook(commit) {
    Ok(m) => m,
    Err(e) => {
      // If the initial description was invalid, we still have to be able to edit that description, so it doesn't really make sense to have these fail. So any pre-describe hook failure is considered an internal error to the hook.
      warning(e);
      msg
    }
  };
  while (true) {
    msg = run_editor(prefilled);
    match post_describe_hook(Commit{msg=msg, ..commit}) {
      Ok(edited) => return edited,
      // Rerun the editor
      Err(e) =>  print(e),
    }
  }
}

I still don't think that such a thing belongs into the CLI. I implore you to go through to the jj run Design Doc with all it's versions (the one in the repo and the Google Doc one) and the related Discord discussion, as it contains valuable conversations in relation to hooks.

I hope I made my idea clear, that keeping hooks out of band of the CLI is long term better.

And RE: Pre/Postsubmit hooks, we'll first need some definition of forge in library for it to be useful.

matts1 commented 6 months ago

Note for anyone reading this thread in the future: https://github.com/martinvonz/jj/issues/1869 is the tracking bug of jj run

we can just rewrite the actual objects with a new transaction instead of providing hooks

I'm not neccesarily opposed to this.

I still don't think that such a thing belongs into the CLI

To me, the most important thing is the user experience, and implementation details are secondary. If we can have a great user experience without putting it in the CLI, I'm certainly not opposed to that, but it's really hard to evaluate whether jj run is a good alternative to hooks in the CLI (even after having read the document you mentioned), because you haven't yet proposed a specific user journey.

Suppose a repo foo has a rule that the first line of a commit message must not exceed 80 characters, and the user wants to validate that when they write a description, it adheres to that specification.

The requirements I would like to impose upon a user journey are:

Based on reading the things you asked me to read, I'm not 100% certain, but it seems that you disagree with the user journey itself, and disagree with the fundamental need for a explicit implementation of a "hook". So my questions for you are:

From what I can tell from piecing together bits and pieces from what you're saying about doing this out of band, with a pub/sub message, and reading between the lines a lot, what would happen under the hood would look like this: 1) jj describe -> write commit message -> submit it 1) jj-cli publishes an event saying that a commit description has been updated 1) (this could happen anytime from now to the very end) jj-cli returns, having succeeded 1) jj daemon subscribes to that event and reads hooks/jj-config.toml (or some config file), and now knows that we have a subscriber to the post-describe hook that will run binary jj run hooks/post-describe 1) jj daemon runs 1) jj daemon runs the subscriber via something roughly equivalent to jj run hooks/post-describe (or something similar) 1) The post-describe binary runs, and the commit message fails validation

I hope you understand why I'm confused, because in the final step there, there is no way to report the failed validation to the end-user. I'm sure your idea works, but unfortunately I don't understand what you're trying to say here, so I'd appreciate it if you could elaborate on precisely what you want. Because you've communicated vague ideas rather than a precise set of steps, it's hard for me to tell what you're trying to say.

matts1 commented 5 months ago

I've been considering this further, and also chatted to @mlcui-corp offline.

My conclusions are:

I think what we should do is:

I also think it's important to distinguish hooks from some sort of pub-sub system for messages. @joyously raises a good point that it's useful to have a post-commit hook so that the IDE can know when to update. However, the differences between hooks and a pub-sub system (from my perspective, at least) are:

I think that an IDE would be better off subscribing to a pub-sub system, so that it simply gets notified when events occur. This would mean that the user doesn't have to configure the hook for the IDE themselves, and it also means we can provide events for things that really shouldn't need to be a hook (eg. the user runs jj branch create foo - clearly, we don't need any type of hook for that, but the IDE should ideally be able to get notified).

matts1 commented 5 months ago

@mlcui-corp also mentioned to me that pre-describe hooks could be made more difficult by jj describe -m foo, or anything where you specify the description in the command-line.

I don't think this changes my position on the matter - I think pre-upload hooks only make a good start.

joyously commented 5 months ago

we can just as easily make it a pre-upload hook.

Does this refer to jj git push or is there another command that falls under this label? Does your view take into account solo projects, not pushed to a remote? Or contributors that are stuck offline for awhile, and can't push? (just curious)

I spent a few years in the internals of WordPress, which uses a hook system in PHP (synchronous). It is amazingly flexible. One of the parameters is a priority, so that the code is run in priority order. There are two types of hook: one is a "filter" which can modify the first parameter (and must return it, so that filters can be chained), and the other is an "action" which can do whatever but no return value (more like pub-sub). Some big plugins also invoke the hook functions for their own hooks.

matts1 commented 5 months ago

Currently, pre-upload hooks refer only to jj git push in the public build. However, jj gerrit send (#2845) is currently in progress, and would presumably also use this same mechanism.

My intention is that any jj command which supports uploading would end up supporting this. This wouldn't be limited to the git backend - for example, internally at google we would be able to use this for a piper backend.

Does your view take into account solo projects, not pushed to a remote? Or contributors that are stuck offline for awhile, and can't push? (just curious)

IIUC, what you're trying to ask here is "can we make sure that pre-upload hooks are able to be run without actually pushing anywhere?". My intention was something similar to jj git push --dry-run except it also runs hooks (we could allow dry-run to run pre-upload hooks, but having --dry-run potentially modify commits may go against what users expect).

Alternatively, we could create a specific command to run your pre-upload hooks.

Your idea about the two types of hooks seems interesting, but I'm very hesitant to implement something like that. A pub-sub event has the rather nice property that it is guaranteed not to mutate the repository, so I'm happy to sprinkle them all over the place wherever someone might want to know about something. However, a regular hook can make no such guarantee. Allowing a user, for example, to make a hook when a branch is created means that we can no longer guarantee that a branch creation won't modify commits. I'm no expert on this, but I imagine that loosening these constraints could be a potential problem.

joyously commented 5 months ago

what you're trying to ask here is "can we make sure that pre-upload hooks are able to be run without actually pushing anywhere?".

Actually, I was thinking more in terms of needing hooks in other places besides pre-upload since those scenarios I mentioned wouldn't be uploading. At the same time, if the hooks are on push, they would run even when I push from one folder to another on my computer?

I agree that --dry-run should not modify anything. (Would it make a snapshot if needed?)

matts1 commented 5 months ago

Ah, I get you now, thanks for clarifying.

Does your view take into account solo projects, not pushed to a remote? Or contributors that are stuck offline for awhile, and can't push? (just curious)

For contributors stuck offline for a while, I imagine jj git push --dry-run --run-hooks would work. For someone working on an offline project, the same command could work, but a command like jj run-hooks pre-upload might be more appropriate (and would also solve the offline use case).

I'm thinking of it less as a pre-upload hook, and more of a pre-publish hook (where you usually publish via upload, but users could choose to do whatever they want.

matts1 commented 5 months ago

Also, @PhilipMetzger, I'd like to hear your thoughts on my updated plans, since you seemed to have issue with hooks before.

I understand that you don't want hooks in the CLI, and after rethinking, I'm ok with that for most use cases - I'm no longer advocating for a describe hook. However, I strongly believe that we need pre-upload hooks.

For example, consider my workflow when I contribute to jj. 1) Write code until cargo test passes 1) On each commit, run cargo fmt && cargo clippy && cargo test 1) jj git push

With jj run, this can be reduced to: 1) Write code until cargo test passes 1) Run jj run -r 'mutable()::@' 'cargo fmt && cargo clippy && cargo test (or something along these lines) 1) jj git push

This is clearly superior, but IMO is still insufficient. If I do it this way, there are so many ways to upload an invalid commit:

I see the purpose of a pre-upload hook as being complementary to jj run, rather than redundant.

I'd like to combine these. For example, I might create a file hooks/pre-upload containing:

#!/bin/bash -eu

jj run -r 'mutable()::@' 'cargo fmt && cargo clippy && cargo test'

Then my workflow would be: 1) Write code until cargo test passes 1) jj git push 1) jj runs the hook on the specific commit that needs to be uploaded, which then invokes jj run to validate and fix the whole stack of changes 1) It then actually uploads it once the commit is validated.

Note that I should clarify that I don't really care about the specific implementation (the bash file above was an example). I only care that from the perspective of a user of jj, I can make sure that jj doesn't let me upload a bad commit. Right now I'm trying to get buy-in on the concept of a pre-upload hook, rather than any specific implementation of it.

matts1 commented 5 months ago

Here's a simple proposal for an example implementation. I think it's quite nice, but as I said before, I'm open to other ideas.

Step 1: Multi-layered config files

We start off by making config.toml have more layers. Currently, jj basically does jj --config_toml=$HOME/.config/jj/config.toml. I propose changing it to:

jj \
  --config_toml=<repo>/jj.toml \
  --config_toml=$HOME/.config/jj/config.toml \
  --config_toml=<repo>/.jj/config.toml

This would allow repositories to create their own configurations, but allow you to override them if you wanted to (actual filename up for debate).

Step 2: jj run aliases

Add an entry in cargo.toml config files which allows you to create jj run configurations. For example:

<repo>/jj.toml would contain:


[run-aliases]
# People might not expect pre-upload hooks modifying commits.
pre-upload = {"aliases": ["lint", "test"]}
fmt = {write: "true", "cmd": ["cargo fmt"]}
lint = {"cmd": ["cargo clippy"]}
test = {"cmd": ["cargo test"]}

It could then be merged with a user-specific config for that specific repo (eg. <repo>/.jj/config.toml)

[run-aliases]
# I like my pre-upload hooks modifying commits
pre-upload = {"aliases": ["fmt", "lint", "test"]}

Step 3: The hook

A user could then run jj run --alias fmt to run their formatter, for example.

When you run jj git push, jj would automatically run jj run -r <all revisions being uploaded> --alias pre-upload, if an alias named pre-upload was present.

martinvonz commented 5 months ago

Just a quick note that reading config from the repo has security problems. One solution to that would be to allowlist certain keys. I think Git and Mercurial don't allow reading any config from repos. I don't know if they did it that way for simplicity or if there are problems even with allowlisted keys that I'm missing.

matts1 commented 5 months ago

That's a good point.

For security reasons, git can't run hooks unless you manually copy them to the .git/hooks directory (docs). Could we consider something similar where we require a user to "install" the repo-specific config by copying it to the .jj directory, or are there additional security concerns that may not be relevant for git?

ilyagr commented 5 months ago

Just a quick note that reading config from the repo has security problems.

I was thinking of having some way of storing "suggested config" in the repos that the user would have to explicitly import. It would probably be restricted to some allowlisted keys as well.

The application I had in mind is to suggest a value for immutable_heads(), but suggested hooks make sense too.

The problem here is that the exact UI and minor decisions here have security implications and I don't have a mental model of simple conditions that are "sufficient" for security.

martinvonz commented 5 months ago

Actually, there was some "config-based hooks" effort in Git a while ago (a quick googling led to this). I don't know if that series landed (@nasamuffin surely knows), but it's a good idea to at least understand what that patch series was about regardless.

joyously commented 5 months ago

I only care that from the perspective of a user of jj, I can make sure that jj doesn't let me upload a bad commit.

I know the jj model is different, but I just wanted to toss this in. The Bazaar/Breezy command structure is a little different from Git in that you work in a branch, merge the main branch in (it changes working copy only), do whatever you want including testing and fixing conflicts, then commit the working copy. This fixes the problem Git has with merge commits and not being able to test before it's merged(committed). I'm not sure how this would fit in with jj committing everything all the time. What if the result of a failed hook was visible in the log?

nasamuffin commented 5 months ago

config-based hooks

No, it did not land, but not for lack of interest or viability - just for lack of developer time. Google has been carrying the rest of that series downstream for years; we haven't had any problems with it. I'd absolutely recommend that approach for jj over the historical Git approach.

PhilipMetzger commented 5 months ago

Sorry for the late response here, I took some time off after that particular conversation. @matts1 and I had a private conversation off-thread.

⚠️, Warning! Wall of text ahed.

Finishing the user story:

  • jj describe -> write commit message -> submit it
  • jj-cli publishes an event saying that a commit description has been updated
  • (this could happen anytime from now to the very end) jj-cli returns, having succeeded
  • jj daemon subscribes to that event and reads hooks/jj-config.toml (or some config file), and now knows that we have a subscriber to the post-describe hook that will run binary jj run hooks/post-describe
  • jj daemon runs
  • jj daemon runs the subscriber via something roughly equivalent to jj run hooks/post-describe (or something similar)
  • The post-describe binary runs, and the commit message fails validation

And then the daemon re-opens $EDITOR with something like a jj describe @- --add-footer 'BUG=' etc. The UX should feel similar how Git hooks work currently, with a way nicer rollback mechanic if needed. And since subscribers are not bound to a local machine, we can add/create infrastructure which implement those checks on remote workers (like a mini-Tricorder) or build it directly into the forge.

However, the differences between hooks and a pub-sub system (from my perspective, at least) are:

  • Hooks can fail (and a failure generally results in the failure of the underlying command)
  • Hooks are synchronous (since you need to know whether the hook succeeded or not)
  • Hooks are configured by the user (or the repository)

While I agree that this is definitely a "Hook" in the Git sense, my opinion on that is still unchanged that we can achieve the same UX with a pub/sub part in a daemon. And since jj's transactions are cheap, I see no need to fail a command when we can immediately rewrite it via daemon.

pre / post-commit hooks are both unneccesary (jj run mostly makes these redundant) and difficult (it's very difficult to determine a commit intent in jj).

  • I'm no longer 100% on board with post-describe hooks. I use jj describe to write temporary names for things, similar to a git stash, and having those have to follow the conventions required for a commit that's going to be submitted seems like overkill, when we can just as easily make it a pre-upload hook.
  • I think there's a definite use case for using pre-describe hooks to pre-fill the commit description. However, the value proposition for it seems limited. Maybe setting the JJ_CHANGE_ID environment variable when running $EDITOR would allow users to configure their editor to do something similar (eg. inherit bugs from parent commits, add "Change-Id" to footer). My main concern here is that different repositories may have different requirements (eg. one that pushes to gerrit may require change-id in the footer, different remotes may have different formats for specifying bugs).

I agree on the conclusion but I think we can achieve the same feel with even with the daemon, as pointed out above by finishing the user story.

Addressing pre-upload hooks aka (g4 presubmit/arc lint)

I think what we should do is:

  • Only start by implementing pre-upload hooks

    • But do so by creating a generic mechanism for hooks, so we can easily add them later elsewhere if required

This is fine standalone, although I'd consider the pre-upload part another FR. 👍

Does your view take into account solo projects, not pushed to a remote? Or contributors that are stuck offline for awhile, and can't push? (just curious)

IIUC, what you're trying to ask here is "can we make sure that pre-upload hooks are able to be run without actually pushing anywhere?". My intention was something similar to jj git push --dry-run except it also runs hooks (we could allow dry-run to run pre-upload hooks, but having --dry-run potentially modify commits may go against what users expect).

Alternatively, we could create a specific command to run your pre-upload hooks.

At this point we've reached the design of g4 presubmit or arc lint which was always fine to me, and advocated for via jj fix/hg fix from @martinvonz.

With jj run, this can be reduced to:

  1. Write code until cargo test passes
  2. Run jj run -r 'mutable()::@' 'cargo fmt && cargo clippy && cargo test (or something along these lines)
  3. jj git push

This is clearly superior, but IMO is still insufficient.

Thanks for recognizing jj run's value.

If I do it this way, there are so many ways to upload an invalid commit:

  • I sometimes forget to run jj run
  • I work on several different repositories

    • I need to both learn and remember precisely which jj run command to run for each repository
  • I either need to remember the precise jj run invocation (which I often get wrong) or I need to look through my bash history for it (which may have many other random jj run invocations in it)
  • Sometimes I run something and it passes the checks. I then make a change that I think won't affect things, so I don't re-run those commands, but then it turns out it has affected things (yes, I really am my own worst enemy)

I see the purpose of a pre-upload hook as being complementary to jj run, rather than redundant.

  • A pre-upload hook is designed to protect you from uploading bad commits
  • jj run allows you to easily run things on multiple commits at once

I'd like to combine these. For example, I might create a file hooks/pre-upload containing:

And as you allude to, this workflow makes sense with failing in the pre-upload/pre-push step. I don't see any issue with it. And if we have jj fix this concludes to "pre-upload for pr/12345" failed, please run 'jj fix --apply'.

I think all the related things which are proposed are fine as is with minor adjustments.

WDYT of also supporting protobuf as configuration possibility?

TL;DR

You have my buy-in for the reduced scope and moving it out of the CLI.

PhilipMetzger commented 5 months ago

Actually, there was some "config-based hooks" effort in Git a while ago (a quick googling led to this).

Looking at the provided technical docs, this feature really looks like the pub/sub system in this FR with the concept of "Events" and all. So if we can provide a better implementation on that front I'm happy.

And that is a really nice design, which sadly hasn't landed in Git yet 😦 .

matts1 commented 5 months ago

And then the daemon re-opens $EDITOR with something like a jj describe @- --add-footer 'BUG='

My point here was that by the point the validation had failed, the CLI command has already exited. If that's the case, then the daemon re-opening editor means that it can't simply re-open it in the same terminal. It would have to spin up a new terminal window containing a new instance of $EDITOR, which seems like a subpar user experience. This would work just fine if $EDITOR was a gui application, but if you use an editor such as vim, it doesn't seem great.

WDYT of also supporting protobuf as configuration possibility?

I quite like the idea I proposed where we just use the existing jj configuration file. This could mean that we could just alias jj fix to jj run --alias fix, and have the same for jj lint and jj test. What do you think about my proposal above? You never responded to it, so I don't know how you felt.

PhilipMetzger commented 5 months ago

My point here was that by the point the validation had failed, the CLI command has already exited. If that's the case, then the daemon re-opening editor means that it can't simply re-open it in the same terminal. It would have to spin up a new terminal window containing a new instance of $EDITOR, which seems like a subpar user experience.

Ack, that makes sense. I wouldn't consider it a major blocker though, as there surely is a way to make it friendlier.

This could mean that we could just alias jj fix to jj run --alias fix

jj fix will be a standalone command and that has been decided since run was designed.

What do you think about my proposal above? You never responded to it, so I don't know how you felt.

The proposal is fine, although I'd make some different choices.

A bunch of nitpicks:

There's also the question of virtual aliases, so the pre-upload you defined, as I'm not sure if it already is supported.

And my idea with the protobuf configuration would be for the forge anyway, as having a stable representation there is helpful.

[^1]: these would be just strings which will be passed as <command> to run, like the aliases are now.

matts1 commented 5 months ago

jj fix will be a standalone command and that has been decided since run was designed.

I couldn't find any issues for fix when I looked. Could you link to where this was discussed? I'm a little surprised, since jj fix seems like it's trivial to implement as an alias in your config.toml (eg. fix = ["run", "-r", "immutable_heads()..@", "--no-rebase", "cargo fmt"]' - the no-rebase is an option I would presume exists on jj run to have it not modify the tree of the children, only that commit exactly).

  • I don't think run should be responsible for its aliases, as it will purely duplicate existing work.
  • The aliases table should just expand to normal run invocations1.

My assumption was that with my virtual pre-upload step I defined, jj run would be able to come up with a dependency graph. Consider, for example:

[run-aliases]
# People might not expect pre-upload hooks modifying commits.
pre-upload = {"aliases": ["fix", "rebasing", "lint", "test"]}
fix = {write: "true", "cmd": "cargo fmt"}
lint = {"cmd": "cargo clippy"}
test = {"cmd": "cargo test"}

This seems difficult to do with a regular alias. But maybe it's not on the table for run at all, in which case I guess a regular alias would work just fine.

  • Also there is no need to specify that the command writes, as it is the default.

You've probably already considered this, but having write as the default may leave performance on the table, since if I run `jj run -r 'immutable_heads()..@' 'cargo lint', then since I know it's readonly, I should be able to run it in parallel on every revision at once.

  • Looking at the TOML spec, there is no good support for sets, so something better is needed.

Could you clarify what specifically you needed sets for. IMO, sets can be mostly replaced with lists, and if we really wanted a set type jj can validate that there were no duplicates.

There's also the question of virtual aliases, so the pre-upload you defined, as I'm not sure if it already is supported.

To the best of my knowledge, it's currently unsupported to have multiple jj commands in a single alias, but I do think it'd be helpful. I think something like the following would work relatively well:

[aliases]
lint = ["run", "cargo clippy"]
test = ["run", "cargo test"]
pre-upload = {
    "opts": {
        name: "revision",
        short: "r",
        default: "immutable_heads()..@"
    },
    "commands": [
        # Could potentially add metadata here to create a dependency graph and allow commands to run in parallel.
        ["lint", "--revision=$revision"],
        ["test", "--revision=$revision"],
    ]
}

And my idea with the protobuf configuration would be for the forge anyway, as having a stable representation there is helpful.

This seems like it's introducing yet another configuration file for the forge. Plenty of these exist already. It seems easier to just have the repo define the configuration in a version-controlled file (#3684), and have the forge just invoke jj aliases (eg. jj test -r @ to run the alias test = ["run", "cargo test"]). This would allow the forge to just use the pre-existing configuration files, and also allow the forge to essentially "test" the jj aliases.

PhilipMetzger commented 5 months ago

jj fix will be a standalone command and that has been decided since run was designed.

I couldn't find any issues for fix when I looked. Could you link to where this was discussed?

There never was a issue, as it was decided on here and in this Discord discussion (see #3647 for the actual difference to run).

I'm a little surprised, since jj fix seems like it's trivial to implement as an alias in your config.toml (eg. fix = ["run", "-r", "immutable_heads()..@", "--no-rebase", "cargo fmt"]' - the no-rebase is an option I would presume exists on jj run to have it not modify the tree of the children, only that commit exactly).

I get that you're surprised, but since fix is a specialization of run it should have its own implementation. And a --no-rebase flag would be another FR in #1869 butI think your use-case there is already covered by just passing --reparent.

My assumption was that with my virtual pre-upload step I defined, jj run would be able to come up with a dependency graph. Consider, for example:

[run-aliases]
# People might not expect pre-upload hooks modifying commits.
pre-upload = {"aliases": ["fix", "rebasing", "lint", "test"]}
fix = {write: "true", "cmd": "cargo fmt"}
lint = {"cmd": "cargo clippy"}
test = {"cmd": "cargo test"}
  • It gets told to run fix, rebasing, lint, then test
  • This means that there's an operation for fix, lint, and test, on every change in the revset.
  • It knows that the lint and test command can run at the same time, since they're read-only, but that running it on a given commit requires the fix step on that particular commit to be finished"

This seems difficult to do with a regular alias. But maybe it's not on the table for run at all, in which case I guess a regular alias would work just fine.

It never was on the table that run is able to model a dependency graph. The best I can give you is just expanding the aliases in order. Maybe it could be a future extension but IMO it shouldn't be run's feature.

  • Also there is no need to specify that the command writes, as it is the default.

You've probably already considered this, but having write as the default may leave performance on the table, since if I run `jj run -r 'immutable_heads()..@' 'cargo lint', then since I know it's readonly, I should be able to run it in parallel on every revision at once.

I'm happy to inform you that this is run is parallel by default. But it is an optimization to keep in mind.

  • Looking at the TOML spec, there is no good support for sets, so something better is needed.

Could you clarify what specifically you needed sets for. IMO, sets can be mostly replaced with lists, and if we really wanted a set type jj can validate that there were no duplicates.

I just understood your suggested syntax (looked like a python dict at first) as that, OTOH I don't think that TOML tables and arrays scale that well (cargo has a good reason to generate the lock file).

There's also the question of virtual aliases, so the pre-upload you defined, as I'm not sure if it already is supported.

To the best of my knowledge, it's currently unsupported to have multiple jj commands in a single alias, but I do think it'd be helpful.

👍

This seems like it's introducing yet another configuration file for the forge. Plenty of these exist already. It seems easier to just have the repo define the configuration in a version-controlled file (#3684), and have the forge just invoke jj aliases (eg. jj test -r @ to run the alias test = ["run", "cargo test"]). This would allow the forge to just use the pre-existing configuration files, and also allow the forge to essentially "test" the jj aliases.

Fair enough.

matts1 commented 5 months ago

After thinking about it, I see two possible nice solutions for pre-upload hooks. I quite like the second option:

Option 1: Look for the alias pre-upload

[aliases]
pre-upload = ["run", "-r", "@", "pre-commit run"]

When you run jj git push, it would: 1) See if aliases.pre-upload exists in the config's cargo.toml 2) If it exists, run that alias

However, doing so would mean that if we want to ensure that pre-upload hooks are kept in the daemon (and can thus run even if you invoke upload from a tool such as an IDE), we would need to put CLI parsing inside the daemon, which I'm not a huge fan of.

Option 2: Specific pre-upload config section

[pre-upload]
command = "pre-commit run"
# JJ would provide an extra revset here containing the set of revisions that we are planning on uploading.
revset = "mutable()::uploaded_changes()"
# One of rebase, reparent, and readonly
mode = "readonly"

Here, the pre-upload configuration would roughly equate to the command-line options for jj run (and would call whatever function jj run calls under the hood). This would have the advantage of no longer being an arbitrary command, and thus we can have nice TUI integration once it's ready, amongst other things (though we'd need to be careful to ensure that the TUI is only used in jj-cli).

Comments on jj run

@PhilipMetzger, a few suggestions on your design doc for jj run (is there a good way to give feedback on design docs on github - comments on a google doc work pretty well, but there doesn't seem to be anything like that for github):

arxanas commented 5 months ago

I have a great deal written up against pre-commit hooks (as @PhilipMetzger alluded to), but it sounds like pre-commit hooks are not being discussed anymore, so no point in expounding further.

It sounds like there's a great deal of complexity:

This is primarily to solve an interface problem: that the user can accidentally run some commands without satisfying some invariants. Suppose that we just allowed the user to override the built-in commands with their own aliases? It seems like it would be substantially easier, and more flexible anyways. This should still solve the per-repo configuration issue by providing a one-time setup script to run.

For the case of large organizations, it seems they successfully get their developers to use blessed wrapper tools for things like builds, tests, code review, etc., so I don't think the hook thing is as big of a problem. jj is also designed so that individuals or organizations can provide their own custom build by plugging into jj-lib (which is what Google does right now?), so that seems like a natural place to insert any organizational checks, without having to work out a general solution for all the hooks that might ever exist.

I think the jj run implementation details are largely orthogonal, so they should be moved to a different thread. It suffices to decide or not that jj will invoke a run alias as a "hook". You can also see the interface for git test for related discussion.

For what it's worth, I use git test as a primary way to validate commits before sending them for review. I considered making my own wrapper script or integrating it into git submit directly, but the unfortunate fact is that that there is not always one set of suitable checks to run on each commit for a given push:

It's become apparent that one of the primary reasons that I didn't used to run more tests is simply that it was inconvenient, and being able to test/fix in parallel, with caching, without conflicts, improves the situation anyways.

If there are real pushes that need to be prevented (leaking credentials, uploading massive files), then they need to be on the server-side, rather than the client-side anyways.

mlcui-corp commented 5 months ago

This is primarily to solve an interface problem: that the user can accidentally run some commands without satisfying some invariants. Suppose that we just allowed the user to override the built-in commands with their own aliases? It seems like it would be substantially easier, and more flexible anyways. This should still solve the per-repo configuration issue by providing a one-time setup script to run.

it seems [large organizations] successfully get their developers to use blessed wrapper tools for things like builds, tests, code review, etc., so I don't think the hook thing is as big of a problem

I agree. Outside of large organisations - like open source GitHub repos - most (from n=3) repos I've seen on tend to encourage running tests (cargo test, npm test, etc.) and linting/formatting manually before uploading. Even internally in google3, you need to run linting/formatting manually before uploading. These checks are often run in CI after upload anyway.

I haven't seen any repository which uses something like https://pre-commit.com, or a wrapper which does checks before uploading, other than large Google git repositories (Android / ChromiumOS with repo, Chromium with git cl + depot-tools). I believe Phabricator also has pre-upload lint checks via arc lint.

On the other hand, this might be a symptom of the tools we're used to. Maybe we're too used to the status quo of "uploading doesn't run checks" because existing tools make it hard to do so, and jj could make it a nicer experience? Does the existence of git wrapper tools like repo, git cl and arc show a deficiency of git, which could be addressed by jj?

My personal opinion: I don't mind aliases to do this for most users, as aliases are very transparent to users and allows for fine-grained customisation. My current Chromium workflow uses git test run -x "my_presubmit.sh" -v "main..@" && git push origin HEAD:refs/for/main for pushing commits - expanding it out like this allows myself to modify the exact behaviour I want (for example, maybe I want to push a specific commit without checking it out, or just run presubmits for some commits), similar to @arxanas's use cases.

I would expect most wrapper tools would have similar behaviour to <run presubmit on REVSET> && jj forge upload [REVSET] [UPLOAD_FLAGS] - if anything more bespoke is needed (like changing upload parameters based on presubmit results), a wrapper tool would make sense. Getting users to add an alias to their repo-specific config is probably the same amount of effort as getting users to add a preupload hook IMO.

matts1 commented 5 months ago

It sounds like there's a great deal of complexity:

I don't think this is particularly accurate. These were all being discussed as potential ways to solve the problem, but my most recent proposal was extremely simple. It required

I'd estimate that this could be written in ~100 lines of code without too much difficulty.

the unfortunate fact is that that there is not always one set of suitable checks to run on each commit for a given push

I think that's true, but I believe that there is almost always a set of things you want to run before uploading that are sufficiently quick and reliable to put in a pre-upload check. I think that 99% of the time I'm going to want to run jj run -r 'stack(@)' '<fix> && <lint>' before uploading, but I'm also not proposing that you should be locked in to it. ChromeOS uses repo upload --no-verify to skip your presubmits, and I imagine that if we added pre-upload checks, we'd also add an option to skip them.

Suppose that we just allowed the user to override the built-in commands with their own aliases

An alias could solve things for most use cases, but will prevent proper integration with tools such as IDEs. Also, I was told a while back that we were explicitly choosing not to be able to override builtin commands (as doing so would break any tooling that invoked them).

For the case of large organizations, it seems they successfully get their developers to use blessed wrapper tools for things like builds, tests, code review, etc., so I don't think the hook thing is as big of a problem.

I don't think this is a solve for the problem. Sure, it mitigates it, but the way it is currently I need a mental model of which project I'm working on, so I know which wrapper tool to run. If I'm working on rust code in ChromeOS, for example, I need to run cros format to run my pre-upload checks. When I work on other bazel-bazed projects, I need to run buildifier. When I work on jj, I need to run cargo fmt.

Also, when I start contributing on a new project, I don't want to have to spend a few minutes trying to work out how to upload the change correctly (I've done this before). Sometimes it's not clear how to run the formatter (with bazel, for example, they tend to not provide a binary to run the formatter, and expect you to just know to run buildifier to fix and buildifier --lint to lint).

jj is also designed so that individuals or organizations can provide their own custom build by plugging into jj-lib (which is what Google does right now?), so that seems like a natural place to insert any organizational checks, without having to work out a general solution for all the hooks that might ever exist.

This doesn't really work. I work at google, and so I use google's jj, but I also work on a variety of open source projects hosted by a variety of companies. It's also a lot easier to convince a team of developers who primarily use git to just add a few lines of config to their repo to support jj developers than it is to convince them to literally maintain a whole rust project for that one jj developer who works on their repo.

On the other hand, this might be a symptom of the tools we're used to

I strongly believe that this is the case. I've been exposed to repo upload, so when I first switched to jj, I would never bother linting / fixing my commits, push directly to gerrit, and end up submitting bad code (the linter is intenttionally disabled on the remote for us, because they assume that you can't not run it, and thus if you submit code that fails lints there's a reason for it). Even if that weren't the case, it will be several minutes before the presubmits actually get run, so it would take ages before I was notified of a given failure.

I don't mind aliases to do this for most users

I'm not a fan of aliases. I use them because there's not a better solution, but I don't think it's the best solution. I think that uploading to Chrome (git cl upload), chromeos (repo upload), and any random open source project (git push) should all use the exact same command. If there is presubmits, the user should never be allowed to skip them when uploading unless they explicitly say something like --no-verify. What I want is not a new, safe path. I want guardrails on the existing path.

mlcui-corp commented 5 months ago

I am still not sure about how I feel about pre-upload hooks.

For me, this is a tradeoff, and possibly a philosophical question of how jj should work - should jj commands be minimal and focused, or be "batteries included" (to be more specific - "have space for batteries (hooks) for users, if they want to plug some in")?

I think there is elegance in supplying a minimal tool which does a single job well (just uploading to remotes), akin to the Unix philosophy. Maintenance of a minimal interface becomes easy, and tools can be very focused. Composition (like using aliases) would be encouraged.

On the other hand, having hooks built in would definitely reduce the friction to have an "all-in-one" upload command, apart from initial hook setup, compared to the alternatives:

Adding pre-upload hooks would also encourage a standard of sharing common commands of "lint / fix / test a revset", which would make it much easier to have the "exact same command" behaviour across repos. We may need to be careful about how much of jj run we ingrain in this - consider a hook implemented with jj-lib, which could analyse a series of commits without checking them out.

I think the "exact same command" benefit is a bit unrelated to the question of whether we want pre-upload hooks in the first place. A user can get the "exact same command" behaviour by creating some script which runs the things they want before upload, in any repository, and use run_preupload_for_repository [REVSET] && jj upload [REVSET] [UPLOAD_FLAGS] in any repository. However, as mentioned above, this is a fair amount of friction compared to a pre-upload hook, and would not have the easily shareable standard (a jj configuration) that a pre-upload hook would have.

My only concern about doing adding pre-upload hooks would be the extra cost to jj:

If this cost isn't too great, I'd be for adding in pre-upload hooks - even if it is just for encouraging a standard of sharing lint / fix / test commands.

bbigras commented 5 months ago

I miss being able to use https://github.com/cachix/git-hooks.nix

PhilipMetzger commented 5 months ago

My opinion on pre-upload hooks is very neutral, as they're primarily for interop and not a concept I'd want to keep long-term, we should rather fix it in a server implementation when we get there.

For me, this is a tradeoff, and possibly a philosophical question of how jj should work - should jj commands be minimal and focused, or be "batteries included" (to be more specific - "have space for batteries (hooks) for users, if they want to plug some in")?

While the interface is generally simple and powerful, better forge integration has been a major point for a long time (#485). And providing something like pre-upload hooks are probably required to migrate off of Git, as a bunch of tooling sadly depends on it (like Nix via Cachix or Chromium's PRESUBMIT.py).

On the other hand, this might be a symptom of the tools we're used to

I strongly believe that this is the case.

I strongly agree too, but think we should fix it with our server implementation, when we get there. But as interop is important for jj, we should provide at least a minimal interface, which pre-upload hooks are.

Adding pre-upload hooks would also encourage a standard of sharing common commands of "lint / fix / test a revset", which would make it much easier to have the "exact same command" behaviour across repos.

I think it's a major benefit if its done via jj instead of having 30'000 different tools with custom flavors, essentially doing the same thing.

We may need to be careful about how much of jj run we ingrain in this - consider a hook implemented with jj-lib, which could analyse a series of commits without checking them out.

I don't think exposing another interface than jj run or jj fix should be planned, as these commands can cover many workflows as fix rewrites commits in memory anyway.

If this cost isn't too great, I'd be for adding in pre-upload hooks - even if it is just for encouraging a standard of sharing lint / fix / test commands.

Agreed, they're the best solution until we have a better design for a native future.

talios commented 1 month ago

I came here looking for git-hook support as well, specifically commit-msg, we use Gerrit for our code review system, which appends a Change-ID footer to the initial commit message to track revised patch-sets.

Looks like I've got a lot of reading above to checkout.

martinvonz commented 1 month ago

I came here looking for git-hook support as well, specifically commit-msg, we use Gerrit for our code review system, which appends a Change-ID footer to the initial commit message to track revised patch-sets.

Looks like I've got a lot of reading above to checkout.

In case it saves you or someone else some time, there's PR #2845 for adding a "jj gerrit send" command.