Since this seems like a use case close to CI, a simple way of doing this would be great. Processes can be spawned asynchronously (see https://docs.rs/tokio/1.4.0/tokio/process/index.html), so that's one option. On the other hand, spawning a whole process is pretty heavy, considering it should be relatively easy to flood a seed with tiny updates. In that case DBus would be a better fit (persistent listener, rich notifications, ability to filter / drop messages).
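For the process-spawning option, a minimal sketch of what that could look like, assuming tokio with the `process` feature enabled; the hook path and the `UpdateEvent` shape are made up for illustration, not an existing API:

```rust
use tokio::process::Command;

struct UpdateEvent {
    project_id: String,
    commit_hash: String,
}

async fn run_hook(event: UpdateEvent) -> std::io::Result<()> {
    // Spawn the hook and wait for it to finish; stdout/stderr are inherited
    // from the daemon. The path and argument format are purely illustrative.
    let status = Command::new("/etc/radicle/hooks/post-update")
        .arg(&event.project_id)
        .arg(&event.commit_hash)
        .status()
        .await?;
    if !status.success() {
        eprintln!("hook exited with {status}");
    }
    Ok(())
}
```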
I think those would appeal to different groups of people (private systems and large seeds/CIs) - having an option for either a process or DBus would be amazing.
Portability aside, can DBus be set up so it has ring-buffer semantics? I think we would want to just fire-and-forget such events, and not be concerned with buffering, process orchestration, etc.
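For reference, tokio's in-process broadcast channel has roughly these ring-buffer, fire-and-forget semantics: the buffer is bounded, the sender never blocks, and a lagging receiver simply loses the oldest messages. A small sketch of that behaviour (an in-process analogue, not DBus and not tied to any existing radicle API):

```rust
use tokio::sync::broadcast;

#[tokio::main]
async fn main() {
    // Bounded buffer of 16 events: the sender never blocks, and a receiver
    // that falls more than 16 events behind loses the oldest ones.
    let (tx, mut rx) = broadcast::channel::<u64>(16);

    // Fire-and-forget: the producer just discards the send result.
    for i in 0..64u64 {
        let _ = tx.send(i);
    }
    drop(tx); // closing the channel lets the loop below terminate

    loop {
        match rx.recv().await {
            Ok(i) => println!("got {i}"),
            // The receiver is told how many events it missed, then continues
            // from the oldest retained one.
            Err(broadcast::error::RecvError::Lagged(n)) => println!("missed {n}"),
            Err(broadcast::error::RecvError::Closed) => break,
        }
    }
}
```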
I guess I’m just looking for webhooks, except there doesn’t necessarily need to be a web in those hooks.
The most straightforward design to me would be a daemon listening on project events (ie. new refs) remotely via the Rust API, and spawning child processes (eg. checkout + run tests, mirror to github, etc.). This doesn't have to be integrated at the protocol level.
The complexity will be in the implementation: if you want it to be durable, fault-tolerant etc. then it'll require some additional work, but this can be left to the implementor.
But I see this running on a server, not locally.
Yes we could do this just like a webhook, where the connection could be over TCP, a named pipe, or a UNIX socket. This is still not free, however, and could easily pile up writes when the consumer is slow and the buffers fill up. Also, the consumer address would need to be statically configured (or some fancy hot config reloading needs to be put in place).
So I'm thinking that it would be nicer to have the seed/daemon expose an endpoint through which a consumer can subscribe to a stream of events, optionally filtered by some predicate.
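A rough sketch of what such a subscription could look like internally, with the filter expressed as a plain predicate. The names and the channel-based plumbing are assumptions, and the actual transport (SSE, websocket, UNIX socket) is left out:

```rust
use tokio::sync::{broadcast, mpsc};

#[derive(Clone, Debug)]
struct RefUpdated {
    project_id: String,
    commit_hash: String,
}

/// Hand out a receiver that only yields events matching the caller's predicate.
/// The seed side never blocks: if the subscriber is slow, the forwarding task
/// lags on the broadcast channel and the missed events are simply dropped.
fn subscribe(
    events: &broadcast::Sender<RefUpdated>,
    pred: impl Fn(&RefUpdated) -> bool + Send + 'static,
) -> mpsc::Receiver<RefUpdated> {
    let mut rx = events.subscribe();
    let (tx, out) = mpsc::channel(16);
    tokio::spawn(async move {
        loop {
            match rx.recv().await {
                Ok(ev) if pred(&ev) => {
                    if tx.send(ev).await.is_err() {
                        break; // subscriber went away
                    }
                }
                Ok(_) => {} // filtered out by the predicate
                Err(broadcast::error::RecvError::Lagged(_)) => {} // slow subscriber: drop missed events
                Err(broadcast::error::RecvError::Closed) => break,
            }
        }
    });
    out
}
```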
Ah interesting, so the seed/daemon isn't the end-consumer, rather it exposes a stream of messages to be consumed by any listener, eg. via SSE. This could work, however it would still require the consumer process to be local to the seed, since it'll likely want monorepo access.
> so the seed/daemon isn't the end-consumer
No no, it is the producer. This can't be intercepted at the git level, but one layer up.
> it would still require the consumer process to be local to the seed, since it'll likely want monorepo access.
But that's fine, no? I've made friends with the idea to deprecate the remote helper once #576 lands, and instead have people interact with the repo via HTTP.
It would then be a matter of binding "internal" APIs to the right NIC to be able to build any kind of worker topology.
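To make the topology point concrete, a trivial sketch of binding the public and "internal" endpoints to different interfaces; the addresses and the split itself are purely illustrative:

```rust
use tokio::net::TcpListener;

async fn bind_apis() -> std::io::Result<(TcpListener, TcpListener)> {
    // git smart HTTP, reachable by peers
    let public = TcpListener::bind("0.0.0.0:8080").await?;
    // event stream / "internal" API, reachable only by co-located workers
    let internal = TcpListener::bind("127.0.0.1:9090").await?;
    Ok((public, internal))
}
```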
Yeah, I think this is fine!
Interestingly, the same API could be used to build read-only UIs :thinking:
> the same API could be used to build read-only UIs
By HTTP I meant the git smart HTTP protocol :) I suppose you'd need the surf API as well for that. Up for discussion /cc @FintanH @xla
Ah, but wouldn't the exposed endpoint for events be over HTTP as well? Or are you thinking lower level?
Ya sure, but I wouldn't design this to emit all kinds of events, only those which carry information about refs having been updated as the effect of a fetch (or a push also, if the daemon allows that). Otherwise.. well, do we need to get our GraphQLs ready to be able to construct filter predicates?
I mean, I'm not saying it's completely off the table to provide some kind of push-based API. That's just a slightly larger scope, and someone needs to be responsible for designing, documenting, and maintaining such a thing.
Yeah I'm with you. I think an event stream simply of `(projectId, commitHash)` objects (and perhaps one for identity updates) will cover 80% of the use cases. It should also be simple enough to put together a custom node that listens for and exposes other events, if someone needs them.
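For illustration, the payload could be as small as this; field names are hypothetical, and serde (with the derive feature) is used only to suggest a possible wire format:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical event for ref updates: just enough to say which project
// moved and where its head is now.
#[derive(Clone, Debug, Serialize, Deserialize)]
struct ProjectUpdated {
    project_id: String,
    commit_hash: String,
}

// A second, equally small event type could cover identity updates.
#[derive(Clone, Debug, Serialize, Deserialize)]
struct IdentityUpdated {
    identity_id: String,
    revision: String,
}
```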
It was brought up that people would want to run arbitrary code on the host machine whenever a specific project was updated from the network (what they mean is when a specific remote, and possibly specific branches, of a specific project was updated).
Git doesn't provide an equivalent to `post-receive` for fetches, iiuc due to the magical properties of `FETCH_HEAD`. We would probably also not want people to actually place code in the `hooks/` directory.

While we can intercept when a fetch is complete (ie. after all refs have been updated), the design question here is how to trigger the execution of said code: should updates be queued and popped by a request to a localhost-only API? Should the {daemon,seed} actually fork a process? Websockets? Pipes? DBus?