radicle-dev / radicle-alpha

A peer-to-peer stack for code collaboration
https://radicle.xyz
MIT License

Collaborative project maintenance #547

Open Gozala opened 5 years ago

Gozala commented 5 years ago

Hi,

As I was reading through the radicle FAQ I came across a few questions that I'd like to provide some thoughts / pointers on:

If I am a project owner, can I get updates from contributors if we are not online at the same time?

I have been thinking about how to enable a large number of participants to collaborate on / maintain a repository in a fully p2p setup. This came up in the context of hypergit and I described it in the following thread; for convenience I'll quote the interesting bits below:

The above should work with a single maintainer instance, but would not if the project has multiple maintainers who can merge in pull requests. I think such a scenario could be addressed by encoding maintenance rules and enforcing them through hypergit. I'll describe a simple version below, but in practice it will likely need to be more complicated than that.

  • Maintainers take shifts to do housekeeping, meaning that if maintainer A pushed to upstream, the next push has to be by maintainer B, then C, and so on, then start over.

    • This provides a way to verify that coordination across maintainers occurred prior to the push, and therefore no conflict can occur.
  • Maintainer A could delegate its shift to another maintainer, say C, by creating a record of that, so that hypergit can still audit the history.

Side notes:

  • It would be nice if tracked remotes were stored in the repo somehow, so that you can follow all collaborators on cloning.

  • Maintenance shifts in the described order will likely be impractical; instead, each push should probably just contain a record that nominates the next author of the push. That way maintainers can align housekeeping duties with their own schedules.

  • The only thing that really matters about the shifts is that each uplift contains a deterministic record that allows hypergit to audit the history and therefore prevent conflicting uplifts.

  • https://holochain.org/ has a similar concept that they call DNA. In Holochain, apps have a distributed ledger, and the app's DNA describes the data structures added to the ledger and the logic for auditing it.
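The nomination rule from the side notes (each push records who is allowed to push next) can be sketched as a tiny audit function. The record shape here is hypothetical, just enough to show the check:

```python
# Sketch (hypothetical record shape): audit a push history where each
# push record nominates the author of the next push. Any push whose
# author differs from the previous nomination fails the audit.

def audit_pushes(pushes):
    """pushes: list of dicts with 'author' and 'nominates' keys,
    in the order they were applied to upstream."""
    expected = None  # the first push is unconstrained
    for push in pushes:
        if expected is not None and push["author"] != expected:
            return False  # pushed out of turn: conflicting uplift
        expected = push["nominates"]
    return True

history = [
    {"author": "alice", "nominates": "bob"},
    {"author": "bob", "nominates": "carol"},
    {"author": "carol", "nominates": "alice"},
]
assert audit_pushes(history)
assert not audit_pushes(history + [{"author": "bob", "nominates": "alice"}])
```

Because the rule is deterministic, every peer that replays the same history reaches the same verdict, which is all the audit needs.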


Here are some more thoughts comparing coordination via centralization (referring to the bot option) with coordination via deterministic rules:

  • So the bot follows a bunch of remotes from contributors that it needs to merge, and it would somehow need to determine in which order to do so. In theory the order should not matter unless conflicts arise, in which case the bot would probably treat some contributor heads as un-mergeable and ideally notify the contributors in some way.

  • The bot would have to run on a dedicated node and essentially act as a server.

  • Maybe the bot could actually provide a review system by only merging heads that have been signed by other contributors.

  • I suspect that even in the bot case there will be a desire to:

    • Coordinate releases, in other words somehow signal the release order.
    • Do merges on different branches; this should not be difficult, it just needs some naming rules so that contributors can express which branch to target.
    • Support tagging versions

The only thing I have reservations about is the server requirement. What I would rather wish for is to distribute that across the contributors, such that they could arrive at the same state by executing the "bot logic" on their own machines. In fact, thinking about the bot scenario led me to some more ideas on how that could be achieved, and how it contrasts with the bot approach:

  • The list of collaborators / tracked remotes lives in some file in the git repo, let's say .contributors, which is just a list of hypergit: remotes that this repo tracks.

  • Everyone still pushes to their own remote to signal a "pull request": they just create a branch on their own remote following some naming convention, say pull/${name}.

  • When a peer executes a fetch, hypergit runs deterministic merges from all the remotes listed in the .contributors file into a dedicated branch, let's call it upstream, in the following order:

    1. Look up the author of the last commit in upstream to identify which URL it corresponds to in .contributors. The next remote in the list will be the remote from which a pull is merged.
    2. Look up pull requests from that remote and pick the oldest one that has not been merged yet. Attempt to merge it into upstream. If successful, continue to step 1; if unable to do a clean merge, continue to step 3.
    3. Create a commit that just contains metadata recording that the specific merge was not successful, mentioning the remote, branch name, and commit SHA.
    4. Pick the next remote from the .contributors list and continue from step 2.
    5. If there are no pulls from the picked contributor, stop until the next fetch.
  • Unless there is a flaw in the described logic, this should provide an eventually consistent upstream branch without central coordination, although there are some limitations:

    • Force pushes should be banned, as they would undermine the logic that provides consistency.
    • If the contributor whose turn it is has no changes, they could essentially block progress. One possible solution: during fetch, if the local contributor is the one whose turn it is but has no pulls to submit, hypergit could automatically create a "yield commit" just to allow further progress.
    • If a contributor is gone for whatever reason and it's that contributor's "turn" to make an update, it becomes impossible to remove them from the list or to make any progress. There needs to be some deterministic way to handle that as well. I don't have a good answer for this, but one possibility could be for all the other contributors to create some special commit that skips the turn. In that case I think everyone would be able to converge on the same state after a fetch.
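The deterministic fetch-time merge loop described in the numbered steps can be sketched in memory. The data shapes are stand-ins: remotes and pulls are plain structures, and "merging" is simulated by a conflict predicate rather than real git:

```python
# Sketch of the deterministic fetch-time merge loop. Assumptions:
# remotes and pulls are in-memory structures, and a clean merge is
# decided by membership in a `conflicts` set instead of real git.

def fetch_merge(contributors, pulls, upstream, conflicts):
    """contributors: ordered remote names from .contributors.
    pulls: {remote: oldest-first list of pull ids}.
    upstream: list of commit records; the last names the remote that produced it.
    conflicts: set of pull ids that cannot be merged cleanly."""
    # Step 1: find whose turn it is from the last upstream commit.
    last = upstream[-1]["remote"] if upstream else contributors[-1]
    i = (contributors.index(last) + 1) % len(contributors)
    while True:
        remote = contributors[i]
        # Step 2: oldest pull from this remote not yet recorded upstream.
        queue = [p for p in pulls.get(remote, [])
                 if not any(c["pull"] == p for c in upstream)]
        if not queue:
            return upstream  # step 5: nothing from this remote, wait for next fetch
        pull = queue[0]
        if pull not in conflicts:
            upstream.append({"remote": remote, "pull": pull, "merged": True})
        else:
            # Step 3: record only metadata that the merge failed.
            upstream.append({"remote": remote, "pull": pull, "merged": False})
        i = (i + 1) % len(contributors)  # steps 1/4: advance to the next remote

result = fetch_merge(["a", "b", "c"], {"a": ["a1"], "b": ["b1"]}, [], {"b1"})
assert [r["pull"] for r in result] == ["a1", "b1"]
assert result[0]["merged"] and not result[1]["merged"]
```

Since every peer starts from the same .contributors list, the same upstream tip, and the same pull queues, each one runs this loop to the same resulting upstream, which is where the eventual consistency comes from.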

Please let me know what you think, or if you are interested at all in having this conversation. Thanks!

From what I gathered, RSMs are in many ways similar to Holochain, which makes me think this might be a good fit.

Can I collaborate privately?

I have recently started work on a content-addressable data feed that is in many ways similar to ssb-feed but is based on IPLD. It also attempts to provide granular access control, which I think might enable private collaboration, and I'd be interested in collaborating / getting feedback on how to make it usable in this context.

jkarni commented 5 years ago

I still haven't completely caught up with hypergit's architecture and that discussion, so I'll limit myself to describing the thoughts we've had around this, and get back to you when I understand hypergit's situation and proposed solutions a bit better.

There isn't all that much info on our architecture yet (though a blog post is coming soon) so I'll describe the relevant parts here, briefly.

First, the "pure" git part (the git IPFS remote helper) and the other parts (e.g. issues, projects, patches, and eventually more) are a bit different. The latter are made of "machines" (i.e., the RSMs) programmed in Radicle. Let's talk about that first because in theory you could have that part alone (most obviously, by using a patch-based VCS like pijul or darcs, but also by just storing git commits rather than all the usual objects git uses). (Indeed, if you have the problem solved in the machines, even if there is only one person who can write to the git part, you can have multiple people who are allowed to accept patches on the patch machine, and can always reconstitute the latest state of the repo by fetching the latest git, and applying all newer patches).
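The "reconstitute the latest state" idea in the parenthetical above (take the latest git snapshot, then apply every patch accepted on the patch machine since that snapshot) can be sketched with a stand-in data model; the sequence numbers and patch functions here are hypothetical, not Radicle's actual representation:

```python
# Sketch (stand-in model): repo state = last git snapshot + all newer
# patches accepted on the patch machine, applied in order.

def reconstitute(snapshot_state, snapshot_seq, patches):
    """snapshot_state: state captured by the last git snapshot.
    snapshot_seq: sequence number of that snapshot.
    patches: list of (seq, fn) pairs accepted on the patch machine,
    where fn transforms the state."""
    state = snapshot_state
    for seq, fn in sorted(patches, key=lambda p: p[0]):
        if seq > snapshot_seq:  # only patches newer than the snapshot
            state = fn(state)
    return state

patches = [(2, lambda s: s + ["fix-b"]), (1, lambda s: s + ["fix-a"])]
assert reconstitute(["base"], 1, patches) == ["base", "fix-b"]
```

The point of the sketch is that write access to the git part and acceptance rights on the patch machine can be decoupled: anyone who can read both can rebuild the latest state.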

Each "machine" is an IPNS link. The person who possesses the keys corresponding to that link can update the link. When they're online, they connect to a pubsub channel; people who want to write send messages there, and the owner's daemon automatically accepts (valid) messages, puts them on IPFS, and updates the pointer. The new data on IPFS contains what was added, plus a pointer to the previous IPFS data; thus, you're always prepending to a list.
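The "always prepending to a list" structure can be sketched as a hash-chained log: each new entry stores the added data plus the hash of the previous entry, so the whole history is reachable (and tamper-evident) from the newest link. This models the IPFS objects in memory only; it is not an IPFS client:

```python
# Sketch: a hash-chained prepend-only list, modeling the IPFS objects
# an owner's daemon would publish. `store` stands in for IPFS storage.
import hashlib
import json

def put(store, data, prev_hash=None):
    """Add an entry pointing at the previous one; return its hash."""
    entry = {"data": data, "prev": prev_hash}
    key = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
    store[key] = entry
    return key  # the new head, i.e. what the IPNS link would point at

def history(store, head):
    """Walk the chain from the head back to the first entry."""
    out = []
    while head is not None:
        entry = store[head]
        out.append(entry["data"])
        head = entry["prev"]
    return out

store = {}
head = put(store, "patch-1")
head = put(store, "patch-2", head)
assert history(store, head) == ["patch-2", "patch-1"]
```

Updating the IPNS link then amounts to re-pointing it at the new head hash; everything older stays addressable through the `prev` pointers.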

The person with the key has to be trusted - though they can't forge messages (they're signed), they can remove messages.

Usually when you collaborate, there are people you trust - traditionally these are called maintainers. We have thought of a system that allows multiple keys to exist, with the project being in some sense the "sum" of these keys. The idea is that, instead of having a single IPNS link, you have as many as there are maintainers. Then maintainers update their IPNS link either by receiving new pubsub messages or, when they go online, by resolving the IPNS of the other maintainers. If the IPNS links don't diverge in their histories, then you keep getting the longest one. If they ever do, you can try merging (with some sort of arbitrary ordering over keys to determine which side to merge from). If that fails, prompt whoever first notices the failure for merge conflict resolution. If you're careful, this should happen relatively rarely, as long as people are online when accepting patches (at least once we've got near-instantaneous IPNS updates).
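The "keep the longest if histories don't diverge" rule can be sketched with histories modeled as oldest-first lists of message ids; divergence handling is reduced to reporting, where a real client would attempt the ordered merge or prompt for resolution:

```python
# Sketch: reconcile the IPNS links of multiple maintainers, with each
# history modeled as an oldest-first list of message ids.

def reconcile(histories):
    """Return the longest history if every other history is a prefix
    of it; otherwise None (a real client would try a deterministic
    merge, then prompt whoever notices the conflict)."""
    longest = max(histories, key=len)
    for h in histories:
        if h != longest[:len(h)]:
            return None  # histories diverged
    return longest

a = ["m1", "m2", "m3"]
b = ["m1", "m2"]
assert reconcile([a, b]) == a                 # b is just behind: take the longest
assert reconcile([a, ["m1", "mX"]]) is None   # diverged histories
```

This works because the machines are append-only: an honest maintainer's history can only grow at the tail, so "prefix of the longest" is exactly the no-divergence condition.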

Having multiple maintainers/owners to a project ameliorates the "offline writes" problem, but it doesn't solve it. To make sure contributors (non-owners/maintainers) can write and go offline, one option is to have other computers in the network that can temporarily store the data - a federated message queue like ActivityPub, or just nodes that hold messages they see on pubsub channels and try delivering them periodically. @jameshaydon has a more sophisticated idea here, so he can maybe chime in.