conflict resolution for remotes

Gozala commented 6 years ago

Hi @noffle

First of all, thanks for doing this. I wanted to follow up on the Twitter thread, and here seems more appropriate. I have to admit I have never used git-ssb, although I read through your intro, and to be honest I found the same thing confusing there as well. Preserving both versions to let the user do a merge sounds pretty reasonable, but what I don't understand is how that might work in practice; specifically, it seems to me that it would not really resolve the issue but rather mitigate it a bit. What if two authors end up resolving conflicts in different ways? Then the two forks might end up evolving differently and on sync would run into the same problem. In other words, preserving both versions and letting the user do a merge would work as long as participants sync up fairly often, which most likely would be the case, but if they don't, I don't think this would help.

I'll write some thoughts on how I think that might be solved in a separate comment.

Gozala commented 6 years ago

Here is what I think might work better (I would love feedback):

The maintainer adds a remote for each collaborator, providing a way for them to propose changes ("pull requests").
A collaborator proposes changes (creates a pull request) by pushing to their own remote, which is tracked by the maintainer. Since the maintainer tracks that remote, "pull requests" from that peer will show up on the next fetch.
To become a collaborator / send a first pull request, a person needs to provide their hypergit://{key} out of band: email, Twitter, through an acquaintance.

The above should work with a single maintainer, but would not if the project has multiple maintainers who can merge in pull requests. I think such a scenario could be addressed by encoding maintenance rules and enforcing them through hypergit. I'll describe a simple version below, but in practice it will likely need to be more complicated than that.

Maintainers take shifts to do housekeeping, meaning that if maintainer A pushed to upstream, the next push will have to be by maintainer B, then C, etc., then start over.
- This provides a way to verify that coordination across maintainers occurred prior to a push, and therefore that no conflict could occur.
Maintainer A could delegate its shift to another maintainer, say C, by creating a record of that, so that hypergit can still audit the history.
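
A minimal sketch of how such a rotation could be audited, written in Python purely for illustration. The `Push` record, its `delegated_by` field and `audit_rotation` are hypothetical names, not part of hypergit, and the sketch assumes delegation records have already been authenticated (for example by signature) before the audit runs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Push:
    author: str                         # maintainer who made the push
    delegated_by: Optional[str] = None  # shift owner who handed the shift over, if any


def audit_rotation(maintainers: List[str], pushes: List[Push]) -> bool:
    """Return True if every push was made by the maintainer whose shift it was,
    or by another maintainer that the shift owner delegated to."""
    if not maintainers:
        return not pushes
    for index, push in enumerate(pushes):
        # Shifts rotate A, B, C, ... and then start over.
        expected = maintainers[index % len(maintainers)]
        if push.author == expected and push.delegated_by is None:
            continue
        if push.delegated_by == expected and push.author in maintainers:
            continue
        return False
    return True
```

With maintainers A, B and C, the history "A, B, C, A" passes, "A, C" fails, and "C (delegated by A), B" passes.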

Side notes:

It would be nice if tracked remotes were stored in the repo somehow, so that you can follow all collaborators on cloning.
Maintenance shifts in the described order will likely be impractical; instead, each push should probably just contain a record that nominates the author of the next push. That way maintainers can align housekeeping duties with their own schedules.
- The only thing that really matters in the shifts is that each uplift contains a deterministic record that allows hypergit to audit the history and therefore prevent conflicting uplifts.
- https://holochain.org/ has a similar concept that they call DNA. In Holochain, apps have a distributed ledger, and the app DNA describes the data structures added to the ledger and the logic for auditing the ledger.
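
The nomination variant from the side notes can be sketched the same way. This is only an illustration under assumptions: `Uplift`, `next_author` and `audit_nominations` are made-up names, and the author of the very first uplift is assumed to be agreed on in advance.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Uplift:
    author: str       # who pushed this uplift
    next_author: str  # who this uplift nominates for the next push


def audit_nominations(first_author: str, uplifts: List[Uplift]) -> bool:
    """Return True if each uplift was pushed by the author nominated by the previous one."""
    expected = first_author
    for uplift in uplifts:
        if uplift.author != expected:
            return False
        expected = uplift.next_author
    return True
```

Because every uplift names exactly one successor, two maintainers cannot both hold a valid turn at the same time, which is the property the audit relies on.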

hackergrrl commented 6 years ago

I'm leaning toward a model where each user has their own remote that only they can write to. This prevents the conflict case altogether. To collaborate, you'd pull from your friend's hypergit remote, and merge their work into your repo, and push.

dereke commented 6 years ago

That is a great idea @noffle. A shared offline syncing repository would be a merge nightmare!

Looking forward to seeing how this project progresses. Keep it up!

Gozala commented 6 years ago

I'm leaning toward a model where each user has their own remote that only they can write to. This prevents the conflict case altogether. To collaborate, you'd pull from your friend's hypergit remote, and merge their work into your repo, and push.

Doesn't that imply no canonical repo?

In a way that is exactly what I was describing in the sole-maintainer scenario. And I think it would work great for that.

For large projects that require multiple maintainers it may be problematic, though. Either way, it seems like the multi-maintainer case can be built on top, and it would make a lot of sense to do so. If I get some time I might try doing that as well.

hackergrrl commented 6 years ago

@Gozala Yeah, I think we ought to nix the "blessed repo" canon. Communities can still consider some particular remote sacred, but maybe this tool doesn't dictate that.

Me and @staltz have been talking about a frontend (gitverse) that makes this sort of collaboration easier. Let's try this out and see how it goes & where we need better tooling?

hackergrrl commented 6 years ago

btw @Gozala I ❤️ your work on wisp!

Gozala commented 6 years ago

btw @Gozala I ❤️ your work on wisp!

Thanks!

@Gozala Yeah, I think we ought to nix the "blessed repo" canon. Communities can still consider some particular remote sacred, but maybe this tool doesn't dictate that.

I don’t necessarily think it’s about blessing a particular remote. I think in that thread you and @staltz expressed a desire to encourage convergence, but without a way to share the maintenance burden it seems to me you’ll do the opposite, as different remotes can end up merging in different ways and each collaborator will have to pull in stuff from everyone else. In all likelihood there will still be one or two active remotes that everyone else ends up following, which is not a bad thing. What I’m trying to say is that collaborators will have to coordinate to reduce conflict-resolution overhead, and that could either happen out of band or the tool could provide assistance with that coordination. I don’t think large repos (like mozilla-central) could possibly manage without a tool.

Please don’t take it as criticism; I’m super excited about this effort, and taking a step at a time (which is what you’re going for) makes total sense 👍

hackergrrl commented 6 years ago

Thank you for explaining that a bit more @Gozala. I think I was taking it crit and getting a bit defensive, so I appreciate you clarifying on that as well.

I wonder if such a tool might be something like what folks are talking about here on twitter. Something like a peer/bot that listened to all "known" contributors and merges changes into its own master/public branch, and makes itself available to the rest of the network for cloning. This wouldn't be unlike having a central server, so folks who want that model still could!

staltz commented 6 years ago

Hi!

To support "multiple maintainers managing a common repo", they can push to the same hypergit remote, using hyperdb's multiauthor support. Basically it's still the "one hypergit remote per user", except a user can be an organization, in other words, a user can actually be a group of users.

This new collaboration model we're working on is decentralized, but it doesn't forbid centralization. So this means that the familiar and centralized GitHub collaboration model should still be possible to achieve in hypergit+gitverse, but also other models should be possible. It's a generalization.

Gozala commented 6 years ago

Thank you for explaining that a bit more @Gozala. I think I was taking it crit and getting a bit defensive, so I appreciate you clarifying on that as well.

I was a little on the fence about whether I should just 🤐; I’m glad I didn’t.

I wonder if such a tool might be something like what folks are talking about here on twitter. Something like a peer/bot that listened to all "known" contributors and merges changes into its own master/public branch, and makes itself available to the rest of the network for cloning. This wouldn't be unlike having a central server, so folks who want that model still could!

Thanks for pointing that thread out. I was actually thinking about a similar approach as well, which might be easier, although as you pointed out it introduces centralization.

Gozala commented 6 years ago

Hi!

Hi, thanks for joining the conversation.

To support "multiple maintainers managing a common repo", they can push to the same hypergit remote, using hyperdb's multiauthor support. Basically it's still the "one hypergit remote per user", except a user can be an organization, in other words, a user can actually be a group of users.

Unless I’m misunderstanding something here, in that scenario hyperdb will only be able to guarantee that all authors converge onto the same state, but not necessarily that the converged state would make sense.

I’m not very familiar with hyperdb, but as far as I gathered it’s CRDT-based, so in some instances it may have to choose the order of changes in some universally deterministic way (usually alphabetical order).

Which is why I started wondering if the rules of convergence could be encoded in the repo itself; if they are universally deterministic, hypergit could use them instead. Alternatively, hypergit could just reject a push to that shared remote unless it can determine that no conflicts could arise. Essentially, I was proposing a lock on write, inspired by schedulers.

This new collaboration model we're working on is decentralized, but it doesn't forbid centralization. So this means that the familiar and centralized GitHub collaboration model should still be possible to achieve in hypergit+gitverse, but also other models should be possible. It's a generalization.

Please note that my proposal doesn’t conflict with that at all. All it does is allow coordination (by choice and on a per-group basis) to avoid convergence on an undesired state, which I suspect might mean a corrupt git repo (but I’m not entirely sure about it).

More specifically, I imagine the repo could include a file describing who’s allowed to push to a remote and in which order.

hackergrrl commented 6 years ago

@staltz I was thinking we'd actually not use hyperdb's multi-writer feature here, because the conflict mode for "two users pushed to the same branch while offline, then sync'd" could be pretty confusing. I think there's some discussion on this higher up ^ in this thread. Do you have thoughts on how that case might be made simpler?

Gozala commented 6 years ago

I would also like to stress that I'm not here to argue. I'm genuinely interested in decentralized git and was just discussing this to learn from you and to see whether truly decentralized but consistent git collaboration could be possible, and in which way.

Gozala commented 6 years ago

Here are some more thoughts comparing the coordination via centralization (referring to a bot option) approach with coordination via deterministic rules:

The bot follows a bunch of remotes from contributors that it needs to merge, so somehow it would need to determine in which order to do so. In theory the order should not matter unless conflicts arise, in which case the bot would probably treat some contributor heads as unmergeable and ideally notify the contributors in some way.
The bot would have to run on a dedicated node and essentially act as a server.
Maybe the bot could actually provide a review system by only merging heads that have been signed by other contributors.
I suspect that even in the bot case there will be a desire to:
- Coordinate releases, in other words somehow signal the order
- Do merges on different branches, which should not be difficult; it just needs some naming rules so that contributors can express which branch to target
- Support tagging versions

The only thing I have reservations about is the server requirement. What I would rather wish for is to distribute that across the contributors, such that they could arrive at the same state by executing the "bot logic" on their own machines. In fact, thinking about the bot scenario led me to some more ideas on how that could be achieved, and how that contrasts with the bot approach:

The list of collaborators / remotes being tracked lives in some file in the git repo, let's say .contributors, which is just a list of the hypergit: remotes that this repo tracks.
Everyone still pushes to their own remote to signal a "pull request". They just create a branch on their own remote with some naming convention, say pull/${name}.
When a peer executes a fetch, hypergit runs deterministic merges from all the remotes listed in the .contributors file into a dedicated branch, let's call it upstream, in the following order:
1. Look up the author of the last commit in upstream to identify which URL it corresponds to in .contributors. The next remote in the list is the remote from which a pull will be merged.
2. Look up pull requests from that remote and pick the oldest one that has not been merged yet. Attempt to merge it into upstream. If successful, continue to step 1. If unable to do a clean merge, continue to step 3.
3. Create a commit that just contains metadata saying that the specific merge was not successful, mentioning the remote, branch name and commit sha.
4. Pick the next remote from the .contributors list and continue from step 2.
5. If there are no pulls from the picked contributor, stop until the next fetch.
Unless there is a flaw in the described logic, this should provide an eventually consistent upstream branch without central coordination. There are some limitations, though:
- Force push should be banned, as it can undermine the logic that provides consistency.
- If the contributor whose turn it is has no changes, it could essentially block progress. One possible solution: during fetch, if the contributor has the turn but has no pulls to submit, a "yield commit" could be created automatically just to allow further progress.
- If a contributor is gone for whatever reason and it's that contributor's "turn" to make an update, it becomes impossible to remove that contributor from the list or make any progress. There needs to be some deterministic way to do that as well. I don't have a good answer for this, but one possibility could be for all the other contributors to create some special commit that skips the turn. In that case I think everyone would be able to converge on the same state after fetch.
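
The fetch-time steps above can be sketched as a small in-memory simulation in Python. This is not hypergit code and makes several assumptions that the description leaves open: `Pull`, `Entry` and `merge_upstream` are hypothetical names; a merge counts as clean when every file the pull edits still has the content the pull was based on; the "merge failed" commit of step 3 is attributed to the remote whose pull failed, so the turn always moves on to the next remote; and an empty upstream starts with the first remote in .contributors.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Pull:
    sha: str
    # path -> (content the pull was based on, or None for a new file; new content)
    edits: Dict[str, Tuple[Optional[str], str]]


@dataclass
class Entry:
    remote: str   # remote whose turn produced this entry
    sha: str      # pull that was merged or rejected
    merged: bool  # False marks the "merge failed" metadata commit of step 3


def merge_upstream(
    contributors: List[str],
    pulls: Dict[str, List[Pull]],
    upstream: Optional[List[Entry]] = None,
    files: Optional[Dict[str, str]] = None,
) -> Tuple[List[Entry], Dict[str, str]]:
    """Extend upstream deterministically from the pulls visible to this peer."""
    upstream = list(upstream or [])
    files = dict(files or {})
    if not contributors:
        return upstream, files
    seen = {(entry.remote, entry.sha) for entry in upstream}
    while True:
        # Steps 1 and 4: the turn goes to the remote listed after the last author.
        if upstream:
            last = contributors.index(upstream[-1].remote)
            turn = (last + 1) % len(contributors)
        else:
            turn = 0
        remote = contributors[turn]
        pending = [p for p in pulls.get(remote, []) if (remote, p.sha) not in seen]
        if not pending:
            return upstream, files  # Step 5: wait for the next fetch.
        pull = pending[0]  # Step 2: oldest pull that has not been handled yet.
        clean = all(files.get(path) == old for path, (old, _) in pull.edits.items())
        if clean:
            for path, (_, new) in pull.edits.items():
                files[path] = new
        # A failed merge is still recorded (step 3) so that every peer skips it.
        upstream.append(Entry(remote, pull.sha, clean))
        seen.add((remote, pull.sha))
```

In this model a peer that has only seen some of the pulls ends up with a prefix of the history computed by a peer that has seen all of them, because the loop stops as soon as the remote whose turn it is has nothing pending. That is also exactly where the "blocked turn" limitation shows up.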

Please let me know what you think, or whether you are interested in having this conversation at all. Thanks

Gozala commented 6 years ago

The described logic does not really mention anything regarding reviews, mainly because reviews complicate things quite a bit. If you have a pull with a pending review, it would not make sense to block progress, but you also can't safely skip the turn and preserve consistency, as some nodes may see both the pull and the review while others may see just the pull without the review. Maybe something along the lines of a "yield commit" from the reviewer can be used, or maybe a pull should contain the sha of the reviewer's approval commit as proof in order to be considered for the merge.

RangerMauve commented 6 years ago

Personally, I'm against using multiwriter with a shared repo, for the same reasons as others have pointed out.

I think that the issue of multiple collaborators having to pull from each other could be addressed by having them be prompted to do the pull automatically whenever they start working. A project could keep track of all the collaborators' repos and a tool could detect missing changes and pull from everyone.

The "main" repo would be programmatically determined by whoever is furthest ahead in their master branch.
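
One way to read "furthest ahead" is sketched below in Python, under simplifying assumptions: each remote's master is modelled as a linear list of commit shas (real git history is a DAG), `pick_main` is a hypothetical helper, and ties are broken by remote name. If the masters have diverged, no remote is strictly ahead of all the others and the sketch returns None.

```python
from typing import Dict, List, Optional


def pick_main(masters: Dict[str, List[str]]) -> Optional[str]:
    """Return the remote whose master contains every other master as a prefix."""
    if not masters:
        return None
    # Longest history wins; iterating over sorted names makes ties deterministic.
    candidate = max(sorted(masters), key=lambda name: len(masters[name]))
    tip = masters[candidate]
    for history in masters.values():
        if tip[: len(history)] != history:
            return None  # Histories diverged, so nobody is simply "ahead".
    return candidate
```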

hackergrrl commented 6 years ago

staltz commented 6 years ago

Unless I’m misunderstanding something here, in that scenario hyperdb will only be able to guarantee that all authors converge onto the same state, but not necessarily that the converged state would make sense.

Oh, sorry, I totally missed the problems of concurrency related to multi-writer plus offline-first.

I don't have anything smart to comment now but I'll read these comments carefully and think about the technical problem, and come back.

Gozala commented 6 years ago

Personally, I'm against using multiwriter with a shared repo, for the same reasons as others have pointed out.

👍

I think that the issue of multiple collaborators having to pull from each other could be addressed by having them be prompted to do the pull automatically whenever they start working. A project could keep track of all the collaborators' repos and a tool could detect missing changes and pull from everyone.

Unless the tool can guarantee that everyone converges onto the same change history regardless of network participation, it would make collaboration pretty difficult IMO.

What I’m trying to propose is rules that the tool can follow to ensure that everyone converges onto the same change history.

The "main" repo would be programmatically determined by whoever is furthest ahead in their master branch.

If you have a tool that deterministically merges changes from all remotes, you no longer need a main repo, as every fork will converge onto the same state. So the main repo is just a merger of all forks in a deterministic manner.

Imagine a pure function that takes a list of git changelogs (tracked remotes) and returns a changelog; the returned changelog is your main, and the function implementation is the rules I’m proposing we define.
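
As a toy illustration of the shape of such a function (not a proposal for the actual rules): the Python sketch below takes the tracked remotes' changelogs keyed by remote key and interleaves them round-robin in sorted key order. `main_changelog` is a made-up name, and the sketch ignores conflicts and assumes every peer sees the same complete changelogs.

```python
from itertools import zip_longest
from typing import Dict, List


def main_changelog(remotes: Dict[str, List[str]]) -> List[str]:
    """Pure function: the same set of changelogs always yields the same main changelog."""
    # Sorting by remote key makes the result independent of how a peer lists its remotes.
    ordered = [remotes[key] for key in sorted(remotes)]
    merged: List[str] = []
    for round_ in zip_longest(*ordered):
        merged.extend(commit for commit in round_ if commit is not None)
    return merged
```

The point is only that the output depends on nothing but the input changelogs, so any two peers holding the same changelogs compute the same main.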

ralphtheninja commented 6 years ago

@staltz Really digging your gitverse ideas. It's really helpful to discuss use cases and not think too much about implementation details in the beginning.

RangerMauve commented 6 years ago

@Gozala

have a tool that deterministically merges changes from all remotes

The tool I was thinking is Git. 😸 For a lot of changes you could merge without having to have any user interaction, and if there are conflicts, they're detected and the user can be prompted to resolve them.

Gozala commented 6 years ago

The tool I was thinking is Git. 😸 For a lot of changes you could merge without having to have any user interaction, and if there are conflicts, they're detected and the user can be prompted to resolve them.

Yes, but there is a big but. If there is a conflict, n different users might resolve it in m <= n different ways, and that is the problem I'm proposing to tackle. It's probably not going to be a big deal for smaller projects with <= 2 maintainers, but if you consider larger repos like mozilla-central with >= 100 active committers, the overhead is going to be unreasonable. I would really ❤️ it to work at that scale.

P.S. To be clear, I don't think what I'm proposing is going to work for such a large project, but I hope that if we start thinking in this direction we could define rules that would.

FreddieRidell commented 6 years ago

I think a possible problem is that currently hypergit remotes aren't single-user, they're single-device. If I want to work on my repo using my desktop and my laptop that's going to be a bit of a pain to manage two remotes which both represent only my work.

I agree that offline-first multi-writer would be very hard, but I think we need to come up with some effective client-side tooling to help with this. Possibly even a companion tool, separate to hypergit, that helps with pulling/merging-in from various remotes?

staltz commented 6 years ago

Possibly even a companion tool, separate to hypergit, that helps with pulling/merging-in from various remotes?

That's gitverse, we're working on it. :)

Gozala commented 6 years ago

I think a possible problem is that currently hypergit remotes aren't single-user, they're single-device. If I want to work on my repo using my desktop and my laptop that's going to be a bit of a pain to manage two remotes which both represent only my work.

Can you please elaborate on what pain points you have in mind? Wouldn't you just pull one machine's remote into the other and carry on working?

FreddieRidell commented 6 years ago

I think a possible problem is that currently hypergit remotes aren't single-user, they're single-device. If I want to work on my repo using my desktop and my laptop that's going to be a bit of a pain to manage two remotes which both represent only my work.

Can you please elaborate on what pain points you have in mind? Wouldn't you just pull one machine's remote into the other and carry on working?

I thought about this, but that requires me to keep both computers on. A key benefit of GitHub was this workflow:

code on desktop
push to github
turn off desktop and move to laptop
pull from github
resume coding where I left off

I could emulate this by hosting my own normal git server that mirrors to a hypergit remote, but that's a lot of effort for what was once a simple task 😕

staltz commented 6 years ago

How about:

code on desktop
push to hypergit://1038..6868
turn off desktop and move to laptop
pull from hypergit://1038..6868
resume coding
push to hypergit://a77c..1b12
turn off laptop and move to desktop
pull from hypergit://a77c..1b12

FreddieRidell commented 6 years ago

That absolutely works, and would probably be manageable for two devices. But when you get to three or four devices (not playing devil's advocate, that's part of my normal workflow sometimes) that's a lot of remotes to remember, merge, and manage.

(not trying to be difficult btw, just making sure we've got as much data as possible for decision making :) )

staltz commented 6 years ago

No need to remember all those remotes! That's what gitverse will do. If you have the codebase, you can run gitverse join (no arguments) and it'll connect you with a swarm (a community) of other people also on that codebase and that community will hold an index of remotes you can pull from, or fork from.

FreddieRidell commented 6 years ago

Hmm, you're right, I might be catastrophizing the differences a bit here; and it's probably a small price to pay for decentralisation anyway!

We'll probably need some equivalent of sameAs for hypergit remotes, or a way of marking a remote as hidden. This way I can tell people to always pull from my freddie-desktop remote, and to ignore my freddie-laptop, and freddie-mobile remotes as they're meant for my eyes only

Gozala commented 6 years ago

Hmm, you're right, I might be catastrophizing the differences a bit here; and it's probably a small price to pay for decentralisation anyway!

We'll probably need some equivalent of sameAs for hypergit remotes, or a way of marking a remote as hidden. This way I can tell people to always pull from my freddie-desktop remote, and to ignore my freddie-laptop, and freddie-mobile remotes as they're meant for my eyes only

I think those are very valid concerns; in fact, IMO they are the exact same concerns as the ones I hold, we are just considering different scenarios. I believe that there needs to be a mechanism for participants (whether those are different machines that the same individual controls or different individuals) to converge onto a canonical git history, or participants would need to coordinate changes manually. I don't believe the manual option can be managed at scale.

It seems that @noffle and @staltz are exploring a way to facilitate coordination with gitverse, and I would love to get a better understanding of what it would look like.

staltz commented 6 years ago

@Gozala Here's a design doc: https://gitlab.com/staltz/gitverse-ideas but we've evolved some of those ideas since then. Also I'm building a CLI, you can pull the code from here: hypergit://1c2a333909c421e1983b4a098db673476836b7191484565bc0c046406fcd4ec0

hackergrrl / hypergit

conflict resolution for remotes #1