Open Gozala opened 6 years ago
Here is what I think might work better (I would love feedback):

`hypergit://{key}`

The key would be shared off band: email, twitter, through an acquaintance.

The above should work with a single-maintainer instance, but would not if the project has multiple maintainers who can merge in pull requests. I think such a scenario could be addressed by encoding maintenance rules and enforcing those through hypergit. I'll describe a simple version below, but in practice it will likely need to be more complicated than that.

Say maintainer `A` pushed to upstream; the next push will have to be by maintainer `B`, then `C`, and so on, then start over. `A` could delegate its shift to another maintainer, say `C`, by creating a record of that, so that hypergit can still audit the history.

Side notes: this lets hypergit audit the history and therefore prevent conflicting uplifts, similar in spirit to a Holochain `dna`. In Holochain, apps have a distributed ledger, and the app `dna` describes the data structure added to the ledger and the logic for auditing the ledger.

I'm leaning toward a model where each user has their own remote that only they can write to. This prevents the conflict case altogether. To collaborate, you'd pull from your friend's hypergit remote, and merge their work into your repo, and push.
That is a great idea @noffle. A shared offline syncing repository would be a merge nightmare!
Looking forward to seeing how this project progresses. Keep it up!
> I'm leaning toward a model where each user has their own remote that only they can write to. This prevents the conflict case altogether. To collaborate, you'd pull from your friend's hypergit remote, and merge their work into your repo, and push.
Doesn't that imply no canonical repo?

In a way, that is exactly what I was describing in the sole-maintainer scenario, and I think it would work great for that.

For large projects that require multiple maintainers it may be problematic, though. Either way, it seems like the multi-maintainer case can be built on top, and it would make a lot of sense to do so. If I get some time I might try doing that as well.
@Gozala Yeah, I think we ought to nix the "blessed repo" canon. Communities can still consider some particular remote sacred, but maybe this tool doesn't dictate that.
Me and @staltz have been talking about a frontend (gitverse) that makes this sort of collaboration easier. Let's try this out and see how it goes & where we need better tooling?
btw @Gozala I :heart: your work on wisp!
Thanks!
> @Gozala Yeah, I think we ought to nix the "blessed repo" canon. Communities can still consider some particular remote sacred, but maybe this tool doesn't dictate that.
I don’t necessarily think it’s about blessing a particular remote. I think in that thread you and @staltz express a desire to encourage convergence, but without a way to share the maintenance burden it seems to me you’ll do the opposite, as different remotes can end up merging in different ways and each collaborator will have to pull in stuff from everyone else. In all likelihood there will still be one or two active remotes that everyone else ends up following, which is not a bad thing. What I’m trying to say is that collaborators will have to coordinate to reduce conflict-resolution overhead, and that could either happen off band or the tool could provide assistance in that coordination. I don’t think large repos (like mozilla-central) could possibly manage without a tool.
Please don’t take this as criticism; I’m super excited about this effort, and taking one step at a time (which is what you’re going for) makes total sense :+1:
Thank you for explaining that a bit more @Gozala. I think I was taking it as crit and getting a bit defensive, so I appreciate you clarifying that as well.
I wonder if such a tool might be something like what folks are talking about here on twitter: something like a peer/bot that listens to all "known" contributors, merges changes into its own master/public branch, and makes itself available to the rest of the network for cloning. This wouldn't be unlike having a central server, so folks who want that model still could!
Hi!
To support "multiple maintainers managing a common repo", they can push to the same hypergit remote, using hyperdb's multiauthor support. Basically it's still the "one hypergit remote per user", except a user can be an organization, in other words, a user can actually be a group of users.
This new collaboration model we're working on is decentralized, but it doesn't forbid centralization. So this means that the familiar and centralized GitHub collaboration model should still be possible to achieve in hypergit+gitverse, but also other models should be possible. It's a generalization.
> Thank you for explaining that a bit more @Gozala. I think I was taking it as crit and getting a bit defensive, so I appreciate you clarifying that as well.
I was a little on the fence about whether I should just 🤐; I’m glad I didn’t.
> I wonder if such a tool might be something like what folks are talking about here on twitter: something like a peer/bot that listens to all "known" contributors, merges changes into its own master/public branch, and makes itself available to the rest of the network for cloning. This wouldn't be unlike having a central server, so folks who want that model still could!
Thanks for pointing that thread out. I was actually thinking about a similar approach as well, which might be easier, although as you pointed out it introduces centralization.
> Hi!
Hi, thanks for joining the conversation!
> To support "multiple maintainers managing a common repo", they can push to the same hypergit remote, using hyperdb's multiauthor support. Basically it's still the "one hypergit remote per user", except a user can be an organization, in other words, a user can actually be a group of users.
Unless I’m misunderstanding something, in that scenario hyperdb will only be able to guarantee that all authors converge onto the same state, but not necessarily that the converged state makes sense.
I’m not very familiar with hyperdb, but as far as I gathered it’s CRDT-based, so in some instances it may have to choose the order of changes in some universally deterministic way (usually alphabetical order).

Which is why I started wondering if maybe the rules of convergence could be encoded in the repo itself, and if they are universally deterministic, hypergit could use those instead. Alternatively, hypergit could just reject a push to that shared remote unless it can determine that no conflicts could arise. Essentially I was proposing a lock on write, inspired by schedulers.
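To make the scheduler-inspired "lock on write" idea concrete, here is a minimal sketch. Everything in it is a hypothetical illustration, not an existing hypergit feature: the function names, the `delegations` record, and the assumption that each accepted push records its author.

```javascript
// Round-robin write "lock": maintainers take turns pushing, in the
// order they appear in the maintainer list. A maintainer can delegate
// their shift by recording it, so the history stays auditable.
function nextAllowedPusher(maintainers, pushHistory, delegations = {}) {
  // Whose turn it is rotates with every accepted push.
  const scheduled = maintainers[pushHistory.length % maintainers.length];
  // An explicit delegation record lets e.g. A hand its shift to C.
  return delegations[scheduled] || scheduled;
}

// A push to the shared remote is rejected unless it comes from the
// maintainer whose turn it is (or their delegate).
function canPush(author, maintainers, pushHistory, delegations = {}) {
  return author === nextAllowedPusher(maintainers, pushHistory, delegations);
}
```

Because every node evaluates the same rule over the same push history, all nodes agree on whose push is valid next, without any central coordination.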
> This new collaboration model we're working on is decentralized, but it doesn't forbid centralization. So this means that the familiar and centralized GitHub collaboration model should still be possible to achieve in hypergit+gitverse, but also other models should be possible. It's a generalization.
Please note that my proposal doesn’t conflict with that at all. All it does is allow coordination (by choice, and on a per-group basis) to avoid convergence on an undesired state, which I suspect might mean a corrupt git repo (but I’m not entirely sure about that).

More specifically, I imagine the repo could include a file describing who’s allowed to push to a remote, and in which order.
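For illustration only, such a file might look like the following. The file name, format, and fields are all made up here; hypergit does not define any of this.

```json
{
  "maintainers": [
    "hypergit://<key-of-maintainer-a>",
    "hypergit://<key-of-maintainer-b>"
  ],
  "push-order": "round-robin"
}
```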
@staltz I was thinking we'd actually not use hyperdb's multi-writer feature here, because the conflict mode for "two users pushed to the same branch while offline, then sync'd" could be pretty confusing. I think there's some discussion on this higher up ^ in this thread. Do you have thoughts on how that case might be made simpler?
I also would like to stress that I'm not here to argue; I'm genuinely interested in decentralized git and was just discussing this to learn from you and to see whether truly decentralized but consistent git collaboration could be possible, and in what way.
Here are some more thoughts comparing the coordination-via-centralization approach (referring to the bot option) with coordination via deterministic rules:

The only thing I have reservations about is the server requirement. What I would rather wish for is to distribute that across the contributors, such that they could arrive at the same state by executing the "bot logic" on their own machines. In fact, thinking about the bot scenario led me to some more ideas on how that could be achieved, and how it contrasts with the bot approach:

- The repo contains a `.contributors` file, which is just a list of hypergit: remotes that this repo tracks.
- Contributors propose changes via `pull/${name}` branches.
- Pulls from the remotes listed in `.contributors` are merged into a dedicated branch, let's call it `upstream`, in the following order:
  1. Take the next remote from the `.contributors` list; that will be the remote from which a pull is merged into `upstream`.
  2. If successful, continue to step 1. If unable to do a clean merge, continue to step 3.
  3. Skip to the next remote in the `.contributors` list and continue from step 2.

This way every node can derive the same `upstream` branch without central coordination. Although there are some limitations:
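Under my reading of the steps above, the procedure could be sketched as follows. `tryCleanMerge` is a hypothetical stand-in for an actual `git merge` invocation that reports whether the merge was clean; it is not a real API.

```javascript
// Walk the .contributors list in its fixed order; merge each remote's
// pull into `upstream`, skipping any remote whose merge isn't clean.
// Because the order is fixed and the skip rule is deterministic, every
// node that sees the same remotes derives the same `upstream` branch.
function buildUpstream(contributors, tryCleanMerge) {
  const merged = [];
  const skipped = [];
  for (const remote of contributors) {
    if (tryCleanMerge("upstream", remote)) {
      merged.push(remote);   // clean merge: move on to the next remote
    } else {
      skipped.push(remote);  // conflict: skip this remote and continue
    }
  }
  return { merged, skipped };
}
```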
Please let me know what you think, and whether you are interested in having this conversation at all. Thanks!

The described logic doesn't really mention anything regarding reviews, mainly because they complicate things quite a bit: if you have a pull with a pending review, it would not make sense to block progress, but you also can't safely skip a turn and preserve consistency, as some nodes may see both the pull & the review while others may see just the pull without the review. Maybe something along the lines of a "yield commit" from the reviewer could be used, or maybe the pull should contain the sha of the reviewer's approval commit as proof, to be considered for the merge.
Personally, I'm against using multiwriter with a shared repo, for the same reasons as others have pointed out.
I think that the issue of multiple collaborators having to pull from each other could be addressed by having them be prompted to do the pull automatically whenever they start working. A project could keep track of all the collaborators' repos, and a tool could detect missing changes and pull from everyone.
The "main" repo would be programmatically determined by whoever is furthest ahead in their master branch.
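A naive sketch of that rule, assuming purely for illustration that "furthest ahead" can be approximated by a commit count per remote; a real tool would have to compare actual git histories and common ancestors.

```javascript
// Pick the "main" repo as the remote with the most commits on master,
// breaking ties deterministically by remote name so all nodes agree.
function mainRepo(masterLengths) {
  return Object.entries(masterLengths)
    .sort(([nameA, lenA], [nameB, lenB]) =>
      lenB - lenA || nameA.localeCompare(nameB))[0][0];
}
```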
> Unless I’m misunderstanding something, in that scenario hyperdb will only be able to guarantee that all authors converge onto the same state, but not necessarily that the converged state makes sense.
Oh, sorry, I totally missed the problems of concurrency related to multi-writer plus offline-first.
I don't have anything smart to comment now but I'll read these comments carefully and think about the technical problem, and come back.
> Personally, I'm against using multiwriter with a shared repo, for the same reasons as others have pointed out.
:+1:
> I think that the issue of multiple collaborators having to pull from each other could be addressed by having them be prompted to do the pull automatically whenever they start working. A project could keep track of all the collaborators' repos, and a tool could detect missing changes and pull from everyone.
Unless the tool can guarantee that everyone converges onto the same change history regardless of network participation, it would make collaboration pretty difficult IMO.

What I’m trying to propose are rules the tool can follow to ensure that everyone converges onto the same change history.
> The "main" repo would be programmatically determined by whoever is furthest ahead in their master branch.
If you have a tool that deterministically merges changes from all remotes, you no longer need a main repo, as every fork will converge onto the same state. The main repo is then just the merger of all forks in a deterministic manner.

Imagine a pure function that takes a list of git changelogs (the tracked remotes) and returns a changelog; the returned changelog is your main, and the function's implementation is the rules I'm proposing we define.
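A toy sketch of that pure function. The entry shape `{ time, msg }` and the ordering rule (timestamp first, remote key as tiebreak) are illustrative assumptions; real git histories would need a richer, but still universally deterministic, rule.

```javascript
// Deterministically fold the changelogs of all tracked remotes into one
// canonical changelog. Every node running this over the same inputs
// gets the same result, so no remote needs to be "the" main repo.
function canonicalChangelog(remoteChangelogs) {
  const entries = [];
  for (const [remote, changelog] of Object.entries(remoteChangelogs)) {
    for (const change of changelog) {
      entries.push({ remote, ...change });
    }
  }
  // Universally deterministic order: timestamp, then remote key.
  entries.sort((a, b) => a.time - b.time || a.remote.localeCompare(b.remote));
  return entries;
}
```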
@staltz Really digging your gitverse ideas. It's really helpful to discuss use cases and not think too much about implementation details in the beginning.
@Gozala
> have a tool that deterministically merges changes from all remotes
The tool I was thinking of is Git. 😸 For a lot of changes you could merge without having to have any user interaction, and if there are conflicts, they're detected and the user can be prompted to resolve them.
> The tool I was thinking of is Git. 😸 For a lot of changes you could merge without having to have any user interaction, and if there are conflicts, they're detected and the user can be prompted to resolve them.
Yes, but there is a big "but". If there is a conflict, `n` different users might resolve it in `m <= n` different ways, and that is the problem I'm proposing to tackle. It's probably not going to be a big deal for smaller projects with `<= 2` maintainers, but if you consider larger repos like mozilla-central, with `>= 100` active committers, the overhead is going to be unreasonable. I would really ❤️ for this to work at that scale.

P.S. To be clear, I don't think what I'm proposing is going to work for such a large project as-is, but I hope that if we start thinking in this direction we can define rules that would.
I think a possible problem is that currently `hypergit` remotes aren't single-user, they're single-device. If I want to work on my repo using my desktop and my laptop, that's going to be a bit of a pain to manage two remotes which both represent only my work.

I agree that offline-first multi-writer would be very hard, but I think we need to come up with some effective client-side tooling to help with this. Possibly even a companion tool, separate to `hypergit`, that helps with pulling/merging-in from various remotes?
> Possibly even a companion tool, separate to `hypergit`, that helps with pulling/merging-in from various remotes?
That's gitverse, we're working on it. :)
> I think a possible problem is that currently hypergit remotes aren't single-user, they're single-device. If I want to work on my repo using my desktop and my laptop that's going to be a bit of a pain to manage two remotes which both represent only my work.
Can you please elaborate on what pain points you have in mind? Wouldn't you just pull one machine's remote into the other and carry on working?
> I think a possible problem is that currently hypergit remotes aren't single-user, they're single-device. If I want to work on my repo using my desktop and my laptop that's going to be a bit of a pain to manage two remotes which both represent only my work.

> Can you please elaborate on what pain points you have in mind? Wouldn't you just pull one machine's remote into the other and carry on working?
I thought about this, but that requires me to keep both computers on. A key benefit of github was this workflow:
I could emulate this by hosting my own normal git server, that mirrors to a hypergit remote, but that's a lot of effort for what was once a simple task :confused:
How about:
That absolutely works, and would probably be manageable for two devices. But when you get to three or four devices (not playing devil's advocate; that's part of my normal workflow sometimes), that's a lot of remotes to remember, merge, and manage.
(not trying to be difficult btw, just making sure we've got as much data as possible for decision making :) )
No need to remember all those remotes! That's what gitverse will do. If you have the codebase, you can run `gitverse join` (no arguments) and it'll connect you with a swarm (a community) of other people also on that codebase, and that community will hold an index of remotes you can pull from, or fork from.
Hmm, you're right, I might be catastrophizing the differences a bit here; and it's probably a small price to pay for decentralisation anyway!

We'll probably need some equivalent of `sameAs` for `hypergit` remotes, or a way of marking a remote as hidden. This way I can tell people to always pull from my `freddie-desktop` remote, and to ignore my `freddie-laptop` and `freddie-mobile` remotes, as they're meant for my eyes only.
> Hmm, you're right, I might be catastrophizing the differences a bit here; and it's probably a small price to pay for decentralisation anyway!

> We'll probably need some equivalent of `sameAs` for `hypergit` remotes, or a way of marking a remote as hidden. This way I can tell people to always pull from my `freddie-desktop` remote, and to ignore my `freddie-laptop` and `freddie-mobile` remotes, as they're meant for my eyes only.
I think those are very valid concerns; in fact, IMO they are the exact same concerns as the ones I hold, we were just considering different scenarios. I believe there needs to be a mechanism for participants (whether those are different machines that the same individual controls, or different individuals) to converge onto a canonical git history; otherwise participants would need to coordinate changes manually. I don't believe the manual option can be managed at scale.
It seems that @noffle and @staltz are exploring ways to facilitate coordination with `gitverse`, and I would love to better understand what that would look like.
@Gozala Here's a design doc: https://gitlab.com/staltz/gitverse-ideas but we've evolved some of those ideas since then. Also, I'm building a CLI; you can pull the code from here: hypergit://1c2a333909c421e1983b4a098db673476836b7191484565bc0c046406fcd4ec0
Hi @noffle,

First of all, thanks for doing this. I wanted to follow up on the twitter thread, and here seems more appropriate. I have to admit I have never used git-ssb, although I read through your intro, and to be honest I found the same thing confusing there as well. Preserving both versions to let the user do a merge sounds pretty reasonable, but what I don't understand is how that might work in practice. Specifically, it seems to me that it would not really resolve the issue but rather mitigate it a bit: what if two authors end up resolving conflicts in different ways? Then the two forks might end up evolving differently, and on sync we'd end up with the same problem. In other words, it seems like preserving both versions and letting the user do a merge works as long as participants sync up fairly often, which most likely would be the case; but if they don't, I don't think this would help.

I'll write some thoughts on how I think that might be solved in a separate comment.