laughinghan opened 7 years ago
I was editing that over the course of many days; it's time to push it even though I have further thoughts.
> if two people modify commit map, how to sync? refspecs can only force-push/pull
My first instinct is for a `post-fetch` hook to merge the remote commit map with the local one. Unfortunately, [there's no `post-fetch` hook], but inspired by that link I realized that merging commit maps doesn't actually need to happen immediately after fetch, as long as it happens before the next push or before the next split involving a fetched commit. It's trivial for `split` to merge in the remote commit map as a first (zeroth?) step, and if there's no split before the next push, there's a `pre-push` hook (which unfortunately does get skipped if you do `git push --no-verify`, but it's the best we can do).
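A minimal sketch of the merge step that `split` or the `pre-push` hook could run, assuming the commit map is simply a dictionary from Main commit SHA to Sub commit SHA. That representation and the names `merge_commit_maps`, `serialize_commit_map` and `blob_oid` are hypothetical. Since a given Main commit should always split out to the same Sub commit, two maps can be unioned, and a Main commit mapped to two different Sub commits is reported rather than silently overwritten. Serializing in sorted order means two people who end up with the same entries also produce the same blob, whose ID is what a `--force-with-lease=<refname>:<sha>` push would compare against.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical representation: Main commit SHA -> Sub commit SHA.
CommitMap = Dict[str, str]


def merge_commit_maps(local: CommitMap, remote: CommitMap) -> Tuple[CommitMap, List[str]]:
    """Union two commit maps, reporting Main commits that map to different Sub commits."""
    merged: CommitMap = dict(local)
    conflicts: List[str] = []
    for main_sha, sub_sha in remote.items():
        existing = merged.get(main_sha)
        if existing is None:
            merged[main_sha] = sub_sha
        elif existing != sub_sha:
            # Splitting should be deterministic, so this signals a stale or buggy map.
            # The local entry is kept; the caller decides what to do about it.
            conflicts.append(main_sha)
    return merged, sorted(conflicts)


def serialize_commit_map(commit_map: CommitMap) -> bytes:
    """One '<main> <sub>' line per entry, sorted so equal maps give equal bytes."""
    return b"".join(f"{main} {sub}\n".encode() for main, sub in sorted(commit_map.items()))


def blob_oid(data: bytes) -> str:
    """SHA-1 object ID Git would assign to `data` stored as a blob."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()
```

For example, merging `{"a1": "s1", "b2": "s2"}` with `{"b2": "s2", "c3": "s3", "a1": "sX"}` yields all three keys with the local `a1` entry kept, and `["a1"]` as the conflict list.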
Just checked, and the `pre-push` script can `fetch` and `push` no problem (I was worried that before `pre-push` was invoked, `git-push` acquired a lock on pushing or something). So it could totally work to, for example, in `pre-push`:
- determine which subhistory objects need to be pushed (e.g. if a squash-merge commit is being pushed, identify the Sub commits)
- `git push --force-with-lease` the commit map and refs to Sub commits

Further notes:
- A `post-rewrite` hook exists that can help deal with rebasing assimilated or squash-merge commits.
- A `pre-auto-gc` hook does exist that could go and delete refs to subcommits that are otherwise unreachable. (How? Would it have to sweep all branches and tags? That would be unfortunate. Concurrency safety would also be a concern if it's updating some kind of ref with the tips of the subhistory of every branch or something. A bonus of such a ref is that its reflog would reference, and hence render reachable, the subcommits of any Main commits that are unreachable from the main branches except via reflog.)

Wow, man, I did a first read this evening and I have to say that including info in a custom header seems to me like a really clever idea! Also because my philosophy is "subhistory is fine; if the user is doing 'nonstandard' things, then calling a full split again is the price for it", so maybe there is no need to worry about everything (for example, cherry-picking can in practice be used for supporting LTS versions...).
A question also came up: does it also work with the option to not do squashing?
So that's it for my first thoughts; I will read it again and follow up with comments :)
> does it work also with the option to not do squashing?
Yes! It might even be easier if I preclude squashing (although maybe not, since the problem of determining which Sub commits to keep refs to seems more or less the same as the problem of determining which commits to keep in the commit map)
Wow, just noticed #7. I hadn't thought about signatures at all, but that suggests another transformation of commits when assimilating that is important to undo when splitting out assimilated commits: stripping signatures. (That is, assimilating commits strips signatures; when splitting them out later, it's important to map back to the signed Sub commits.)
So that's a point in favor of adding the custom header or special commit message line to every assimilated commit, or at least certainly every signed one.
Note that if we keep around the commit map and the signatures, we still don't need to keep around refs to the actual commits; that's enough information to recreate the Sub commit objects.
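To illustrate why the commit map plus the signatures would be enough, here is a sketch that rebuilds a signed Sub commit object from its unsigned form and a stored signature, then recomputes its object ID. It assumes the stored signature is the armored text Git put in the `gpgsig` header. To the best of my knowledge Git inserts that header at the end of the header block and prefixes each continuation line with one space; the function names are hypothetical, and SHA-256 repositories (`gpgsig-sha256`) are not handled.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 of a '<type> <size>' header, a NUL byte, and the object body."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def add_signature(unsigned_commit: bytes, signature: bytes) -> bytes:
    """Re-insert a stored signature as a `gpgsig` header at the end of the headers.

    Assumes `signature` is a non-empty armored signature ending in a newline.
    Continuation lines get a single leading space, as in Git's commit format.
    """
    if not signature.endswith(b"\n"):
        raise ValueError("signature must end with a newline")
    end_of_headers = unsigned_commit.find(b"\n\n")
    # Insert just before the blank line separating headers from the message.
    insert_at = len(unsigned_commit) if end_of_headers == -1 else end_of_headers + 1
    lines = signature.splitlines(keepends=True)
    gpgsig = b"gpgsig " + lines[0] + b"".join(b" " + line for line in lines[1:])
    return unsigned_commit[:insert_at] + gpgsig + unsigned_commit[insert_at:]
```

The recreated Sub commit's ID would then be `object_id("commit", add_signature(unsigned, signature))`; if that doesn't match the Sub commit recorded in the commit map, the reconstruction is off somewhere.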
Cool, thanks for the answer!
So, for the 'custom header' option, is there a way to tell whether: a) everything is fine and we can use shortcuts (the maps, as I think you refer to them), or b) something is messed up (cherry-picking, rebasing, amending, ...) and the price is to split it again to compensate for nonstandard behavior...?
My PR does that by simply comparing references and their histories, but for this clever strategy, how would the mechanism work?
I was thinking about the idea of caching a separate map between Main commits and Sub commits (#3), and about how `git push` and `git fetch` can be used to share any ref under `refs/` created by `git update-ref` (which is how sharing Git Notes works, for example), and how we might be able to use that to share the map between Main commits and Sub commits. I might have stumbled on an idea to merge my `subcommit` ideas into this.

I've confirmed that refs can point to trees or blobs, and that `git push --force-with-lease=<refname>:<sha>` works with them (other forms of `--force-with-lease` don't, though), and `--force-with-lease` has been in Git since 1.8.5, so I'm pretty sure this can work.

So my line of thinking went something like this. We wouldn't want `git subhistory split/merge/push/pull/what-have-you` to be crazy slow the first time after cloning a big repo, so we would encourage/require people to push and fetch these maps. But if everyone's using these commit-to-commit maps, then there's no reason the underlying contents of the commits have to correspond as perfectly as `subhistory` is currently designed around.

Marking commits
In particular, squash-merging could totally work! For illustrative purposes, suppose that the 3rd commit on `master` to modify Sub comes before merging:

Say we're squash-merging `sub-upstream/master` into `master`. As with normal `git-subhistory merge`, we split the history of Sub in `HEAD` out as `SPLIT_HEAD`, but then, instead of assimilating the `SPLIT_HEAD..sub-upstream/master` commits, we first merge `sub-upstream/master` directly into `SPLIT_HEAD`:

And we use that merged Sub tree in a new squash-merge commit on `master`:

Note that as far as the rest of Git is concerned, the squash-merge commit is a normal, non-merge commit (with only one parent) that happens to make changes only in `path/to/sub/`. But to `subhistory`, it's a commit assimilated from a Sub commit, with an entry in the commit map from the squash-merge commit to the Sub merge commit.

This is important because the squash-merge commit needs to be split out as that Sub merge commit. Suppose, one last time, another (4th) commit on `master` modifies Sub:

And then we split that out to push upstream:
It's important that this split-out commit be a fast-forward from the `Merge branch 'sub-upstream/master' into path/to/sub/ subhistory of master` commit, so that it will be a fast-forward from `sub-upstream/master`; if upstream has further updates, `Fix Sub some more` will be the merge base. If instead the squash-merge commit were split out as something other than this non-squash merge commit, say, if the `Fix Sub some{how, more}` commits were squash-merged into Sub's history, then `Fix Sub some more` wouldn't be the merge base, and the merge could well conflict.

One complication is that this is a distributed systems problem: what if someone else pulls down the squash-merge commit, makes more changes to Sub on top, and then splits it out? As noted above, it's critical that the squash-merge commit be split out as the underlying non-squash Sub merge commit so that the merge base with upstream will be the right one. How do we enforce that the commit map is up-to-date at split time? Ideas:
- `origin`? What if they didn't use a remote, and just passed a Git URL directly to `git pull` or something? There are too many ways to pull in commits without using the refspec in the config for this to work.
- `path/to/sub/.gitsubhistory/assimilated-from` or something, to tell us to download and use the subproject commit. Problem: subsequent normal commits will have the same file with the same contents unless the user manually changes this file.
- The `Merge branch blah blah...` commit message format is really verbose, and it's likely there are people who prefer to customize those messages to be more readable; GitHub overrides it, for example. It would be less bad if the requirement were merely "the last line must be of the form `Assimilated from da39a3ee5e6b4b0d3255bfef95601890afd80709.`" or something, but still, if the user manually edits the commit message and, like, misspells "Assimilate" or something, that shouldn't break `subhistory`; that would be stupid.
- Add a custom header to assimilated commits, like:

  Supposedly "since we introduced the "encoding" header a while back clients have learned to ignore unknown headers", so this shouldn't break anything. However, I did some testing, and the unknown header always gets thrown away by `git cherry-pick`, and by `git rebase -i` if any earlier commit is changed (probably because it runs `git commit-tree`, which generates the commit from scratch). `git commit --amend` does preserve the header, though, as does `git rebase -i` if nothing earlier changes (i.e., if the parent is the same hash). Also, this would obviously be more annoying to generate and parse than the commit message.

So, the squash-merge commit object itself must somehow be marked with the Sub merge commit to tell us to download and use it; the commit map alone is insufficient.
This should work for empty commits (#6), too. Open question: should we do this for every assimilated commit, then, not just squash-merge commits and assimilated empty commits? (If we do it for all assimilated commits, we wouldn't even need that direction of cache map, right? And it takes care of transformed commit messages.)
Invariants
Another natural question is whether we should symmetrically be marking split-out commits too, but I think the answer to that is a definitive no. They're fundamentally asymmetrical: a given commit of Main has some fixed number of subprojects in it, whereas a given commit of Sub could be assimilated into any number of superprojects in the future. It would be weird for a superproject assimilating a Sub commit to have information on the hash of a commit in some other unrelated superproject (in the marking of the split-out commit).
And how would it be useful? Having the split-out commit have a transformed commit message (subcomponent prefix removed, for example)? So, what, next time we split out the Main commit C, we check to see if there's already a split-out Sub commit C' with a marking pointing back to C? Remember, this is a distributed systems problem: what if someone else downloads C but hasn't downloaded C'? When they split out C, will they get a different hash from C'?
This is a fundamental thing that `subhistory` needs to satisfy, which leads to a fundamental invariant:

Note that the current guarantee is stronger than this: there's a unique Sub commit that we're able to actually create from the commit object alone. This proposal weakens that guarantee: we may have to download a ref to the Sub commit, because a squash-merge commit just doesn't have enough information. But we know from the commit object alone (due to the marking) that we need to download that ref.
Problems:
- `pre-push` hook that does `disown` (to daemonize) and then pushes refs to subcommits?
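For the `pre-push` ideas here and in the steps listed earlier, a sketch of the parts that don't need Git itself: parsing the `<local ref> <local sha> <remote ref> <remote sha>` lines Git writes to the hook's standard input, and picking out the Sub commits the commit map says must accompany the Main commits being pushed. The function names and the dict-shaped commit map are hypothetical. Listing the Main commits being pushed (for example with `git rev-list`), actually pushing the refs, and daemonizing via `disown` are all left out.

```python
from typing import Dict, Iterable, List, NamedTuple, Set

ZERO_SHA = "0" * 40


class RefUpdate(NamedTuple):
    local_ref: str
    local_sha: str
    remote_ref: str
    remote_sha: str

    @property
    def is_delete(self) -> bool:
        # Git reports a deletion with an all-zero local object name.
        return self.local_sha == ZERO_SHA


def parse_pre_push_input(lines: Iterable[str]) -> List[RefUpdate]:
    """Parse the ref-update lines Git feeds a pre-push hook on stdin."""
    updates: List[RefUpdate] = []
    for line in lines:
        fields = line.split()
        if len(fields) == 4:
            updates.append(RefUpdate(*fields))
    return updates


def sub_commits_to_push(pushed_main: Iterable[str], commit_map: Dict[str, str]) -> Set[str]:
    """Sub commits mapped from any Main commit that is about to be pushed."""
    return {commit_map[sha] for sha in pushed_main if sha in commit_map}
```

An actual hook would call `parse_pre_push_input(sys.stdin)`, skip deletions, and exit non-zero only if pushing the commit map or the Sub refs failed.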