laughinghan / git-subhistory

Interchangeably merge in and split out subtree history

Mandatory commit maps, enabling squash-merge (meld with "subcommit" ideas) #8

Open laughinghan opened 7 years ago

laughinghan commented 7 years ago

I was thinking about the idea of caching a separate map between Main commits and Sub commits (#3). git push and git fetch can be used to share any ref under refs/ created by git update-ref (which is how sharing Git Notes works, for example), so we might be able to use that to share the map between Main commits and Sub commits. Along the way, I might have stumbled on an idea to merge my subcommit ideas into this.

I've confirmed that refs can point to trees or blobs, and that git push --force-with-lease=<refname>:<sha> works with them (other forms of --force-with-lease don't, though). --force-with-lease has been in Git since 1.8.5, so I'm pretty sure this can work.
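As a rough sketch of the push form described above, the helper below only builds the argument list for such a push; it does not run Git. The ref name refs/subhistory/commit-map and the function name are made up for illustration, and only the --force-with-lease=<refname>:<sha> form comes from the test mentioned above.

```python
from typing import List


def lease_push_command(remote: str, ref: str, expected_remote_sha: str) -> List[str]:
    """Build the argv for pushing `ref`, guarded by an explicit lease.

    The push is rejected unless the remote's `ref` still points at
    `expected_remote_sha`, i.e. the value we last fetched.
    """
    return [
        "git",
        "push",
        f"--force-with-lease={ref}:{expected_remote_sha}",
        remote,
        f"{ref}:{ref}",  # push the local ref to the same name on the remote
    ]


# Hypothetical ref name and placeholder object id, for illustration only.
print(lease_push_command("origin", "refs/subhistory/commit-map", "ab" * 20))
```

The resulting list could be handed to subprocess.run; whether a single map ref or several refs is the right layout is left open here.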


So my line of thinking went something like this. We wouldn't want git subhistory split/merge/push/pull/what-have-you to be crazy slow the first time after cloning a big repo, so we would encourage/require people to push and fetch these maps. But if everyone's using these commit-to-commit maps, then there's no reason the underlying contents of the commits have to correspond as perfectly as subhistory is currently designed around.

Marking commits

In particular, squash-merging could totally work! For illustrative purposes, suppose that the 3rd commit on master to modify Sub comes before merging:

                                                                                                            [HEAD]
[initial commit]                                                                                            [master]
o--------------------------o--------------------------o--------------------------o--------------------------o
Add a Main thing           Add a Sub thing            Add 2nd Sub thing          Add 2nd Main thing         Add 3rd Sub thing
 _____________________      _____________________      _____________________      _____________________      _____________________
|                     |    |                     |    |                     |    |                     |    |                     |
|  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |
|  + a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |
|                     |    |  + path/to/sub/     |    |    path/to/sub/     |    |  + 2nd-Main-thing   |    |    2nd-Main-thing   |
|                     |    |  +   a-Sub-thing    |    |      a-Sub-thing    |    |    path/to/sub/     |    |    path/to/sub/     |
|                     |    |                     |    |  +   2nd-Sub-thing  |    |      a-Sub-thing    |    |      a-Sub-thing    |
|                     |    |                     |    |                     |    |      2nd-Sub-thing  |    |      2nd-Sub-thing  |
|                     |    |                     |    |                     |    |                     |    |  +   3rd-Sub-thing  |
|_____________________|    |_____________________|    |_____________________|    |_____________________|    |_____________________|

Say we're squash-merging sub-upstream/master into master. As with normal git-subhistory merge, we split the history of Sub in HEAD out as SPLIT_HEAD, but then instead of assimilating the SPLIT_HEAD..sub-upstream/master commits, we first merge sub-upstream/master directly into SPLIT_HEAD:

[initial commit]                                      [SPLIT_HEAD]                                          [sub-upstream/master]      [MERGE_HEAD]
o--------------------------o--------------------------o-----------------------------------------------------|--------------------------o
|                          |\-------------------------|--------------------------o--------------------------o-------------------------/|
Add a Sub thing            Add 2nd Sub thing          Add 3rd Sub thing          Fix Sub somehow            Fix Sub some more          Merge branch 'sub-upstream/master' into path/to/sub/ subhistory of master
 _____________________      _____________________      _____________________      _____________________      _____________________      _____________________
|                     |    |                     |    |                     |    |                     |    |                     |    |                     |
|  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |
|  + a-Sub-thing      |    |    a-Sub-thing      |    |    a-Sub-thing      |    |    a-Sub-thing      |    |    a-Sub-thing      |    |    a-Sub-thing      |
|                     |    |  + 2nd-Sub-thing    |    |    2nd-Sub-thing    |    |    2nd-Sub-thing    |    |    2nd-Sub-thing    |    |    2nd-Sub-thing    |
|                     |    |                     |    |  + 3rd-Sub-thing    |    |  + fix-Sub          |    |    fix-Sub          |    |  < 3rd-Sub-thing    |
|                     |    |                     |    |                     |    |                     |    |  + fix-Sub-more     |    |  > fix-Sub          |
|                     |    |                     |    |                     |    |                     |    |                     |    |  > fix-Sub-more     |
|_____________________|    |_____________________|    |_____________________|    |_____________________|    |_____________________|    |_____________________|

And we use that merged Sub tree in a new squash-merge commit on master:

                                                                                                                                       [HEAD]
[initial commit]                                                                                                                       [master]
o--------------------------o--------------------------o--------------------------o--------------------------o--------------------------o
Add a Main thing           Add a Sub thing            Add 2nd Sub thing          Add 2nd Main thing         Add 3rd Sub thing          Squash-merge subhistory branch 'sub-upstream/master' under path/to/sub/
 _____________________      _____________________      _____________________      _____________________      _____________________      _____________________
|                     |    |                     |    |                     |    |                     |    |                     |    |                     |
|  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |
|  + a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |
|                     |    |  + path/to/sub/     |    |    path/to/sub/     |    |  + 2nd-Main-thing   |    |    2nd-Main-thing   |    |    2nd-Main-thing   |
|                     |    |  +   a-Sub-thing    |    |      a-Sub-thing    |    |    path/to/sub/     |    |    path/to/sub/     |    |    path/to/sub/     |
|                     |    |                     |    |  +   2nd-Sub-thing  |    |      a-Sub-thing    |    |      a-Sub-thing    |    |      a-Sub-thing    |
|                     |    |                     |    |                     |    |      2nd-Sub-thing  |    |      2nd-Sub-thing  |    |      2nd-Sub-thing  |
|                     |    |                     |    |                     |    |                     |    |  +   3rd-Sub-thing  |    |      3rd-Sub-thing  |
|                     |    |                     |    |                     |    |                     |    |                     |    |  +   fix-Sub        |
|                     |    |                     |    |                     |    |                     |    |                     |    |  +   fix-Sub-more   |
|_____________________|    |_____________________|    |_____________________|    |_____________________|    |_____________________|    |_____________________|

Note that as far as the rest of Git is concerned, the squash-merge commit is a normal, non-merge commit (with only one parent) that happens to make changes only in path/to/sub/. But to subhistory, it's a commit assimilated from a Sub commit, with an entry in the commit map from the squash-merge commit to the Sub merge commit.
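To make the role of that map entry concrete, here is a minimal sketch, assuming the commit map is a plain dictionary from Main commit ids to Sub commit ids. The names split_commit and recompute, and the placeholder ids, are hypothetical and not part of subhistory.

```python
from typing import Callable, Dict

CommitMap = Dict[str, str]  # Main commit id -> Sub commit id


def split_commit(
    main_commit: str,
    commit_map: CommitMap,
    recompute: Callable[[str], str],
) -> str:
    """Return the Sub commit that `main_commit` splits out to.

    A map entry always wins, so a squash-merge commit splits out as the
    recorded Sub merge commit. Only unmapped commits fall back to
    `recompute`, which stands in for the usual content-based split.
    """
    if main_commit in commit_map:
        return commit_map[main_commit]
    sub_commit = recompute(main_commit)
    commit_map[main_commit] = sub_commit  # cache for the next split
    return sub_commit


# Placeholder ids: the squash-merge commit on master maps to the Sub merge commit.
commit_map = {"main-squash-merge": "sub-merge"}
print(split_commit("main-squash-merge", commit_map, lambda c: "sub-of-" + c))
print(split_commit("main-4th-sub-thing", commit_map, lambda c: "sub-of-" + c))
```

The first call never touches recompute, which is the point: the squash-merge commit alone does not contain enough information to rebuild the Sub merge commit.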

This is important because the squash-merge commit needs to be split out as that Sub merge commit. Suppose, one last time, another (4th) commit on master modifies Sub:

                                                                                                                                                                  [HEAD]
[initial commit]                                                                                                                                                  [master]
o--------------------------o--------------------------o--------------------------o--------------------------o--------------------------o--------------------------o
Add a Main thing           Add a Sub thing            Add 2nd Sub thing          Add 2nd Main thing         Add 3rd Sub thing          Squash-merge subhistory... Add 4th Sub thing
 _____________________      _____________________      _____________________      _____________________      _____________________      _____________________      _____________________
|                     |    |                     |    |                     |    |                     |    |                     |    |                     |    |                     |
|  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |
|  + a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |    |    a-Main-thing     |
|                     |    |  + path/to/sub/     |    |    path/to/sub/     |    |  + 2nd-Main-thing   |    |    2nd-Main-thing   |    |    2nd-Main-thing   |    |    2nd-Main-thing   |
|                     |    |  +   a-Sub-thing    |    |      a-Sub-thing    |    |    path/to/sub/     |    |    path/to/sub/     |    |    path/to/sub/     |    |    path/to/sub/     |
|                     |    |                     |    |  +   2nd-Sub-thing  |    |      a-Sub-thing    |    |      a-Sub-thing    |    |      a-Sub-thing    |    |      a-Sub-thing    |
|                     |    |                     |    |                     |    |      2nd-Sub-thing  |    |      2nd-Sub-thing  |    |      2nd-Sub-thing  |    |      2nd-Sub-thing  |
|                     |    |                     |    |                     |    |                     |    |  +   3rd-Sub-thing  |    |      3rd-Sub-thing  |    |      3rd-Sub-thing  |
|                     |    |                     |    |                     |    |                     |    |                     |    |  +   fix-Sub        |    |  +   4th-Sub-thing  |
|                     |    |                     |    |                     |    |                     |    |                     |    |  +   fix-Sub-more   |    |      fix-Sub        |
|                     |    |                     |    |                     |    |                     |    |                     |    |                     |    |      fix-Sub-more   |
|_____________________|    |_____________________|    |_____________________|    |_____________________|    |_____________________|    |_____________________|    |_____________________|

And then we split that out to push upstream:

[initial commit]                                                                                            [sub-upstream/master]                                 [SPLIT_HEAD]
o--------------------------o--------------------------o-----------------------------------------------------|--------------------------o--------------------------o
|                          |\-------------------------|--------------------------o--------------------------o-------------------------/|
Add a Sub thing            Add 2nd Sub thing          Add 3rd Sub thing          Fix Sub somehow            Fix Sub some more          Merge branch 'sub-upstr... Add 4th Sub thing
 _____________________      _____________________      _____________________      _____________________      _____________________      _____________________      _____________________
|                     |    |                     |    |                     |    |                     |    |                     |    |                     |    |                     |
|  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |    |  Files:             |
|  + a-Sub-thing      |    |    a-Sub-thing      |    |    a-Sub-thing      |    |    a-Sub-thing      |    |    a-Sub-thing      |    |    a-Sub-thing      |    |    a-Sub-thing      |
|                     |    |  + 2nd-Sub-thing    |    |    2nd-Sub-thing    |    |    2nd-Sub-thing    |    |    2nd-Sub-thing    |    |    2nd-Sub-thing    |    |    2nd-Sub-thing    |
|                     |    |                     |    |  + 3rd-Sub-thing    |    |  + fix-Sub          |    |    fix-Sub          |    |  < 3rd-Sub-thing    |    |    3rd-Sub-thing    |
|                     |    |                     |    |                     |    |                     |    |  + fix-Sub-more     |    |  > fix-Sub          |    |  + 4th-Sub-thing    |
|                     |    |                     |    |                     |    |                     |    |                     |    |  > fix-Sub-more     |    |    fix-Sub          |
|                     |    |                     |    |                     |    |                     |    |                     |    |                     |    |    fix-Sub-more     |
|_____________________|    |_____________________|    |_____________________|    |_____________________|    |_____________________|    |_____________________|    |_____________________|

It's important that this split-out commit be a fast-forward from the "Merge branch 'sub-upstream/master' into path/to/sub/ subhistory of master" commit, so that it will be a fast-forward from sub-upstream/master; if upstream has further updates, "Fix Sub some more" will be the merge base. If instead the squash-merge commit weren't split out as this non-squash merge commit, and the "Fix Sub some{how, more}" commits were squash-merged into Sub's history, then "Fix Sub some more" wouldn't be the merge base and a later merge with upstream could well conflict.

One complication is that Git is a distributed systems problem: what if someone else pulls down the squash-merge commit, makes more changes to Sub on top, and then splits it out? As noted above, it's critical that the squash-merge commit be split out as the underlying non-squash Sub merge commit so that the merge base with upstream will be the right one. How do we enforce that the commit map is up-to-date at split time? Ideas:

So, the squash-merge commit object itself must somehow be marked with the Sub merge commit to tell us to download and use it; the commit map alone is insufficient.

This should work for empty commits (#6), too. Open question: should we do this for every assimilated commit, then, not just squash-merge commits and assimilated empty commits? (If we do it for all assimilated commits, we wouldn't even need that direction of cache map, right? And it takes care of transformed commit messages.)

Invariants

Another natural question is whether we should symmetrically be marking split-out commits too, but I think the answer to that is a definitive no. They're fundamentally asymmetrical: a given commit of Main has some fixed number of subprojects in it, whereas a given commit of Sub could be assimilated into any number of superprojects in the future. It would be weird for a superproject assimilating a Sub commit to have information on the hash of a commit in some other unrelated superproject (in the marking of the split-out commit).

And how would it be useful? Having the split-out commit have a transformed commit message (subcomponent prefix removed, for example)? So, what, next time we split out the Main commit C, we check to see if there's already a split-out Sub commit C' with a marking pointing back to C? Remember, distributed systems problem: what if someone else downloads C but hasn't downloaded C', when they split out C will they get a different hash from C'?

This is a fundamental thing that subhistory needs to satisfy, which leads to a fundamental invariant:

Note that the current guarantee is stronger than this, where there's a unique Sub commit that we're able to actually create from the commit object alone. This proposal weakens that guarantee: we may have to download a ref to the Sub commit, because a squash-merge commit just doesn't have enough information. But we know from the commit object alone (due to the marking) that we need to download that ref.

Problems:

laughinghan commented 7 years ago

I was editing that over the course of many days; it's time to push it even though I have further thoughts.

laughinghan commented 7 years ago

if two people modify commit map, how to sync? refspecs can only force-push/pull

My first instinct is for a post-fetch hook to merge the remote commit map with the local one. Unfortunately, [there's no post-fetch hook], but inspired by that link I realized that merging commit maps doesn't actually need to happen immediately after fetch as long as it happens before the next push, or before the next split with a fetched commit. It's trivial for split to merge in the remote commit map as a first (zeroth?) step, and if there's no split before the next push, there's a pre-push hook (which unfortunately does get skipped if you do git push --no-verify, but it's the best we can do).
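A minimal sketch of what merging two commit maps could look like, assuming each map is a dictionary from Main commit ids to Sub commit ids and assuming (this is not settled in the thread) that two maps disagreeing about the same Main commit should be treated as an error rather than resolved silently. The function name and the placeholder ids are hypothetical.

```python
from typing import Dict

CommitMap = Dict[str, str]  # Main commit id -> Sub commit id


def merge_commit_maps(local: CommitMap, remote: CommitMap) -> CommitMap:
    """Union two commit maps, refusing to paper over disagreements."""
    merged = dict(remote)
    for main_commit, sub_commit in local.items():
        existing = merged.setdefault(main_commit, sub_commit)
        if existing != sub_commit:
            raise ValueError(
                f"commit map conflict for {main_commit}: "
                f"local says {sub_commit}, remote says {existing}"
            )
    return merged


# Placeholder ids, for illustration only.
local_map = {"main-1": "sub-1", "main-2": "sub-2"}
remote_map = {"main-2": "sub-2", "main-3": "sub-3"}
print(merge_commit_maps(local_map, remote_map))
```

Because the merge is a plain union, it can run at any point before the next push or split, which matches the observation that it does not have to happen right after fetch.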

Just checked and the pre-push script can fetch and push no problem (was worried that before pre-push was invoked, git-push acquired a lock on pushing or something). So it could totally work to, for example,

  1. Based on objects being pushed to remote passed to pre-push, determine which subhistory objects need to be pushed (e.g. if a squash-merge commit is being pushed, identify the Sub commits)
  2. Merge local commit map with last fetched remote commit map
  3. git push --force-with-lease commit map and refs to Sub commits
  4. If that failed, fetch remote commit map and go back to step 2 (error if we've hit this loop too many times)
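Steps 2 to 4 can be sketched as a bounded retry loop. This is only an outline under stated assumptions: fetch_remote and push_with_lease are hypothetical callables standing in for the actual git fetch and git push --force-with-lease invocations, the map is a plain dictionary, the merge is a simple union in which local entries win, and step 1 (working out which Sub commits need refs) is left out.

```python
from typing import Callable, Dict

CommitMap = Dict[str, str]  # Main commit id -> Sub commit id


def sync_commit_map(
    local: CommitMap,
    last_fetched: CommitMap,
    fetch_remote: Callable[[], CommitMap],
    push_with_lease: Callable[[CommitMap, CommitMap], bool],
    max_attempts: int = 5,
) -> CommitMap:
    """Merge and push the commit map, retrying when the lease is stale.

    `push_with_lease(new_map, expected)` must return True only if the
    remote still held `expected` and now holds `new_map`.
    """
    remote = last_fetched
    for _ in range(max_attempts):
        merged = {**remote, **local}        # step 2: merge local with remote
        if push_with_lease(merged, remote):  # step 3: push guarded by a lease
            return merged
        remote = fetch_remote()              # step 4: refetch and try again
    raise RuntimeError(f"commit map push failed {max_attempts} times, giving up")
```

Passing the fetch and push operations in as functions keeps the loop testable without a repository; in a pre-push hook they would shell out to Git.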

laughinghan commented 7 years ago

Further notes:

Darthholi commented 7 years ago

Wow, man, I did a first read this evening and I have to say that including info in a custom header seems to me like a really clever thought! Also because my philosophy is "subhistory is fine; if the user is doing a 'nonstandard' thing, then calling full split again is the price for it", so maybe there is no need to worry about everything (for example, cherry-picking can in practice be used for supporting LTS versions...)

A question also came up: does it also work with the option to not do squashing?

So this is for the first thoughts - I will read it again and have comments :)

laughinghan commented 7 years ago

does it work also with the option to not do squashing?

Yes! It might even be easier if I preclude squashing (although maybe not, since the problem of determining which Sub commits to keep refs to seems more or less the same as the problem of determining which commits to keep in the commit map)

laughinghan commented 7 years ago

Wow, just noticed #7, I hadn't thought about signatures at all but that suggests another transformation of commits when assimilating that it is important to undo when splitting out assimilated commits: stripping signatures. (That is, assimilating commits strips signatures; when splitting them out later, it's important to map back to the signed Sub commits.)

So that's a point in favor of adding the custom header or special commit message line to every assimilated commit, or at least certainly every signed one.
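For the "special commit message line" variant, reading the marking back could look roughly like this. The trailer key Subhistory-Sub-Commit is invented for this sketch, as is the function name; nothing in subhistory defines either, and a custom commit header would need a different reader.

```python
import re
from typing import Optional

# Hypothetical trailer line: "Subhistory-Sub-Commit: <40 hex digits>"
_TRAILER = re.compile(r"^Subhistory-Sub-Commit: ([0-9a-f]{40})$", re.MULTILINE)


def marked_sub_commit(message: str) -> Optional[str]:
    """Return the Sub commit id recorded in a commit message, if any.

    If the line appears more than once, the last occurrence wins, on the
    assumption that trailers sit at the end of the message.
    """
    matches = _TRAILER.findall(message)
    return matches[-1] if matches else None


message = (
    "Squash-merge subhistory branch 'sub-upstream/master' under path/to/sub/\n"
    "\n"
    "Subhistory-Sub-Commit: " + "ab" * 20 + "\n"
)
print(marked_sub_commit(message))
```

A commit without the line yields None, which is the signal to fall back to the ordinary split.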

Note that if we keep around the commit map and signatures, we still don't need to keep around refs to the actual commits; that's enough information to recreate the Sub commit objects.

Darthholi commented 7 years ago

Cool, thanks for the answer!

So, for the 'custom header' option, is there a way to then see whether: a) everything is fine and we can use shortcuts (the maps, as I think you refer to them), or b) something is messed up (cherry-picking, rebasing, amending, ...) and the price is then to split it again, to compensate for the nonstandard behavior...?

My PR does that by simply comparing references and their histories, but for this clever strategy, how would the mechanism work?