martinvonz / jj

A Git-compatible VCS that is both simple and powerful
https://martinvonz.github.io/jj/
Apache License 2.0
8.24k stars 276 forks

[submodules milestone] Submodules support merging/rebasing/conflicts #1405

Open chooglen opened 1 year ago

chooglen commented 1 year ago

For Git submodules, we would like merging, rebasing, and conflicts to work in a jj-idiomatic way. Fortunately, Git does not support merging/rebasing submodules at all [1], so we have a lot of leeway to define something that makes sense for us without worrying about not being Git-idiomatic.

  1. git submodule update supports the --merge and --rebase flags, which is sort of a submodule merge/rebase, since it performs the operation independently for each submodule. It's hard to call this a real merge/rebase, since this doesn't consider the superproject (e.g. the merge base is calculated using the submodule history instead of the superproject history).

chooglen commented 1 year ago

One idea that @martinvonz and I discussed is to present conflicts in terms of what jj would see if it tried to merge submodules the same way Git does.

Background

In Git, a submodule is stored in the tree as a gitlink type, and the content is the submodule commit. During a merge, Git does not merge the contents of the commits; if the left and right sides disagree, there will always be a conflict (which is similar to treating gitlinks like one-line files).
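
As a rough illustration of "treating gitlinks like one-line files", here is a minimal sketch of a three-way merge over opaque commit IDs. The function and exception names are made up for this example, and this is the generic trivial three-way rule, not Git's actual merge code.

```python
from typing import Optional


class GitlinkConflict(Exception):
    """Raised when both sides changed the gitlink to different commits."""


def merge_gitlink(base: Optional[str], left: Optional[str], right: Optional[str]) -> Optional[str]:
    """Three-way merge of a gitlink entry, comparing commit IDs as opaque values.

    The submodule's own history is never consulted, which is why two
    different commits on the two sides cannot be reconciled here.
    """
    if left == right:
        return left   # both sides agree
    if left == base:
        return right  # only the right side changed the entry
    if right == base:
        return left   # only the left side changed the entry
    raise GitlinkConflict(f"gitlink conflict: base={base}, left={left}, right={right}")
```

With base X, left Y and right Z, this raises GitlinkConflict even when the submodule history contains an obvious merge of Y and Z.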

In some cases, this is a lost opportunity because there is an 'obvious' intended result, e.g. in this case, where a superproject and submodule have exactly the same branching structure:

image

It seems reasonable that the merge of B and C in the superproject should point to the merge of Y and Z in the submodule. But this is not true generally, since the superproject can point to any submodule commit, e.g.

image

In such cases, it is not clear what the desired resolution should be, and the user may want different resolutions in different circumstances. Possibilities include:

Proposal

jj represents the conflict in the exact same way as Git: as conflicts in the gitlink entry. In the first figure, this would be Y + Z - X, in the second it would be X + Y - Z.

Crucially, even if we don't know where we should be in submodule history, the contents of the working copy can still be well-defined as the merge between the relevant trees. In the case where the user wants to actually merge the contents of the superproject and submodules, this is a sensible result.

To resolve the conflict in history, we could ask the user to decide whether to create a new commit with the contents or to use an existing commit. When creating a new commit, we can peek at the conflicted submodules to determine if there is an equivalent merge/rebase that would result in the same content (e.g. in the first figure, we could auto-detect that the content is the same as the merge of Y and Z). If so, we could provide a shortcut that would create that equivalent commit.
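
The "peek" step could be sketched roughly as follows: compare the content the user ended up with against the content each candidate operation would produce. Everything here is hypothetical (the Tree stand-in, the function name, the candidate labels); a real implementation would compare tree IDs rather than Python dicts.

```python
from typing import Dict, Optional

Tree = Dict[str, str]  # path -> file content; a stand-in for a real tree object


def find_equivalent_operation(resolved: Tree, candidates: Dict[str, Tree]) -> Optional[str]:
    """Return the name of the first candidate whose tree equals the resolved content."""
    for name, tree in candidates.items():
        if tree == resolved:
            return name
    return None


# Submodule content in the working copy after the user resolved the merge.
resolved = {"lib.c": "y-change\nz-change\n"}

# Trees that each candidate resolution would produce (made-up contents).
candidates = {
    "use Y as-is": {"lib.c": "y-change\n"},
    "use Z as-is": {"lib.c": "z-change\n"},
    "merge Y and Z": {"lib.c": "y-change\nz-change\n"},
}

print(find_equivalent_operation(resolved, candidates))  # prints: merge Y and Z
```

If no candidate matches, the function returns None and the user would have to choose explicitly between a new commit and an existing one.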

hanwen commented 1 year ago

alternate idea:

In JJ the result of a merge is a commit as well (possibly with a conflict). If submodules are full JJ repos, then this means you could simply stick the submodule merge as a commit in the superproject. In your 2nd example here that means creating the merge of X and Y (which IIUC is simply Y)

I can imagine this is hard to make work, because the "has conflict" bit is probably not stored in the Git commit?

chooglen commented 1 year ago

If submodules are full JJ repos, then this means you could simply stick the submodule merge as a commit in the superproject.

To be clear, you are suggesting that we would do the merge (possibly creating a conflict) in the submodule, and then set that as the result?

I think that works well in some, but not all cases. E.g. if the merge base of the superproject commits is not the same as the merge base of the submodule commits, which one should we use?

In my proposed scheme, we would basically give up and ask the user what they really want (which might be cumbersome in practice). Though I can imagine an enhancement where, if the merge bases are the same, we just automatically do the merge.
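
A minimal sketch of that "merge bases are the same" check, assuming each history is a dict mapping a commit to its parents and `gitlinks` maps a superproject commit to the submodule commit it points at. All names are illustrative, and the second example below is a made-up history, not necessarily the one in the second figure.

```python
from typing import Dict, List, Set

Dag = Dict[str, List[str]]  # commit -> parent commits


def ancestors(dag: Dag, commit: str) -> Set[str]:
    """All ancestors of `commit`, including the commit itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag[current])
    return seen


def merge_bases(dag: Dag, left: str, right: str) -> Set[str]:
    """Common ancestors that are not proper ancestors of another common ancestor."""
    common = ancestors(dag, left) & ancestors(dag, right)
    return {c for c in common
            if not any(c != d and c in ancestors(dag, d) for d in common)}


def can_auto_merge(super_dag: Dag, sub_dag: Dag, gitlinks: Dict[str, str],
                   left: str, right: str) -> bool:
    """True if the superproject's merge base points at the submodule's merge base."""
    super_bases = merge_bases(super_dag, left, right)
    if len(super_bases) != 1:
        return False
    (super_base,) = super_bases
    sub_bases = merge_bases(sub_dag, gitlinks[left], gitlinks[right])
    return sub_bases == {gitlinks[super_base]}


super_dag = {"A": [], "B": ["A"], "C": ["A"]}

# Same branching structure in both repos: A->X, B->Y, C->Z.
same_shape = can_auto_merge(super_dag, {"X": [], "Y": ["X"], "Z": ["X"]},
                            {"A": "X", "B": "Y", "C": "Z"}, "B", "C")

# Made-up mismatch: linear submodule history, superproject points elsewhere.
mismatch = can_auto_merge(super_dag, {"X": [], "Y": ["X"], "Z": ["Y"]},
                          {"A": "Z", "B": "X", "C": "Y"}, "B", "C")
print(same_shape, mismatch)  # prints: True False
```

When the check fails, this sketch gives no answer about which merge base to use; that is exactly the case where the user would be asked.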

I can imagine this is hard to make work, because the "has conflict" bit is probably not stored in the Git commit?

I suspect it might not be so hard, because we already "extend" the Git commit format to store conflicts (They are stored as the conflicted path suffixed with .jjconflict). This is (of course) nonsense to Git, so we refuse to push it, but you can see this in a colocated repo if you run git show <conflicted-commit>.

martinvonz commented 1 year ago

In your 2nd example here that means creating the merge of X and Y (which IIUC is simply Y)

In the second example, when merging B and C, the usual 3-way merge would be:

The most natural thing would be to record that as a conflict in the superproject's tree. That's already supported by the data model: a Conflict has positive and negative ConflictTerms, each of which has a TreeValue, and the TreeValue has a GitSubmodule variant.
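
To make that shape concrete, here is a simplified Python rendering of the data model described above. It mirrors the names Conflict, ConflictTerm, TreeValue and GitSubmodule from the comment, but the field names, the File variant and the describe helper are assumptions for illustration, not jj's actual definitions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class File:
    id: str  # hypothetical variant: ID of a regular file's content


@dataclass(frozen=True)
class GitSubmodule:
    commit_id: str  # the submodule commit recorded in the gitlink entry


TreeValue = Union[File, GitSubmodule]


@dataclass(frozen=True)
class ConflictTerm:
    value: TreeValue


@dataclass
class Conflict:
    positive: List[ConflictTerm]  # sides being added
    negative: List[ConflictTerm]  # bases being removed

    def describe(self) -> str:
        """Render the conflict in the 'Y + Z - X' notation used in this thread."""
        def label(value: TreeValue) -> str:
            return value.commit_id if isinstance(value, GitSubmodule) else value.id

        plus = " + ".join(label(term.value) for term in self.positive)
        minus = "".join(f" - {label(term.value)}" for term in self.negative)
        return plus + minus


# The first figure's conflict: both sides moved the gitlink away from X.
conflict = Conflict(
    positive=[ConflictTerm(GitSubmodule("Y")), ConflictTerm(GitSubmodule("Z"))],
    negative=[ConflictTerm(GitSubmodule("X"))],
)
print(conflict.describe())  # prints: Y + Z - X
```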

As you both said, IIUC, we can calculate the resolved tree state, possibly including content-level conflicts, and reflect that in the working copy if we want to, but we'll have to ask the user to resolve the commit-level conflict. However, as Glen said, we may want to automate that conflict resolution in simple cases, like when merging B and C in the first case, but probably not when rebasing B onto C or vice versa.

hanwen commented 1 year ago

I suspect it might not be so hard, because we already "extend" the Git commit format to store conflicts (They are stored as the conflicted path suffixed with .jjconflict).

I saw that, but assumed there was more to it. When you run jj log, it prints whether the commit has conflicts or not. If you have to walk the tree to discover if there is a conflict, that would get expensive if the tree is large. But maybe you cache whether a tree contains a .jjconflict file?

martinvonz commented 1 year ago

Yes, that's exactly how it works :) And yes, it is very slow on large repos, and we plan to cache that information (probably per tree, as you said). FYI, we also plan to cache the auto-merged parents of merge commits, because diffing and rebasing merge commits is currently slower than it should be.
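
A per-tree cache along those lines might look like the sketch below. The store layout, class name and counter are invented for illustration; only the .jjconflict suffix comes from the thread. Because tree IDs are content-addressed, a cached answer for a tree ID never needs to be invalidated.

```python
from typing import Dict, Tuple

Entry = Tuple[str, str]              # (kind, object ID); kind is "tree" or "file"
Store = Dict[str, Dict[str, Entry]]  # tree ID -> {entry name: entry}


class ConflictIndex:
    """Answers 'does this tree contain a conflict?' with a per-tree cache."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.cache: Dict[str, bool] = {}
        self.trees_walked = 0  # counts trees actually read, to show the cache effect

    def has_conflict(self, tree_id: str) -> bool:
        if tree_id in self.cache:
            return self.cache[tree_id]
        self.trees_walked += 1
        result = False
        for name, (kind, object_id) in self.store[tree_id].items():
            # A conflicted path is stored under a name ending in .jjconflict.
            if name.endswith(".jjconflict") or (
                kind == "tree" and self.has_conflict(object_id)
            ):
                result = True
                break
        self.cache[tree_id] = result
        return result


store: Store = {
    "root1": {"README": ("file", "f1"), "src": ("tree", "t_src")},
    "root2": {"src": ("tree", "t_src"), "sub.jjconflict": ("tree", "t_conf")},
    "t_src": {"main.rs": ("file", "f2")},
    "t_conf": {},
}
index = ConflictIndex(store)
print(index.has_conflict("root1"), index.trees_walked)  # prints: False 2
print(index.has_conflict("root2"), index.trees_walked)  # prints: True 3
```

The second query reads only the new root tree, since the shared src subtree was already answered by the first query.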