Add support for Git submodules

martinvonz commented 2 years ago

Description

Submodules are used frequently enough that I think we'll just have to support them.

martinvonz commented 2 years ago

I have been thinking about this a bit. One problem is how to handle working-copy commits.

We could create a new working-copy commit in each submodule when you check out a commit in the superproject. However, there are at least two problems with that. One is that it would make all submodules seem dirty because they point to a new commit, unless we somehow skip an empty working-copy commit when we report to the superproject what the submodule's commit is. Another problem is that we would have to scan the whole tree for submodules. That's particularly bad if you have a monorepo with millions of directories to walk; we don't want to do that every time you check out a commit.

Note that this problem is similar to what the Git project wants to do when you check out a branch. I think their plan is to check out the same branch in each submodule. The problem is smaller there because they presumably only need to do that for submodules that are within the sparse checkout, so that limits the number of directories to walk.

Another option is to somehow lazily create the new working-copy commits in the submodules only if you modify any files. Maybe we would also want to do it if you run jj log inside of them? If we do, then we'd probably still want to ignore those commits when we report the commit to the superproject so it doesn't seem like the submodule has been updated just because you ran jj log inside it.
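To make the "ignore those commits when we report the commit to the superproject" idea concrete, here is a minimal sketch. The `Commit` record, its fields, and `commit_reported_to_superproject` are all hypothetical names made up for illustration, not jj's data model. The only assumption is that an empty working-copy commit has the same tree as its parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical, simplified commit record (not jj's real data model)."""
    commit_id: str
    tree_id: str
    parent: Optional["Commit"] = None
    is_working_copy: bool = False


def commit_reported_to_superproject(head: Commit) -> Commit:
    """Return the commit the superproject should record for a submodule.

    An empty working-copy commit (same tree as its parent) is skipped, so
    merely creating it does not make the submodule look modified.
    """
    if (
        head.is_working_copy
        and head.parent is not None
        and head.tree_id == head.parent.tree_id
    ):
        return head.parent
    return head


# The commit the superproject pins, plus two possible working-copy commits.
pinned = Commit("abc123", tree_id="t1")
empty_wc = Commit("wc0001", tree_id="t1", parent=pinned, is_working_copy=True)
dirty_wc = Commit("wc0002", tree_id="t2", parent=pinned, is_working_copy=True)
```

Under this sketch, `empty_wc` is reported as `pinned`, while `dirty_wc` is reported as itself because its tree differs from its parent's.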

I'm happy to hear if anyone has any thoughts (@chooglen?).

chooglen commented 2 years ago

> We could create a new working-copy commit in each submodule when you check out a commit in the superproject.

If we're thinking of the superproject and submodules as separate repos that can be interacted with mostly independent of each other (aka the Git model), this seems like the simplest, most intuitive behavior.

But in Git, that extra flexibility in the submodule ends up introducing a lot of unexpected behavior: it lets you meddle with the submodule without the superproject's knowledge and unintentionally break things. For example, how would you prevent git -C submodule gc from GC-ing commits that the superproject needs? I'd be in favor of drastically limiting what you can do in a submodule in exchange for safety and simplicity.
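The GC hazard can be illustrated with a toy reachability model. This is only a sketch under stated assumptions: the commit graph, the ref set, and the `reachable` helper are invented for illustration, and real `git gc` also consults reflogs and an expiry window, which this model ignores.

```python
def reachable(roots, parents):
    """Return every commit reachable from `roots` by following parent links."""
    seen = set()
    stack = list(roots)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


# Toy submodule history: a <- b <- c is on a submodule branch,
# while a <- x is referenced only by a superproject commit.
parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}
submodule_refs = {"c"}
superproject_pins = {"x"}

# A GC that only knows the submodule's own refs would drop "x".
kept_without_pins = reachable(submodule_refs, parents)
# Treating the superproject's pinned commits as extra roots keeps it.
kept_with_pins = reachable(submodule_refs | superproject_pins, parents)
```

In this model, a collector run inside the submodule needs to learn about the superproject's pinned commits somehow, which is exactly the knowledge the independent-repos model does not give it.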

> One is that it would make all submodules seem dirty because they point to a new commit, unless we somehow skip an empty working-copy commit when we report to the superproject what the submodule's commit is.

If the superproject has a working-copy commit checked out, this doesn't seem like a problem to me since the new submodule commits won't leave the working-copy commit until a jj amend (or similar) in the superproject. I'm somewhat immune to churn in a working-copy commit because it's all meant to be temporary anyway.

I'm more worried about jj edit in the superproject combined with working-copy commits in the submodule, since that makes it pretty easy to modify submodules in commits that aren't temporary.

martinvonz commented 1 year ago

> Another option is to somehow lazily create the new working-copy commits in the submodules only if you modify any files.

I think I like this solution best. I'm thinking the submodule repos wouldn't have any attached working copies (they would be "bare" in Git-speak). Their working-copy files would be tracked in the parent project's working-copy state ("index" in Git-speak). That means that jj log in a submodule would not show where the working copy is. It might be best to not even allow it. I think it's a little confusing if jj log shows a different history when you're inside a submodule anyway.

How does that sound?
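One way to picture "submodule files tracked in the parent project's working-copy state" is the sketch below. `WorkingCopyState`, its fields, and `owning_submodule` are hypothetical names for illustration only and do not describe jj's actual working-copy format. The idea is that the superproject holds a single state covering every file and maps each path back to the bare submodule repo that owns it.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, Optional


@dataclass
class WorkingCopyState:
    """Hypothetical superproject working-copy state (not jj's real format)."""
    # Submodule mount path -> commit the submodule was checked out at.
    submodules: Dict[str, str] = field(default_factory=dict)
    # Tracked file path (relative to the superproject root) -> content hash.
    files: Dict[str, str] = field(default_factory=dict)

    def owning_submodule(self, path: str) -> Optional[str]:
        """Return the deepest submodule mount containing `path`, or None."""
        ancestors = PurePosixPath(path).parents
        best = None
        for mount in self.submodules:
            mount_path = PurePosixPath(mount)
            if mount_path in ancestors:
                # Prefer the most deeply nested mount for nested submodules.
                if best is None or len(mount_path.parts) > len(PurePosixPath(best).parts):
                    best = mount
        return best


state = WorkingCopyState(
    submodules={"vendor/lib": "abc123"},
    files={"README.md": "h1", "vendor/lib/src/a.c": "h2"},
)
```

Here `state.owning_submodule("vendor/lib/src/a.c")` yields `"vendor/lib"`, while a top-level file such as `README.md` belongs to the superproject itself.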

wez commented 1 year ago

FWIW, I run git log in a submodule to sanity check where I'm at and whether recent changes are present after a submodule update. In that mode, I'm interested in the repo/commit history rather than the history of individual files.

I'd be fine with jj log not showing the submodule history in the working copy (honestly, I think it is confusing for the history to change based on the cwd in Git anyway!), but it would be great if there were a way to explicitly review the history for a submodule when needed, e.g. jj log --submodule foo or whatever would fit in consistently with the jj UX.

chooglen commented 1 year ago

This topic has turned out to be quite big, so I intend to break this up into smaller bits that we can discuss separately. There are a lot of things we'll have to figure out, but assuming we can't make big changes to the way Git implements submodules, I think these are the big questions:

What is the mental model for submodules, and what does this mean for the UX?
What is the on-disk submodules format?
How will we snapshot changes in a submodule's content?
How do we represent and handle submodule conflicts in the superproject?
How will rewriting history in a submodule be supported? Should we even support it?

If these make sense, I'll open those issues (discussions?) separately. Any other big questions I've missed?

martinvonz commented 1 year ago

> If these make sense, I'll open those issues (discussions?) separately.

Sounds good. (I'd make them issues, but discussions would also work.)

> What is the on-disk submodules format?

Do you mean for the working copy or the commit storage? Or both? I'm asking now only so you can decide if you want to create one or two separate issues for it (I'm fine with either).

samueltardieu commented 1 year ago

> Sounds good. (I'd make them issues, but discussions would also work.)

I think that is what "Projects" is for: you can make a project which groups all relevant issues, and see whether they are done, in progress or not started.

chooglen commented 1 year ago

> Do you mean for the working copy or the commit storage? Or both?

I think both, but since the two are linked (e.g. Git expects the .git file to point to the gitdir storing the commits), I'll open one discussion for that.
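For reference, the link mentioned above is a plain-text `.git` file in the submodule's worktree whose contents have the form `gitdir: <path>`. A relative path is resolved against the directory containing that file. The sketch below parses that format. The function names and the example paths are made up for illustration, although `.git/modules/<name>` is where Git typically places submodule gitdirs.

```python
import posixpath


def parse_gitfile(contents: str) -> str:
    """Extract the gitdir path from the contents of a `.git` file."""
    prefix = "gitdir:"
    line = contents.strip()
    if not line.startswith(prefix):
        raise ValueError("not a gitfile: missing 'gitdir:' prefix")
    return line[len(prefix):].strip()


def resolve_gitdir(worktree: str, contents: str) -> str:
    """Resolve the gitdir relative to the worktree that holds the `.git` file."""
    target = parse_gitfile(contents)
    if not posixpath.isabs(target):
        target = posixpath.normpath(posixpath.join(worktree, target))
    return target


# Example: a submodule checked out at super/sub (paths are illustrative).
gitdir = resolve_gitdir("super/sub", "gitdir: ../.git/modules/sub\n")
```

With these inputs, `gitdir` is `super/.git/modules/sub`: the working copy and the commit storage sit in different places but are tied together by this one file.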

> I think that is what "Projects" is for: you can make a project which groups all relevant issues, and see whether they are done, in progress or not started.

That sounds good. Though AFAICT GitHub Projects don't live on a repo (you create a project on your own user account and then link it to one or many repos), and administering both the project and the repo sounds like a pain... I'll fiddle around with it a bit, and if it makes sense, I'll ask @martinvonz to create the project so that at least the repo and project have the same owner.

martinvonz / jj

Add support for Git submodules #494