Open jbenet opened 10 years ago
Wait wait wait, aren't Git submodules exactly that – tree entries pointing to commits?
    rain ~/src/gnome/shell master $ git ls-tree @:src
    ...
    100644 blob 03709d6051ea5affd0381ac86db40c28849ada2b    gtkmenutrackeritem.h
    160000 commit e14dbe8aa6dfaeea4a9f3405cf2f3e238e88623b  gvc
    040000 tree 6c316fb521870a39d902aceebf2c4c3e0982f77d    hotplug-sniffer
    100644 blob 4070482e18c9a4760cf33230036e9c3268fc5498    main.c
    ...
@grawity almost!! try:
git cat-file -p e14dbe8aa6dfaeea4a9f3405cf2f3e238e88623b
So, what's going on? The commit's not actually an object in the repo's object graph. It's somewhere else.
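To make the listing above concrete, here is a minimal Python sketch that parses `git ls-tree`-style lines and picks out the entries whose mode is 160000 and whose type is commit (the submodule entries). The names `TreeEntry`, `parse_ls_tree` and `is_gitlink` are made up for illustration; only the four lines quoted in this thread are used as input.

```python
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str
    type: str
    sha: str
    name: str


def parse_ls_tree(output: str) -> list[TreeEntry]:
    """Parse lines shaped like '<mode> <type> <sha><TAB><name>'."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line == "...":
            continue  # skip blanks and the elision markers from the paste
        # Split on any whitespace so pasted listings (spaces instead of
        # the tab git emits) parse too; the name keeps its inner spaces.
        mode, type_, rest = line.split(None, 2)
        sha, name = rest.split(None, 1)
        entries.append(TreeEntry(mode, type_, sha, name))
    return entries


def is_gitlink(entry: TreeEntry) -> bool:
    """A submodule shows up as a tree entry of type 'commit', mode 160000."""
    return entry.mode == "160000" and entry.type == "commit"


LISTING = """\
100644 blob 03709d6051ea5affd0381ac86db40c28849ada2b\tgtkmenutrackeritem.h
160000 commit e14dbe8aa6dfaeea4a9f3405cf2f3e238e88623b\tgvc
040000 tree 6c316fb521870a39d902aceebf2c4c3e0982f77d\thotplug-sniffer
100644 blob 4070482e18c9a4760cf33230036e9c3268fc5498\tmain.c
"""

entries = parse_ls_tree(LISTING)
gitlinks = [e for e in entries if is_gitlink(e)]
```

Here `gitlinks` holds only the `gvc` entry: the tree does name a commit, but nothing in the listing says where that commit's object actually lives.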
The submodule is stored as a commit, yes, but it doesn't quite work, because a submodule, as git stands today, is more than a commit. It's a commit hash plus another repository (meaning a repository URL). This is why we need the extra .gitmodules file, etc.
The submodule's repo information is not stored as part of the commit because storing addresses (the repo URL) in git objects doesn't make sense: repos and locations are related but not the same thing (rightly so), because a repo can change location. Also note that the submodule's object store lives within the .git of the submodule. It's a hack, because this is one of the things that SVN's model had sort of annoyingly right (right conceptually, but incredibly annoying because it littered .svn everywhere). The result is that submodules are this weird halfway thing.
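A rough Python sketch of that "halfway" state: the tree entry only carries a path and a commit hash, while the location comes from a separate, mutable mapping that plays the role of .gitmodules. `Gitlink`, `locate` and the `example.invalid` URL are hypothetical, purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gitlink:
    """What the parent tree stores: a path and a commit hash, nothing else."""
    path: str
    commit: str


def locate(gitlink: Gitlink, gitmodules: dict) -> tuple:
    """Combine the in-tree commit hash with the out-of-band repository URL."""
    url = gitmodules.get(gitlink.path)
    if url is None:
        raise LookupError(f"no repository URL recorded for {gitlink.path!r}")
    return url, gitlink.commit


# Stand-in for the path -> url information kept in .gitmodules.
# The URL is a placeholder, not the real location of any repository.
gitmodules = {"gvc": "https://example.invalid/gvc.git"}

gvc = Gitlink(path="gvc", commit="e14dbe8aa6dfaeea4a9f3405cf2f3e238e88623b")
url, commit = locate(gvc, gitmodules)
```

Moving the repository changes only the mapping; the `Gitlink`, and therefore the parent tree's hash, stays the same.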
(If you've seen IPFS, you'll see where I'm going with all this: namely, merging all repos into one.)
Yes, Git makes an exception for submodules in that they still have independent object stores (and everything else), so git fsck remains silent about a missing commit object.
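As a toy illustration of that exception (not how `git fsck` is actually implemented), the following Python sketch walks an in-memory object store and reports unreachable ids, skipping mode-160000 entries unless told otherwise. The store layout, the ids and `missing_objects` are all assumptions made for the example.

```python
GITLINK_MODE = "160000"

# Toy object store keyed by made-up ids (not real hashes).
store = {
    "commit1": {"type": "commit", "tree": "tree1", "parents": []},
    "tree1": {
        "type": "tree",
        "entries": [
            ("100644", "blob1", "main.c"),
            # Points at a commit that lives in the submodule's own store.
            (GITLINK_MODE, "subcommit", "gvc"),
        ],
    },
    "blob1": {"type": "blob"},
}


def missing_objects(store, root, follow_gitlinks=False):
    """Return ids reachable from `root` that are absent from `store`."""
    missing, seen, stack = [], set(), [root]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        obj = store.get(oid)
        if obj is None:
            missing.append(oid)
            continue
        if obj["type"] == "commit":
            stack.append(obj["tree"])
            stack.extend(obj["parents"])
        elif obj["type"] == "tree":
            for mode, child, _name in obj["entries"]:
                if mode == GITLINK_MODE and not follow_gitlinks:
                    continue  # treat submodule commits as someone else's problem
                stack.append(child)
    return missing
```

With the default settings the walk finds nothing missing; with `follow_gitlinks=True` it reports `subcommit`, which is the gap the exception papers over.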
(Also, recent SVN versions only have .svn in the root directory of a checkout.)
(This is a realization I made while designing IPFS. Explaining it here because I'll need to refer to it. Excuse the meandering between too little and too much exposition; it's not clear to me what I should and shouldn't assume readers know.)
Traditionally, Git commits point to trees (and other commits). Trees don't point to commits. This is a useful design, as it keeps the commit graph somewhat separate from the file objects. Walking and manipulating the commit DAG is simple.
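A small Python sketch of why that separation makes the walk simple: collecting a commit's history only follows parent pointers and never opens a tree. The `Commit` class, the ids and `ancestors` are hypothetical stand-ins, not git's real object format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    tree: str              # id of the root tree (never dereferenced below)
    parents: tuple = ()    # ids of parent commits


# A tiny history with one merge: c4 has parents c2 and c3, both based on c1.
commits = {
    "c1": Commit(tree="t1"),
    "c2": Commit(tree="t2", parents=("c1",)),
    "c3": Commit(tree="t3", parents=("c1",)),
    "c4": Commit(tree="t4", parents=("c2", "c3")),
}


def ancestors(commits, head):
    """Depth-first walk over parent links; each commit is visited once."""
    seen, order, stack = set(), [], [head]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        order.append(cid)
        stack.extend(commits[cid].parents)
    return order
```

Calling `ancestors(commits, "c4")` visits all four commits without looking at `t1` through `t4`. If trees could point to commits, a walk like this would have to decide whether to descend into trees as well.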
This design choice becomes problematic when handling submodules, or the repositories themselves. Using submodules is notoriously annoying, because the model makes assumptions about the workflow (submodules are other things from some other space). Repos themselves aren't tracked within git. ("What!?" you say. "What does that even mean!?")
Repository objects
Imagine the Repository as a first-class git object, something like this:
Mapping ref names to commit hashes. A collection of entries, with an entry format like:
This is really just a more complicated tree object. Example tree:
The tree is a collection of entries, with an entry format like:
(From now on, ignore the unix perms.) These are the same! So a repo object is really just a privileged tree object that gets to point to commits. That seems to me like a bad lack of generality. What if all trees could point to commits?
You'd unlock a host of new workflow patterns within the object commit graph, and make submodules first-class things.
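One way to picture the claim, as a hedged Python sketch: if a repo object and a tree object share one entry shape (type, target, name), the same lookup code serves both, and a generalized tree can mix commit entries with blobs and trees. The `Entry` layout, the `lookup` helper and the all-ones ref hash are assumptions for illustration, not the entry formats from the original post.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    type: str    # "blob", "tree" or "commit"
    target: str  # hash of the object pointed to
    name: str    # file name, directory name or ref name


PLACEHOLDER_COMMIT = "1" * 40  # stand-in hash, not a real commit

# A "repo object": ref names mapped to commit hashes.
repo_obj = (
    Entry("commit", PLACEHOLDER_COMMIT, "refs/heads/master"),
)

# An ordinary tree: names mapped to blobs and subtrees.
tree_obj = (
    Entry("blob", "4070482e18c9a4760cf33230036e9c3268fc5498", "main.c"),
    Entry("tree", "6c316fb521870a39d902aceebf2c4c3e0982f77d", "hotplug-sniffer"),
)

# A generalized tree that may also point to commits (a submodule entry).
mixed_obj = tree_obj + (
    Entry("commit", "e14dbe8aa6dfaeea4a9f3405cf2f3e238e88623b", "gvc"),
)


def lookup(entries, name):
    """Resolve a name to its entry; identical for repo objects and trees."""
    for entry in entries:
        if entry.name == name:
            return entry
    raise KeyError(name)
```

Nothing in `lookup` cares whether it was handed a repo or a tree, which is the sense in which the repo object is "just a privileged tree".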
It's turtles, or rather commits, all the way down.
Takeaway: let trees point to commits.
[1] Saying pointer here, and not URL, because this can be a hash. Why is it a URL? Because git is built to operate within a single machine, with a blobstore completely separate from all other git blobstores. If you change that, if you make one blobstore to store them all, then the URL can be a hash pointing to another object in the blobstore. (Italics on hash because it's not a hash of the value of the object, i.e. not content-addressed. It has to be a symlink, a mutable object.)
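A minimal Python sketch of the distinction drawn in this footnote, under loud assumptions: a single shared store holds immutable objects keyed by a hash of their content, alongside a small table of mutable names that can be re-pointed. `BlobStore` and its methods are invented for illustration, and SHA-256 over raw bytes is used for brevity (git itself hashes a header plus content).

```python
import hashlib


class BlobStore:
    """Toy shared store: immutable content-addressed objects + mutable names."""

    def __init__(self):
        self._objects = {}  # content hash -> bytes (never changes once set)
        self._names = {}    # mutable name -> content hash (the "symlink")

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def point(self, name: str, key: str) -> None:
        """Create or move a mutable pointer to an existing object."""
        if key not in self._objects:
            raise KeyError(key)
        self._names[name] = key

    def resolve(self, name: str) -> str:
        return self._names[name]


store = BlobStore()
v1 = store.put(b"repo state, version 1")
v2 = store.put(b"repo state, version 2")

store.point("some-repo", v1)
before = store.resolve("some-repo")
store.point("some-repo", v2)   # the name moves; both objects are untouched
after = store.resolve("some-repo")
```

The object keys never change for a given content, while `some-repo` resolves to a different key after the second `point` call; that second kind of pointer is what the footnote means by a symlink.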