Purpose

Store each repository whole in a single siva file instead of splitting it across several siva files. The reasons are explained in: https://github.com/src-d/borges/issues/380

Changes

Add a rooted repository column for the whole repository to the database schema
Skip the init commit search if the repository already has a rooted repository selected in the DB
Select a rooted repository for the repository if it does not have one yet

Database

In core-retrieval add a new column to Repository:

Init SHA1

We also want to keep Init in Reference, as it will be used to delete the references from the extra rooted repos when updating.

Init selection

If the repository already has the Init column set, use it instead of searching for one. Otherwise pick it following these rules:

Return an error when there are no references
If there is a valid default branch, calculate the rooted repo from it
If there is no default branch, calculate the rooted repos from all branches and pick the most used one, that is, the rooted repo with the most references
If there is a tie, pick the first one lexicographically

Note: There could be more rules, like picking the longest commit history or checking which rooted repos already exist in the database, but they would make the code more complex and this case shouldn't happen too often.
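
The selection rules above can be sketched roughly as follows. This is only an illustration under several assumptions: Ref, selectInit and ErrNoReferences are hypothetical names, hashes are plain hex strings instead of plumbing.Hash, the init of every reference is assumed to be already computed, a "valid" default branch is taken to mean one that is present among the references, and ties are broken by comparing the init hashes as strings.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ErrNoReferences is a hypothetical error for repositories without references.
var ErrNoReferences = errors.New("repository has no references")

// Ref is a simplified, hypothetical reference: its name and the init
// (root) commit hash already computed for it.
type Ref struct {
	Name string
	Init string
}

// selectInit picks the rooted repository for a whole repository.
// stored is the Init saved in the database ("" if none) and
// defaultBranch is the default branch reference name ("" if unknown).
func selectInit(stored, defaultBranch string, refs []Ref) (string, error) {
	// An Init already selected in the database always wins.
	if stored != "" {
		return stored, nil
	}
	if len(refs) == 0 {
		return "", ErrNoReferences
	}
	// Use the default branch when it is present among the references.
	for _, r := range refs {
		if defaultBranch != "" && r.Name == defaultBranch {
			return r.Init, nil
		}
	}
	// Otherwise count how many references point to each rooted repo.
	counts := make(map[string]int)
	for _, r := range refs {
		counts[r.Init]++
	}
	roots := make([]string, 0, len(counts))
	for root := range counts {
		roots = append(roots, root)
	}
	// Most used first; ties are resolved lexicographically.
	sort.Slice(roots, func(i, j int) bool {
		if counts[roots[i]] != counts[roots[j]] {
			return counts[roots[i]] > counts[roots[j]]
		}
		return roots[i] < roots[j]
	})
	return roots[0], nil
}

func main() {
	refs := []Ref{
		{Name: "refs/heads/a", Init: "bbb"},
		{Name: "refs/heads/b", Init: "aaa"},
		{Name: "refs/heads/c", Init: "aaa"},
	}
	root, err := selectInit("", "", refs)
	fmt.Println(root, err)
}
```

Sorting the candidates keeps the result deterministic even though Go map iteration order is random.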

Changes in the code

gitReferencer (https://github.com/src-d/borges/blob/master/git.go#L56) should have a new constructor that accepts the init commit when it already exists in the database:

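// NewGitReferencerWithInit returns a Referencer that uses i as the init
// commit of every reference instead of searching for it.
func NewGitReferencerWithInit(r *git.Repository, i plumbing.Hash) Referencer {
  return gitReferencer{
    Repository: r,
    init:       i,
  }
}

type gitReferencer struct {
  *git.Repository

  // init is the init commit stored in the database, or the zero hash if
  // it is not known yet.
  init plumbing.Hash
}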

If init is set, skip the search and set Init on all references to that same value.
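
A minimal sketch of that behaviour, using simplified stand-in types rather than the real go-git ones. Hash, Reference, referencer and the search callback are hypothetical names used only for illustration, and the empty string plays the role of the zero hash.

```go
package main

import "fmt"

// Hash stands in for plumbing.Hash; "" plays the role of the zero hash.
type Hash string

// Reference is a simplified reference with the init commit assigned to it.
type Reference struct {
	Name string
	Init Hash
}

// referencer is a simplified stand-in for gitReferencer.
type referencer struct {
	names  []string               // reference names in the repository
	init   Hash                   // init from the database, "" if unknown
	search func(name string) Hash // hypothetical per-reference init search
}

// References assigns an init to every reference. When r.init is set the
// search is skipped and every reference gets that same value.
func (r referencer) References() []Reference {
	refs := make([]Reference, 0, len(r.names))
	for _, name := range r.names {
		root := r.init
		if root == "" {
			root = r.search(name)
		}
		refs = append(refs, Reference{Name: name, Init: root})
	}
	return refs
}

func main() {
	r := referencer{
		names:  []string{"refs/heads/master", "refs/heads/dev"},
		init:   "abc",
		search: func(string) Hash { return "searched" },
	}
	for _, ref := range r.References() {
		fmt.Println(ref.Name, ref.Init)
	}
}
```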

Optimizations

These may not be done in the first implementation, but they could speed up downloads considerably.

Fast path for first download

This is already done; it is listed here for completeness. If the siva file is new (no commits), rename the references and copy the repository as is into the siva file.

https://github.com/src-d/borges/pull/378

Fast path for updates

This only works if we already know the init where the repository will be located.

A second optimization can use a layer on top of the repository that translates reference names while fetching, and fetch directly over the siva file. This way the downloaded packfile is smaller and we don't need to do a push; the packfile is written as is. This layer should be go-borges.
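
As a rough idea of what the name translation could look like, here is a sketch. It assumes that references inside the siva file carry the repository ID as a trailing path component; that naming scheme, as well as the toRooted and fromRooted helpers, are assumptions made for illustration and not the go-borges API.

```go
package main

import (
	"fmt"
	"strings"
)

// toRooted maps an upstream reference name to the name used inside the
// siva file by appending the repository ID (assumed naming scheme).
func toRooted(name, repoID string) string {
	return name + "/" + repoID
}

// fromRooted reverses toRooted. It reports false when the reference
// does not belong to the given repository.
func fromRooted(name, repoID string) (string, bool) {
	suffix := "/" + repoID
	if !strings.HasSuffix(name, suffix) {
		return "", false
	}
	return strings.TrimSuffix(name, suffix), true
}

func main() {
	rooted := toRooted("refs/heads/master", "repo-1")
	fmt.Println(rooted)

	name, ok := fromRooted(rooted, "repo-1")
	fmt.Println(name, ok)
}
```

With such a mapping applied during the fetch, the references would arrive already renamed, which is what removes the need for the intermediate push.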

src-d / borges

Store repositories in a single siva file #381