src-d / borges

borges collects and stores Git repositories.
https://docs.sourced.tech/borges/
GNU General Public License v3.0

Store repositories in a single siva file #381

Open jfontan opened 5 years ago

jfontan commented 5 years ago

Purpose

Store each repository whole in a single siva file instead of splitting it across several siva files. The reasons are explained in: https://github.com/src-d/borges/issues/380

Changes

Database

In core-retrieval add a new column to Repository:

Init SHA1

We also want to keep Init in Reference, since it will be used to delete the references that belong to the extra rooted repos when updating.

Init selection

If the repository already has its Init column set, use it instead of searching for one. Otherwise, pick it following these rules:

Note: There could be more rules, such as taking the longest commit history tree or checking which inits already exist in the database, but they would make the code more complex and this case shouldn't happen too often.

Changes in the code

gitReferencer (https://github.com/src-d/borges/blob/master/git.go#L56) should have a new constructor to accept the init commit in case it exists in the database:

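// NewGitReferencerWithInit returns a Referencer that reuses the init commit
// already stored in the database instead of searching for one.
func NewGitReferencerWithInit(r *git.Repository, i plumbing.Hash) Referencer {
  return gitReferencer{
    Repository: r,
    init: i,
  }
}

type gitReferencer struct {
  *git.Repository

  // init is the init commit taken from the database, if any.
  init plumbing.Hash
}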

If init is set, skip the search and set Init on all references to that same value.
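A minimal sketch of that branch, using only the standard library. Hash, ZeroHash, Reference, assignInits and the search callback are hypothetical stand-ins for illustration (the real code would use plumbing.Hash and the Reference model from core-retrieval), and treating an empty hash as "not set" is an assumption:

```go
package main

// Hash stands in for plumbing.Hash so the sketch has no external dependencies.
type Hash string

// ZeroHash is assumed here to mean "no init stored in the database".
const ZeroHash Hash = ""

// Reference is a simplified, hypothetical version of the stored reference.
type Reference struct {
	Name string
	Hash Hash
	Init Hash
}

// assignInits returns a copy of refs with Init filled in.
// When known is set, every reference gets that value and search is never
// called. Otherwise search is called once per reference to find its init.
func assignInits(refs []Reference, known Hash, search func(Reference) Hash) []Reference {
	out := make([]Reference, len(refs))
	for i, r := range refs {
		if known != ZeroHash {
			r.Init = known
		} else {
			r.Init = search(r)
		}
		out[i] = r
	}
	return out
}
```

With a known init the per-reference history walk is skipped entirely, which is where the saving comes from.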

Optimizations

These may not be part of the first implementation, but they could speed up downloads considerably.

Fast path for first download

This is already done and is listed here for completeness. If the siva file is new (no commits), rename the references and copy the repository as is into the siva file.

https://github.com/src-d/borges/pull/378

Fast path for updates

This only works if we already know the init where the repository will be located.

A second optimization can use a layer on top of the repository to translate reference names while fetching, working directly on the siva file. This way the downloaded packfile is smaller and no push is needed: the packfile is written as is. This layer should be go-borges.
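The name translation such a layer would perform could look roughly like this. It is a sketch under an assumption not stated in this issue: that a stored reference name is the remote name with the repository ID appended as a last path component. The functions toStored, fromStored and translateAll are hypothetical names, not part of borges or go-borges:

```go
package main

import "strings"

// toStored maps a remote reference name to the name used inside the siva
// file, assuming the repository ID is appended as the last path component.
func toStored(name, repoID string) string {
	return name + "/" + repoID
}

// fromStored is the inverse of toStored. ok is false when the stored name
// does not belong to repoID.
func fromStored(stored, repoID string) (name string, ok bool) {
	suffix := "/" + repoID
	if !strings.HasSuffix(stored, suffix) {
		return "", false
	}
	name = strings.TrimSuffix(stored, suffix)
	return name, name != ""
}

// translateAll rewrites every remote reference name to its stored form,
// keeping the hash each reference points to.
func translateAll(remote map[string]string, repoID string) map[string]string {
	out := make(map[string]string, len(remote))
	for name, hash := range remote {
		out[toStored(name, repoID)] = hash
	}
	return out
}
```

The same mapping covers the rename step of the first-download fast path; for updates it would be applied while fetching so that no separate push is needed.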

jfontan commented 5 years ago

Information on how the repositories are stored with the current system (one siva per rooted repo):