Add a rooted repo column for the whole repository to the database schema
Skip the init commit search if the repository already has a rooted repository selected in the DB
Select the rooted repository for the repository if it doesn't have one yet
Database
In core-retrieval add a new column to Repository:
Init SHA1
We also want to keep the Init in Reference, as these will be used to delete the references from the extra rooted repos on updating.
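As a rough illustration of why the per-reference Init is still useful, the following sketch finds the extra rooted repos that would need cleaning on update. The Repository and Reference types and the ExtraRootedRepos helper are hypothetical stand-ins written for this example, not the actual core-retrieval models.

```go
package main

import (
	"fmt"
	"sort"
)

// SHA1 is a stand-in for the hash type of the real models (assumption).
type SHA1 string

// Reference is a hypothetical, trimmed-down reference model.
type Reference struct {
	Name string
	Init SHA1 // rooted repo where this reference is currently stored
}

// Repository is a hypothetical, trimmed-down repository model.
type Repository struct {
	Init       SHA1 // proposed column: rooted repo selected for the whole repository
	References []Reference
}

// ExtraRootedRepos returns, sorted, the inits other than the repository's own
// that still hold references and would need cleaning on update.
func ExtraRootedRepos(r Repository) []SHA1 {
	seen := map[SHA1]bool{}
	var extra []SHA1
	for _, ref := range r.References {
		if ref.Init != r.Init && !seen[ref.Init] {
			seen[ref.Init] = true
			extra = append(extra, ref.Init)
		}
	}
	sort.Slice(extra, func(i, j int) bool { return extra[i] < extra[j] })
	return extra
}

func main() {
	repo := Repository{
		Init: "aaa",
		References: []Reference{
			{Name: "refs/heads/master", Init: "aaa"},
			{Name: "refs/heads/docs", Init: "ccc"},
			{Name: "refs/heads/old", Init: "bbb"},
		},
	}
	fmt.Println(ExtraRootedRepos(repo)) // prints "[bbb ccc]"
}
```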
Init selection
If the repository already has the Init column set, use it instead of searching for one. Otherwise pick it following these rules:
Error when there are no references
If there's a default branch and it is valid, calculate the rooted repo from it
If there's no default branch, calculate the rooted repos from all branches and pick the most used, that is, the rooted repo with the most references
If there is a tie, pick the first lexicographically
Note: There could be more rules, like getting the longest commit history tree or checking which ones already exist in the database, but that would make the code more complex and this shouldn't happen too often.
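The selection rules above could be sketched roughly as follows. SelectInit and ErrNoReferences are hypothetical names, the input is assumed to be a map from reference name to the hex init commit already computed for it, and a default branch is treated as valid simply when it appears in that map. None of this is taken from the borges code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoReferences is a hypothetical error for repositories without references.
var ErrNoReferences = errors.New("repository has no references")

// SelectInit picks the rooted repo for a whole repository.
// inits maps each reference name to its init commit (hex string);
// defaultBranch may be empty when the repository has none.
func SelectInit(inits map[string]string, defaultBranch string) (string, error) {
	if len(inits) == 0 {
		return "", ErrNoReferences
	}
	// Rule: a valid default branch decides on its own.
	if defaultBranch != "" {
		if init, ok := inits[defaultBranch]; ok {
			return init, nil
		}
	}
	// Rule: otherwise count how many references use each rooted repo.
	counts := map[string]int{}
	for _, init := range inits {
		counts[init]++
	}
	// Pick the most used; on a tie, the lexicographically first.
	best, found := "", false
	for init, n := range counts {
		if !found || n > counts[best] || (n == counts[best] && init < best) {
			best, found = init, true
		}
	}
	return best, nil
}

func main() {
	inits := map[string]string{
		"refs/heads/a": "bbb",
		"refs/heads/b": "aaa",
		"refs/heads/c": "bbb",
		"refs/heads/d": "aaa",
	}
	init, err := SelectInit(inits, "")
	fmt.Println(init, err) // prints "aaa <nil>": a 2-2 tie, broken lexicographically
}
```

Because the comparison is a total order (count first, then the hash string), the result does not depend on Go's randomized map iteration order.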
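// NewGitReferencerWithInit returns a Referencer that uses i as the init
// commit of every reference instead of searching for one.
func NewGitReferencerWithInit(r *git.Repository, i plumbing.Hash) Referencer {
	return gitReferencer{
		Repository: r,
		init:       i,
	}
}

type gitReferencer struct {
	*git.Repository
	// init is the rooted repo already selected in the database, if any.
	init plumbing.Hash
}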
If init is set, skip the search and set Init to the same value on all references.
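A minimal sketch of that behaviour, using simplified stand-ins rather than the real gitReferencer: Hash replaces plumbing.Hash (an empty value meaning "not set"), and the search field is a hypothetical hook standing in for the expensive init commit search.

```go
package main

import "fmt"

// Hash is a stand-in for plumbing.Hash; the empty value means "not set".
type Hash string

// Reference is a hypothetical, trimmed-down reference model.
type Reference struct {
	Name string
	Init Hash
}

// referencer is a simplified stand-in for gitReferencer.
type referencer struct {
	names  []string
	init   Hash
	search func(name string) Hash // stand-in for the init commit search
}

// References returns all references, searching for an init commit only
// when none was supplied at construction time.
func (r referencer) References() []Reference {
	refs := make([]Reference, 0, len(r.names))
	for _, name := range r.names {
		init := r.init
		if init == "" {
			init = r.search(name)
		}
		refs = append(refs, Reference{Name: name, Init: init})
	}
	return refs
}

func main() {
	searches := 0
	r := referencer{
		names:  []string{"refs/heads/master", "refs/heads/dev"},
		init:   "aaa",
		search: func(string) Hash { searches++; return "zzz" },
	}
	// With init set, no search runs and every reference gets the same Init.
	fmt.Println(r.References(), searches)
}
```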
Optimizations
These may not be done in the first implementation but could accelerate downloads a lot.
Fast path for first download
This is already done; it is listed here for completeness. If the siva file is new (no commits), rename the references and copy the repository as is into the siva file.
Fast path for updates
This only works if we already know the init where the repository will be located.
A second optimization can use a layer on top of the repository to translate reference names when fetching, and do it directly over the siva file. This way the downloaded packfile is smaller and we don't need to do a push; it is written as is. This layer should be go-borges.
Purpose
Store whole repositories in the same place instead of splitting them in several siva files. Reasons explained in: https://github.com/src-d/borges/issues/380
Changes
Database
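In core-retrieval add a new column to Repository:
Init SHA1
We also want to keep the Init in Reference, as these will be used to delete the references from the extra rooted repos on updating.
Init selection
If the repository already has the Init column set, use it instead of searching for one. Otherwise pick it following these rules:
Error when there are no references
If there's a default branch and it is valid, calculate the rooted repo from it
If there's no default branch, calculate the rooted repos from all branches and pick the most used, that is, the rooted repo with the most references
If there is a tie, pick the first lexicographically
Note: There could be more rules, like getting the longest commit history tree or checking which ones already exist in the database, but that would make the code more complex and this shouldn't happen too often.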
Changes in the code
gitReferencer (https://github.com/src-d/borges/blob/master/git.go#L56) should have a new constructor, NewGitReferencerWithInit, to accept the init commit in case it exists in the database.
If init is set, skip the search and set Init to the same value on all references.
Optimizations
These may not be done in the first implementation but could accelerate downloads a lot.
Fast path for first download
This is already done; it is listed here for completeness. If the siva file is new (no commits), rename the references and copy the repository as is into the siva file.
https://github.com/src-d/borges/pull/378
Fast path for updates
This only works if we already know the init where the repository will be located.
A second optimization can use a layer on top of the repository to translate reference names when fetching, and do it directly over the siva file. This way the downloaded packfile is smaller and we don't need to do a push; it is written as is. This layer should be go-borges.