src-d / gitbase

SQL interface to git repositories, written in Go. https://docs.sourced.tech/gitbase
Apache License 2.0

Check how feasible it is to add a generation column to the ref_commits table #928

Closed ajnavarro closed 5 years ago

ajnavarro commented 5 years ago

Right now, ref_commits has the following schema:

+---------------+---------+
| name          | type    |
+---------------+---------+
| repository_id | TEXT    |
| commit_hash   | VARCHAR |
| ref_name      | TEXT    |
| history_index | BIGINT  |
+---------------+---------+

Check how feasible it is to add a generation column that gives the position of each commit relative to the root commit, as git's commit graph index does:

https://github.com/git/git/blob/master/Documentation/technical/commit-graph-format.txt

The idea is to use that generation value to make it possible to fetch only the new commits, building on previously queried data.

erizocosmico commented 5 years ago

Because of the way the commit graph needs to be traversed, we cannot calculate the generation of a commit lazily as we compute the ref_commits table rows: we need to go all the way to the roots and back to calculate the generation.
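To illustrate why the traversal has to reach the roots, here is a minimal sketch of the generation rule described in the commit-graph format document: a commit with no parents has generation 1, and any other commit has one more than the maximum generation of its parents. The `graph` type, the `generations` function and the toy hashes are made up for this example; this is not gitbase or go-git code.

```go
package main

import "fmt"

// graph maps a commit hash to its parent hashes.
// Hypothetical toy structure, not a gitbase or go-git type.
type graph map[string][]string

// generations computes the generation number of every commit reachable
// from tip. Roots get 1; other commits get 1 + the maximum generation of
// their parents. A commit is only resolved after all of its ancestors,
// which is why the value cannot be produced while streaming rows.
func generations(g graph, tip string) map[string]uint64 {
	gen := make(map[string]uint64)
	stack := []string{tip}
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		if _, done := gen[c]; done {
			stack = stack[:len(stack)-1]
			continue
		}
		var highest uint64
		ready := true
		for _, p := range g[c] {
			pg, ok := gen[p]
			if !ok {
				// Parent not resolved yet: visit it first, then retry c.
				stack = append(stack, p)
				ready = false
				continue
			}
			if pg > highest {
				highest = pg
			}
		}
		if ready {
			gen[c] = highest + 1
			stack = stack[:len(stack)-1]
		}
	}
	return gen
}

func main() {
	// a is the root; d merges b and c; e merges d and a.
	g := graph{"a": nil, "b": {"a"}, "c": {"a"}, "d": {"b", "c"}, "e": {"d", "a"}}
	gen := generations(g, "e")
	for _, h := range []string{"a", "b", "c", "d", "e"} {
		fmt.Println(h, gen[h])
	}
}
```

Nothing is known about `e` until `a`, `b`, `c` and `d` have all been resolved, which is the "roots and back" cost mentioned above.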

That leaves us with no option but to read this information from the commit graph files in .git/objects/info. This, however, has another issue: the commit graph file may not be there if it has not been generated.
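A presence check could look roughly like the sketch below. It assumes the single-file layout (`objects/info/commit-graph` under the git directory) and only verifies the 4-byte `CGPH` signature from the format document; it does not handle split commit-graph chains or validate the rest of the file. `hasCommitGraph` and `hasCommitGraphSignature` are hypothetical helpers, not existing gitbase or go-git functions.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// commitGraphSignature is the 4-byte magic at the start of a commit-graph file.
var commitGraphSignature = []byte("CGPH")

// hasCommitGraphSignature reports whether header starts with the signature.
func hasCommitGraphSignature(header []byte) bool {
	return bytes.HasPrefix(header, commitGraphSignature)
}

// hasCommitGraph reports whether gitDir contains a commit-graph file that
// starts with the expected signature. A missing or too-short file is
// reported as "not present" rather than as an error.
func hasCommitGraph(gitDir string) (bool, error) {
	f, err := os.Open(filepath.Join(gitDir, "objects", "info", "commit-graph"))
	if os.IsNotExist(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	defer f.Close()

	header := make([]byte, len(commitGraphSignature))
	if _, err := io.ReadFull(f, header); err != nil {
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return false, nil
		}
		return false, err
	}
	return hasCommitGraphSignature(header), nil
}

func main() {
	ok, err := hasCommitGraph(".git")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("commit graph present:", ok)
}
```

For siva-backed repositories the same check would have to go through whatever filesystem abstraction exposes the archive contents, which this sketch does not attempt.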

We would need to make sure every repository that's added to the repository pool has a commit graph file during initialisation of gitbase, and generate one for repositories where it's not present. However, this may be tricky with siva files, since it implies we have to actually write data (the commit graph file) into the siva file. This is a very important consideration we need to take into account: to add this feature, we need to ensure that a commit graph file can be generated for every kind of git repository we can add.

Another very important consideration is that repositories may change and the commit graph file may become outdated. I'm not sure git updates that file (I haven't been able to find it in the docs), but go-git and potentially other git clients may not update the file once the repository changes. Generating a new commit graph file each time gitbase is started may be fine?

Assuming a commit graph file is accessible and we are able to provide this data, there is nothing very difficult here. Rows are generated per partition, so we load the commit graph when the first row of the partition is requested and dispose of it once we're finished.
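The load-on-first-row, dispose-on-close pattern could be sketched as follows. All names here (`commitGraph`, `row`, `partitionIter`, the `load` callback) are hypothetical stand-ins chosen for the example and do not mirror gitbase's actual iterator interfaces; the commit graph is reduced to a plain hash-to-generation map.

```go
package main

import (
	"fmt"
	"io"
)

// commitGraph is a hypothetical in-memory index: commit hash -> generation.
type commitGraph map[string]uint64

// row is a hypothetical ref_commits row reduced to the relevant columns.
type row struct {
	CommitHash string
	Generation uint64
}

// partitionIter yields the rows of one partition (one repository).
type partitionIter struct {
	hashes []string                    // commits to emit, in order
	load   func() (commitGraph, error) // invoked on the first row only
	graph  commitGraph
	loaded bool
	pos    int
}

// Next returns the next row, loading the commit graph on first use.
func (it *partitionIter) Next() (row, error) {
	if it.pos >= len(it.hashes) {
		return row{}, io.EOF
	}
	if !it.loaded {
		g, err := it.load()
		if err != nil {
			return row{}, err
		}
		it.graph, it.loaded = g, true
	}
	h := it.hashes[it.pos]
	it.pos++
	return row{CommitHash: h, Generation: it.graph[h]}, nil
}

// Close releases the commit graph once the partition is finished.
func (it *partitionIter) Close() error {
	it.graph, it.loaded = nil, false
	return nil
}

func main() {
	it := &partitionIter{
		hashes: []string{"a", "b"},
		load: func() (commitGraph, error) {
			// A real loader would parse the commit-graph file here.
			return commitGraph{"a": 1, "b": 2}, nil
		},
	}
	defer it.Close()
	for {
		r, err := it.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(r.CommitHash, r.Generation)
	}
}
```

Checking for exhaustion before loading means an empty partition never touches the commit graph file at all.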

Detailed required changes

go-git

gitbase

Caveats

erizocosmico commented 5 years ago

@ajnavarro Shall I close this issue?