seeraven / gitcache

Local cache for git repositories to speed up working with large repositories and multiple clones.
BSD 3-Clause "New" or "Revised" License

gitcache

The basic idea of gitcache is to use a local bare mirror that is updated when needed and used as the source repository for multiple local repositories.

Features

Description

gitcache is designed to be used as a wrapper around git. The following paragraphs show how gitcache translates the git commands for the individual operations.

When the user issues a

git clone https://github.com/seeraven/gitcache.git

for the first time, the repository https://github.com/seeraven/gitcache.git is cloned into a bare mirror $GITCACHE_DIR/mirrors/github.com/seeraven/gitcache/git and then the git command is rewritten to

git clone $GITCACHE_DIR/mirrors/github.com/seeraven/gitcache/git gitcache

to create the clone. In addition, the push URL of the clone is adjusted to the upstream URL.
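
The mapping from upstream URL to mirror directory in this example can be sketched in Python. The helper names `mirror_path` and `rewritten_clone_command` are hypothetical, and the rule (host, then repository path without the `.git` suffix, then a final `git` component) is inferred only from the single example shown here; gitcache's actual implementation may handle other URL forms differently.

```python
import posixpath
from urllib.parse import urlparse


def mirror_path(url, gitcache_dir):
    """Return the assumed mirror directory for an upstream URL."""
    parsed = urlparse(url)
    repo_path = parsed.path.strip("/")
    # Drop a trailing ".git", as in the example from the text.
    if repo_path.endswith(".git"):
        repo_path = repo_path[: -len(".git")]
    return posixpath.join(gitcache_dir, "mirrors", parsed.netloc, repo_path, "git")


def rewritten_clone_command(url, gitcache_dir):
    """Return the clone command with the mirror as the source repository."""
    mirror = mirror_path(url, gitcache_dir)
    # The target directory is the repository name, as plain "git clone" would pick.
    target = posixpath.basename(posixpath.dirname(mirror))
    return ["git", "clone", mirror, target]
```

Calling `rewritten_clone_command("https://github.com/seeraven/gitcache.git", "/home/user/.gitcache")` yields the rewritten command from the text, with `$GITCACHE_DIR` expanded.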

Whenever the user issues another git clone command of that repository, the mirror is updated (if the update strategy permits it) and the local clone is created as before.

Whenever the user performs a git pull or git fetch on that local clone, gitcache checks whether the repository is handled by gitcache (that is, the pull URL points to the mirror and the push URL points to the upstream URL). If it is, gitcache updates the mirror first (according to the update strategy) and executes the original command afterwards.
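
The check described above can be illustrated with a small Python sketch. The function `is_managed_clone` is hypothetical and only encodes the stated rule; how gitcache performs the check internally is not shown in this document. In a real clone, the two URLs could be read with `git remote get-url origin` and `git remote get-url --push origin`.

```python
import posixpath


def _is_inside(path, root):
    """Return True if path is root itself or lies below it."""
    path = posixpath.normpath(path)
    root = posixpath.normpath(root)
    return path == root or path.startswith(root + "/")


def is_managed_clone(fetch_url, push_url, mirrors_dir):
    """Assumed rule: fetch from the mirror, push to somewhere outside it."""
    return _is_inside(fetch_url, mirrors_dir) and not _is_inside(push_url, mirrors_dir)
```

A clone whose fetch and push URLs both point to the upstream server is not considered managed under this rule, so its commands would be passed on unchanged.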

gitcache also supports git-lfs: updating a mirror also updates its git-lfs part. You can configure gitcache to use either a global git-lfs storage directory or per-mirror storage directories (the default).

All update operations on a mirror use a lock to ensure that only one process modifies the mirror at a time. This is crucial, as simultaneous clones would otherwise easily lead to inconsistent behaviour and race conditions.
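
To illustrate the idea of serializing mirror updates, here is a minimal Python sketch of a lock based on exclusive creation of a lock file. It is an assumption for illustration, not gitcache's actual locking mechanism; the name `mirror_lock` and its parameters are hypothetical.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def mirror_lock(lock_path, check_interval=2.0, timeout=3600.0):
    """Hold an exclusive lock file while the body of the with-block runs."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not lock " + lock_path)
            time.sleep(check_interval)
    try:
        yield
    finally:
        # Release the lock so that waiting processes can continue.
        os.close(fd)
        os.remove(lock_path)
```

A mirror update would then run inside `with mirror_lock(path):`. Note that this simple scheme leaves a stale lock file behind if the process is killed, which a production implementation would have to handle.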

Mirror Update Strategy

The mirror update strategy is controlled by the so-called update interval. It gives the minimum time in seconds between two updates of a mirror and allows you to save network bandwidth by avoiding multiple updates at almost the same time.

In addition, updates triggered by the git pull and git fetch commands can be disabled completely by setting the update interval to a negative value. Mirrors are then updated only when explicitly requested by a git update-mirrors command. This can be useful on CI servers to control network usage even further.
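
The decision for git pull and git fetch described above can be written as a short Python sketch. The function name `mirror_needs_update` is hypothetical, and the exact comparison gitcache uses at the interval boundary is an assumption.

```python
def mirror_needs_update(seconds_since_last_update, update_interval):
    """Decide whether a pull or fetch should update the mirror first."""
    # A negative interval disables updates from pull and fetch entirely.
    if update_interval < 0:
        return False
    # Otherwise update once at least one interval has elapsed.
    return seconds_since_last_update >= update_interval
```

With an update interval of 0, every pull or fetch updates the mirror; with an interval of 600, a second fetch within ten minutes reuses the mirror as it is.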

Installation on Linux

gitcache is distributed as a single executable packaged using PyInstaller. To install it, download the latest executable and copy it to a location of your choice, for example ~/bin:

wget https://github.com/seeraven/gitcache/releases/download/v1.0.18/gitcache_v1.0.18_Ubuntu22.04_amd64
mkdir -p ~/bin
mv gitcache_v1.0.18_Ubuntu22.04_amd64 ~/bin/gitcache
chmod +x ~/bin/gitcache

gitcache can be used as a stand-alone command, but it is much easier to use it as a wrapper around git. To do so, create a symlink named git that points to gitcache and adjust the PATH variable so that the wrapper is found before the real git command:

ln -s gitcache ~/bin/git
export PATH=$HOME/bin:$PATH

The export statement should be added to your ~/.bashrc file to set it permanently.

Installation on Windows

Download the latest executable for Windows from the release page https://github.com/seeraven/gitcache/releases. Rename the executable to gitcache.exe and put it into a directory in your PATH, e.g., C:\Windows. Then create a symlink named git.exe that points to gitcache.exe by opening a console and executing:

cd C:\Windows
mklink git.exe gitcache.exe

Please note that the directory containing the symlink must be listed before the directory of the real git command in your PATH variable!

Installation on macOS

A single PyInstaller executable has a huge startup delay on macOS, so gitcache is distributed as a tarball (*.tgz file) instead. Download the archive and extract it at your desired target location (the archive contains a subfolder):

cd /my/target/destination
tar xfz gitcache_v1.0.18_Darwin_arm64.tgz
ls gitcache_v1.0.18_Darwin_arm64

To use the gitcache command, add the final installation directory to your PATH variable. To use it as a wrapper around the git command, create the symlink and adjust the PATH variable so that the wrapper is found before the real git command, as described in the Installation on Linux section.

Configuration

gitcache stores all of its files in the directory ~/.gitcache. This base directory can be changed by setting the GITCACHE_DIR environment variable. When the base directory is first created, the default configuration file $GITCACHE_DIR/config is created and populated with the default values.

The current configuration can be shown by calling

gitcache

For every item, the output also shows the environment variable that can be used to override the value from the configuration file.

The configuration options are:

| Category | Config Item | Default Value | Environment Variable |
|---|---|---|---|
| System | realgit | /usr/bin/git | GITCACHE_REAL_GIT |
| MirrorHandling | updateinterval | 0 s | GITCACHE_UPDATE_INTERVAL |
| MirrorHandling | cleanupafter | 14 days | GITCACHE_CLEANUP_AFTER |
| Command | checkinterval | 2 s | GITCACHE_COMMAND_CHECK_INTERVAL |
| Command | locktimeout | 1 h | GITCACHE_COMMAND_LOCK_TIMEOUT |
| Command | warniflockedfor | 10 s | GITCACHE_COMMAND_WARN_IF_LOCKED_FOR |
| GC | commandtimeout | 1 h | GITCACHE_GC_COMMAND_TIMEOUT |
| GC | outputtimeout | 5 m | GITCACHE_GC_OUTPUT_TIMEOUT |
| GC | retries | 3 | GITCACHE_GC_RETRIES |
| LFS | commandtimeout | 1 h | GITCACHE_LFS_COMMAND_TIMEOUT |
| LFS | outputtimeout | 5 m | GITCACHE_LFS_OUTPUT_TIMEOUT |
| LFS | permirrorstorage | True | GITCACHE_LFS_PER_MIRROR_STORAGE |
| LFS | retries | 3 | GITCACHE_LFS_RETRIES |
| Clone | commandtimeout | 1 h | GITCACHE_CLONE_COMMAND_TIMEOUT |
| Clone | outputtimeout | 5 m | GITCACHE_CLONE_OUTPUT_TIMEOUT |
| Clone | retries | 3 | GITCACHE_CLONE_RETRIES |
| Update | commandtimeout | 1 h | GITCACHE_UPDATE_COMMAND_TIMEOUT |
| Update | outputtimeout | 5 m | GITCACHE_UPDATE_OUTPUT_TIMEOUT |
| Update | retries | 3 | GITCACHE_UPDATE_RETRIES |
| UrlPatterns | includeregex | .* | GITCACHE_URLPATTERNS_INCLUDE_REGEX |
| UrlPatterns | excluderegex | (empty) | GITCACHE_URLPATTERNS_EXCLUDE_REGEX |
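
The precedence of environment variables over the configuration file can be sketched in Python. This sketch assumes that the configuration file is an INI-style file with one section per category, which this document does not confirm; the function name `effective_value` is hypothetical, and the value is returned as an unparsed string.

```python
import configparser


def effective_value(config_text, section, item, env_name, environ):
    """Return the setting, preferring the environment over the config file."""
    # An environment variable, if set, overrides the configuration file.
    if env_name in environ:
        return environ[env_name]
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get(section, item)


# Hypothetical configuration file content, assuming INI syntax.
EXAMPLE_CONFIG = """
[MirrorHandling]
updateinterval = 0 s
cleanupafter = 14 days
"""
```

In real use, `environ` would be `os.environ`. Passing it as a parameter keeps the sketch easy to try out with different settings.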

Configuration items that expect a time support the following values:

The following list gives a description of the configuration options:

gitcache Command Usage

The gitcache command provides the following options:

Without any options the gitcache command shows the current configuration.

When called as gitcache git ... it wraps the given git command as described in the next section.
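
One common way for a single executable to support both the `gitcache git ...` form and invocation through a symlink named git is to inspect the name it was started with. The following Python sketch shows that general technique; it is an assumption for illustration and not taken from gitcache's source, and the function name `wrapped_git_args` is hypothetical.

```python
import os


def wrapped_git_args(argv):
    """Return the git arguments to wrap, or None if no git command was given."""
    # Strip the directory and an extension such as ".exe" from the program name.
    program = os.path.splitext(os.path.basename(argv[0]))[0]
    if program == "git":
        # Started through the symlink: all arguments belong to git.
        return argv[1:]
    if len(argv) > 1 and argv[1] == "git":
        # Started as "gitcache git ...": skip the "git" word.
        return argv[2:]
    return None
```

In a real program, `argv` would be `sys.argv`. A result of `None` corresponds to calling gitcache with its own options or without any arguments.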

Handled git Commands

The following git commands are handled specially. All other commands are forwarded to the real git command.

Debugging

For debugging, set the environment variable GITCACHE_LOGLEVEL to Debug:

GITCACHE_LOGLEVEL=Debug gitcache

Security Considerations

The main idea behind gitcache is to perform the caching of the git repositories only for the current user. This means that you should not share the mirrored git repositories with other users, as you do not know if another user would have the permission to access the remote repository.

Notes on Releases

Releases are now automatically built if a new tag v<major>.<minor>.<revision> is pushed to the repository. This changes the release process a little bit: