seeraven / gitcache

Local cache for git repositories to speed up working with large repositories and multiple clones.
BSD 3-Clause "New" or "Revised" License
40 stars 8 forks source link

Slow start up #37

Closed amontalban closed 6 months ago

amontalban commented 6 months ago

Hi,

I have been using/trying gitcache for a couple of days already and I have noticed that in some cases is slower that plain git.

I'm trying to use gitcache with Terragrunt/Atlantis to avoid cloning the same repo hundreds of times, but since Terragrunt uses git rev-parse --show-toplevel a lot it is being slower than plain git.

The start up time before showing any debug is considerable (In the order of 3-4 seconds) for an already cloned repo:

bash-5.2$ export GITCACHE_UPDATE_INTERVAL=10d
bash-5.2$ time GITCACHE_LOGLEVEL=Debug /usr/local/bin/git clone git@github.com:seeraven/gitcache.git
2024-05-10 10:05:43 Python executable: /usr/local/bin/git
2024-05-10 10:05:43 Called as ['/usr/local/bin/git', 'clone', 'git@github.com:seeraven/gitcache.git']
2024-05-10 10:05:43 handle_git_command(['/usr/local/bin/git'], ['clone', 'git@github.com:seeraven/gitcache.git']) started
2024-05-10 10:05:43 Found global options [], command 'clone', command options '[]' and arguments ['git@github.com:seeraven/gitcache.git'].
2024-05-10 10:05:43 Loading configuration file /Users/andresmontalban/.gitcache/config.
2024-05-10 10:05:43 Loading configuration file /Users/andresmontalban/.gitcache/config.
2024-05-10 10:05:43 Can't load configuration file /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/gitcache.config as it does not exist!
2024-05-10 10:05:43 Update time of mirror /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache not reached yet.
2024-05-10 10:05:43 Starting Clone from mirror /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache.
2024-05-10 10:05:43 Retry to execute command '/usr/bin/git -c lfs.url=git@github.com:seeraven/gitcache.git/info/lfs -c lfs.storage=/Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/lfs clone /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git gitcache' up to 3 times.
2024-05-10 10:05:43 Execute command '/usr/bin/git -c lfs.url=git@github.com:seeraven/gitcache.git/info/lfs -c lfs.storage=/Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/lfs clone /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git gitcache' (shell=False, cwd=None) with command timeout of 3600 seconds and output timeout of 300 seconds. stderr_capture=True
Cloning into 'gitcache'...
done.
2024-05-10 10:05:43 Command '/usr/bin/git -c lfs.url=git@github.com:seeraven/gitcache.git/info/lfs -c lfs.storage=/Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/lfs clone /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git gitcache' finished with return code 0.
2024-05-10 10:05:43 Clone from mirror /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache was successfully completed within 0.1 seconds.
2024-05-10 10:05:43 Setting push URL to git@github.com:seeraven/gitcache.git and configure LFS.

real    0m5.843s
user    0m0.295s
sys 0m0.168s

If I do not setup GITCACHE_LOGLEVEL=Debug it takes longer because it has to update the clone.

On the other hand if I use plain git I get faster clone:

bash-5.2$ time git clone git@github.com:seeraven/gitcache.git
Cloning into 'gitcache'...
remote: Enumerating objects: 1122, done.
remote: Counting objects: 100% (150/150), done.
remote: Compressing objects: 100% (59/59), done.
remote: Total 1122 (delta 101), reused 114 (delta 84), pack-reused 972
Receiving objects: 100% (1122/1122), 291.03 KiB | 483.00 KiB/s, done.
Resolving deltas: 100% (729/729), done.

real    0m3.956s
user    0m0.160s
sys 0m0.088s

Also simple commands like git rev-parse --show-toplevel are slow to execute:

bash-5.2$ time /usr/local/bin/git rev-parse --show-toplevel
/Users/andresmontalban/code/gitcache

real    0m5.742s
user    0m0.278s
sys 0m0.131s
bash-5.2$ time git rev-parse --show-toplevel
/Users/andresmontalban/code/gitcache

real    0m0.045s
user    0m0.007s
sys 0m0.013s

For these tests I'm running gitcache on a Macbook Pro with M1 Pro CPU, but I'm getting similar results in an AWS instance with 16 cores.

Any idea on how to make it faster?

Thank you very in advance for your help!

seeraven commented 6 months ago

Hello @amontalban ,

that are really huge startup times of gitcache! I repeated your steps on my Linux machine running Ubuntu, and I see only 0.220s (real).

Since the rev-parse commands are actually only forwarded by gitcache, it is pretty clear that the delay is introduced by the startup time of the binary until it starts interpreting the python code.

The binaries are all created by using pyInstaller which extracts its files into a temporary directory on startup before starting interpreting it. This is done on every call and seems to take more than 5 seconds on your machine. I have found a question on stackoverflow which suggests using a different option, but this will probably also clutter the workspace (I have to check that in more detail).

Apparently also pyInstaller does not recommend the current distribution method (--onefile) on MacOS due to performance issues. Apparently the OS might check the temporary files repeatedly and I guess that is what we see here. However, I'm still surprised how large this delay is.

I will check how the proposed --onedir distribution works and try to generate the archives for that distribution method.

On your AWS machine, do you run also MacOS on it or is it a different OS?

seeraven commented 6 months ago

Hello @amontalban ,

I've created a new release https://github.com/seeraven/gitcache/releases/tag/v1.0.16 that should overcome the startup delay problem. Now the MacOS release is created using the --onedir option which means that the script including all dependencies is a directory rather than a single file. That means, that the installation has slightly changed, as the directory contents (consisting of the gitcache command and the _internal subdirectory) should not be separated. Best way is probably to put it somewhere and adjust the PATH variable accordingly, so that gitcache (and perhaps the git -> gitcache symlink) is found.

It would be great if you can test it and give feedback on how it works, as I have no Mac to actually test it. ;-)

amontalban commented 6 months ago

Hi @seeraven,

Thanks for the quick fix, I have tested in my macOS and it is now performing way better. Regarding my tests in a Linux box I think the problem was something not related to gitcache, so sorry for the noise.

Cold clone

(⎈|N/A:default) ~/test  time GITCACHE_LOGLEVEL=Debug ./gitcache_v1.0.16_Darwin_arm64/gitcache git clone git@github.com:seeraven/gitcache.git test_gitcache_cold
2024-05-14 10:14:48 Python executable: /Users/andresmontalban/test/gitcache_v1.0.16_Darwin_arm64/gitcache
2024-05-14 10:14:48 Called as ['./gitcache_v1.0.16_Darwin_arm64/gitcache', 'git', 'clone', 'git@github.com:seeraven/gitcache.git', 'test_gitcache_cold']
2024-05-14 10:14:48 handle_git_command(['./gitcache_v1.0.16_Darwin_arm64/gitcache', 'git'], ['clone', 'git@github.com:seeraven/gitcache.git', 'test_gitcache_cold']) started
2024-05-14 10:14:48 Found global options [], command 'clone', command options '[]' and arguments ['git@github.com:seeraven/gitcache.git', 'test_gitcache_cold'].
2024-05-14 10:14:48 Can't load configuration file /Users/andresmontalban/.gitcache/config as it does not exist!
2024-05-14 10:14:48 Save configuration file /Users/andresmontalban/.gitcache/config.
2024-05-14 10:14:48 Loading configuration file /Users/andresmontalban/.gitcache/config.
2024-05-14 10:14:48 Can't load configuration file /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/gitcache.config as it does not exist!
2024-05-14 10:14:48 Starting Initial clone of git@github.com:seeraven/gitcache.git into /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache.
2024-05-14 10:14:48 Retry to execute command '/usr/bin/git clone --progress --mirror git@github.com:seeraven/gitcache.git /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git' up to 3 times.
2024-05-14 10:14:48 Execute command '/usr/bin/git clone --progress --mirror git@github.com:seeraven/gitcache.git /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git' (shell=False, cwd=None) with command timeout of 3600 seconds and output timeout of 300 seconds. stderr_capture=True
Cloning into bare repository '/Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git'...
remote: Enumerating objects: 1161, done.
remote: Counting objects: 100% (182/182), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 1161 (delta 124), reused 137 (delta 97), pack-reused 979
Receiving objects: 100% (1161/1161), 298.29 KiB | 263.00 KiB/s, done.
Resolving deltas: 100% (758/758), done.
2024-05-14 10:14:53 Command '/usr/bin/git clone --progress --mirror git@github.com:seeraven/gitcache.git /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git' finished with return code 0.
2024-05-14 10:14:53 Initial clone of git@github.com:seeraven/gitcache.git into /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache was successfully completed within 5.4 seconds.
2024-05-14 10:14:53 LFS fetch skipped as git-lfs is not available on this system!
2024-05-14 10:14:53 Starting Clone from mirror /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache.
2024-05-14 10:14:53 Retry to execute command '/usr/bin/git -c lfs.url=git@github.com:seeraven/gitcache.git/info/lfs -c lfs.storage=/Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/lfs clone /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git test_gitcache_cold' up to 3 times.
2024-05-14 10:14:53 Execute command '/usr/bin/git -c lfs.url=git@github.com:seeraven/gitcache.git/info/lfs -c lfs.storage=/Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/lfs clone /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git test_gitcache_cold' (shell=False, cwd=None) with command timeout of 3600 seconds and output timeout of 300 seconds. stderr_capture=True
Cloning into 'test_gitcache_cold'...
done.
2024-05-14 10:14:54 Command '/usr/bin/git -c lfs.url=git@github.com:seeraven/gitcache.git/info/lfs -c lfs.storage=/Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/lfs clone /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git test_gitcache_cold' finished with return code 0.
2024-05-14 10:14:54 Clone from mirror /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache was successfully completed within 0.1 seconds.
2024-05-14 10:14:54 Setting push URL to git@github.com:seeraven/gitcache.git and configure LFS.
GITCACHE_LOGLEVEL=Debug ./gitcache_v1.0.16_Darwin_arm64/gitcache git clone    0.26s user 0.14s system 7% cpu 5.692 total

Warm clone

(⎈|N/A:default) ~/test  time GITCACHE_LOGLEVEL=Debug ./gitcache_v1.0.16_Darwin_arm64/gitcache git clone git@github.com:seeraven/gitcache.git test_gitcache_warm
2024-05-14 10:15:00 Python executable: /Users/andresmontalban/test/gitcache_v1.0.16_Darwin_arm64/gitcache
2024-05-14 10:15:00 Called as ['./gitcache_v1.0.16_Darwin_arm64/gitcache', 'git', 'clone', 'git@github.com:seeraven/gitcache.git', 'test_gitcache_warm']
2024-05-14 10:15:00 handle_git_command(['./gitcache_v1.0.16_Darwin_arm64/gitcache', 'git'], ['clone', 'git@github.com:seeraven/gitcache.git', 'test_gitcache_warm']) started
2024-05-14 10:15:00 Found global options [], command 'clone', command options '[]' and arguments ['git@github.com:seeraven/gitcache.git', 'test_gitcache_warm'].
2024-05-14 10:15:00 Loading configuration file /Users/andresmontalban/.gitcache/config.
2024-05-14 10:15:00 Loading configuration file /Users/andresmontalban/.gitcache/config.
2024-05-14 10:15:00 Can't load configuration file /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/gitcache.config as it does not exist!
2024-05-14 10:15:00 Update time of mirror /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache not reached yet.
2024-05-14 10:15:00 Starting Clone from mirror /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache.
2024-05-14 10:15:00 Retry to execute command '/usr/bin/git -c lfs.url=git@github.com:seeraven/gitcache.git/info/lfs -c lfs.storage=/Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/lfs clone /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git test_gitcache_warm' up to 3 times.
2024-05-14 10:15:00 Execute command '/usr/bin/git -c lfs.url=git@github.com:seeraven/gitcache.git/info/lfs -c lfs.storage=/Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/lfs clone /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git test_gitcache_warm' (shell=False, cwd=None) with command timeout of 3600 seconds and output timeout of 300 seconds. stderr_capture=True
Cloning into 'test_gitcache_warm'...
done.
2024-05-14 10:15:00 Command '/usr/bin/git -c lfs.url=git@github.com:seeraven/gitcache.git/info/lfs -c lfs.storage=/Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/lfs clone /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache/git test_gitcache_warm' finished with return code 0.
2024-05-14 10:15:00 Clone from mirror /Users/andresmontalban/.gitcache/mirrors/github.com/seeraven/gitcache was successfully completed within 0.1 seconds.
2024-05-14 10:15:00 Setting push URL to git@github.com:seeraven/gitcache.git and configure LFS.
GITCACHE_LOGLEVEL=Debug ./gitcache_v1.0.16_Darwin_arm64/gitcache git clone    0.15s user 0.07s system 82% cpu 0.268 total

Native clone

(⎈|N/A:default) ~/test  time git clone git@github.com:seeraven/gitcache.git test_gitcache_native
Cloning into 'test_gitcache_native'...
remote: Enumerating objects: 1154, done.
remote: Counting objects: 100% (182/182), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 1154 (delta 124), reused 137 (delta 97), pack-reused 972
Receiving objects: 100% (1154/1154), 297.01 KiB | 257.00 KiB/s, done.
Resolving deltas: 100% (752/752), done.
git clone git@github.com:seeraven/gitcache.git test_gitcache_native  0.14s user 0.07s system 3% cpu 5.341 total

Same as for git rev-parse --show-toplevel:

(⎈|N/A:default) ~/test/test_gitcache   master  time GITCACHE_LOGLEVEL=Debug ../gitcache_v1.0.16_Darwin_arm64/gitcache git rev-parse --show-toplevel
2024-05-14 10:17:34 Python executable: /Users/andresmontalban/test/gitcache_v1.0.16_Darwin_arm64/gitcache
2024-05-14 10:17:34 Called as ['../gitcache_v1.0.16_Darwin_arm64/gitcache', 'git', 'rev-parse', '--show-toplevel']
2024-05-14 10:17:34 handle_git_command(['../gitcache_v1.0.16_Darwin_arm64/gitcache', 'git'], ['rev-parse', '--show-toplevel']) started
2024-05-14 10:17:34 Found global options [], command 'rev-parse', command options '[]' and arguments [].
2024-05-14 10:17:34 Command 'rev-parse' is not handled by gitcache. Calling the real git command.
2024-05-14 10:17:34 Loading configuration file /Users/andresmontalban/.gitcache/config.
/Users/andresmontalban/test/test_gitcache
GITCACHE_LOGLEVEL=Debug ../gitcache_v1.0.16_Darwin_arm64/gitcache git    0.13s user 0.03s system 90% cpu 0.175 total

Native

(⎈|N/A:default) ~/test/test_gitcache   master  time git rev-parse --show-toplevel
/Users/andresmontalban/test/test_gitcache
git rev-parse --show-toplevel  0.00s user 0.00s system 59% cpu 0.012 total
seeraven commented 6 months ago

Cool, so I will close this issue now.