Open osklyar opened 7 years ago
@osklyar Thanks for the report. Given that we're approaching a stable release of v4, it's time to focus on performance and fix long-standing issues on that front. So we'll be working on this soon.
I have hit some performance issues during clone as well. My repository's .git dir is ~24MB after git gc --aggressive; git repack -a -d,
but cloning takes about 1m15s on my Core i7 based MacBook. Using the standard git tools, the same process is done in less than 1s. Watching the clone progress via:
git.CloneOptions{
URL: gitDir(repo.baseDir),
ReferenceName: ref.Target(),
SingleBranch: true,
Progress: os.Stderr,
}
shows it getting through this output in ~10s:
Counting objects: 196522, done.
Compressing objects: 100% (41983/41983), done.
Total 196522 (delta 153594), reused 196483 (delta 153563)
but then the process drags on with ~130% CPU and no output. I grabbed a 30s pprof profile and generated a graph from it.
A large amount of time seems to be spent in seek syscalls ultimately coming from packfile.
When cloning large repositories (large with respect to the occupied space, and less so with respect to the number of commits), go-git appears to use a different strategy than git, resulting in a massive memory footprint and very long clone times. For example, when cloning a repo that unpacks into 1.5 GB and contains ca. 110k commits, go-git uses up to 5 GB of RAM and runs for over 4m, while git uses 290 MB and runs in about 1m (tested with geat, which is based on go-git).
Memory requirements scale more or less linearly with the number of commits and the repository size: for the smaller repo below, which still has quite a lot of commits, go-git uses about 8x more memory than git. On the performance side, growth in repository size leads to much faster degradation: for the 1.5 GB repo the difference is about 4x, whereas for the roughly 10 times smaller repo below the times are about the same for git and go-git, and git takes approximately as long as it does for the 1.5 GB repo.
Cloning github.com:moby/moby, with 32k commits and 170 MB overall unpacked size, takes about the same 1m20s with both git and go-git. Memory-wise, go-git loses: it peaks at 320 MB (2x the repo size), while git peaks at 45 MB (0.25x the repo size).