libgit2 / libgit2sharp

Git + .NET = ❤
http://libgit2.github.com
MIT License

Commit and stage execution time not constant #683

Open joelverhagen opened 10 years ago

joelverhagen commented 10 years ago

I am currently using LibGit2Sharp (v0.15.0) to keep track of differences in many HTML files (~10,000). Each change to a file, or each new file, results in a single commit. I have noticed that writes and commits are getting slower and slower. Here is a visualization of my tests: [graph]

I tried to normalize the repository.Stage(...) and repository.Commit(...) times so that the size of the HTML file was not a factor. This was done by simply dividing the write and commit times by the file size.
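For readers who want to reproduce the normalization, a minimal sketch follows. The function name and the (elapsed seconds, file size in bytes) sample layout are illustrative assumptions, not taken from the benchmark code.

```python
def normalize_timings(samples):
    """Return seconds-per-byte for each (elapsed_seconds, size_bytes) sample.

    Hypothetical helper: the sample layout is an assumption for illustration.
    Samples with a zero file size are skipped to avoid dividing by zero.
    """
    return [elapsed / size for elapsed, size in samples if size > 0]


# Two files of different sizes that cost the same per byte.
per_byte = normalize_timings([(0.5, 1000), (1.0, 2000)])
```

If the normalized values still grow as the repository grows, the slowdown cannot be explained by file size alone.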

I was not sure whether this slowdown was expected, but after a bit of reading around, it seems to me that git commit should be constant time with respect to the number of commits or files already in the repository. I am not sure if the same assumption can be made about staging.

Is this a known issue with LibGit2Sharp? Or libgit2?

joelverhagen commented 10 years ago

I've compared the times against the standard git.exe command-line tool: [results]. It seems to me that LibGit2Sharp's add and commit are non-constant time with respect to the number of files, whereas git.exe's are constant (or at least have a smaller coefficient).

dahlbyk commented 10 years ago

What OS?

joelverhagen commented 10 years ago

I'm running on Windows 8.1.

nulltoken commented 10 years ago

@joelverhagen Thanks a lot for those very detailed graphs.

Is there any way you could share with us the code that is being measured?

joelverhagen commented 10 years ago

Gladly. See LibGit2SharpPerformance.cs.

nulltoken commented 10 years ago

We may see some improvements when libgit2/libgit2#2308 is merged.

Therzok commented 10 years ago

libgit2/libgit2#2308 is in vNext. Is this still an issue?

JoeLiBuDa commented 10 years ago

Not sure if this is still relevant, but I used to work with LibGit2Sharp 0.16. The time per commit rose linearly with the number of commits I had. With version 0.18, the time per commit stays constant around a certain value.

nulltoken commented 10 years ago

#798 should bring some performance improvements by @carlosmn in the generation of trees.

nulltoken commented 10 years ago

@joelverhagen Did the situation improve with v0.19?

joelverhagen commented 10 years ago

Nope, I've run tests up to 10,000 files in a repository, and LibGit2Sharp is indeed still non-constant in commit and stage time. [perf]

nulltoken commented 10 years ago

@joelverhagen Thanks for this answer.

@carlosmn Any thoughts?

carlosmn commented 10 years ago

We made some fixes to tree writing which significantly increased writing performance, making it roughly O(n) (from something closer to O(n^2)), but the index is still implemented as a vector, which means that staging files is going to cost O(n) per addition.
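To make the O(n)-per-addition cost concrete, here is a toy model in Python. It is not libgit2's index code; it only assumes that the index behaves like a sorted, contiguous array of paths, so inserting an entry means shifting every entry after it. The function names and file names are hypothetical.

```python
import bisect


def insert_sorted(entries, path):
    """Insert path into a sorted list and return how many entries had to shift."""
    pos = bisect.bisect_left(entries, path)
    if pos < len(entries) and entries[pos] == path:
        return 0  # path already present, nothing moves
    entries.insert(pos, path)
    # Everything that now sits after the new entry was moved one slot.
    return len(entries) - 1 - pos


def total_shifts(n):
    """Add n paths in descending order, the worst case for a sorted vector."""
    entries = []
    shifts = 0
    for i in range(n - 1, -1, -1):
        shifts += insert_sorted(entries, f"file{i:06d}.html")
    return shifts
```

In this worst case the k-th addition shifts k existing entries, so n additions shift n(n-1)/2 entries in total: each single addition is O(n), and a full run is O(n^2).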

Adding files from the workdir through LibGit2Sharp's Stage() (which is what the benchmark does, IIRC) also means writing out the index file each time, which introduces additional variability (and is going to make everything slower).

If you know you're going to stage a bunch of files, using TreeDefinition would increase performance and reduce the variability.
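A rough way to see why writing the index on every Stage() call hurts is to count how many index entries get serialized. The sketch below is a simplified model, assuming each write serializes every entry currently in the index; the function and its parameters are hypothetical and do not correspond to any LibGit2Sharp API.

```python
def entries_written(existing, added, batch):
    """Count index entries serialized while adding `added` files.

    existing: entries already in the index before the additions
    added:    number of files being staged
    batch:    True writes the index once at the end,
              False writes it after every single addition
    """
    if batch:
        return existing + added
    # One full write after each addition; the index grows by one each time.
    return sum(existing + k for k in range(1, added + 1))
```

Under this model, staging a single file into a repository that already tracks 10,000 files still serializes 10,001 entries, so the per-file cost grows with the size of the index even when only one file is added at a time.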

joelverhagen commented 10 years ago

The test code I use adds a constant number of files per iteration, i.e. one file is staged, then a commit is made. For each of these iterations, a new Repository instance is initialized and disposed. Despite this, the cost of staging and committing one file increases with the number of files in the repository.