libgit2 / libgit2sharp

Git + .NET = ❤
http://libgit2.github.com
MIT License

GitRepo.Commits.QueryBy(fpath) fails with "Given key not present in dictionary" for some folders only on master branch! #1520

Open ltoshev opened 6 years ago

ltoshev commented 6 years ago

I use this code:

GitRepo = new Repository(Repository.Discover(SolutionDir));

// Path of the folder relative to the repository root, using forward slashes.
string fpath = folder.Replace(SolutionDir, string.Empty).Replace("\\", "/").TrimStart('/');

// Take only the newest commit that touched the folder.
foreach (var c in GitRepo.Commits.QueryBy(fpath).Take(1))
{
    revisionsList.Add(folder.Split(Path.DirectorySeparatorChar).Last(), c.Commit.Committer.When.ToUnixTimeSeconds());
}

GitRepo.Commits.QueryBy(fpath) fails for some folders on the master branch, and only on master.

It works fine when you run it on a branch created from master.

I rolled back the library version and found that this code works in v0.22.1 from NuGet.

It fails in 0.23.0, 0.23.1, and 0.24.0; I tested all three.

It would be great if you could fix it in 0.24.1.

Thanks,

ethomson commented 6 years ago

Can you share the repository where this error occurs?

ltoshev commented 6 years ago

Hi,

It is a private repository, so I cannot share it.

Right now we have another problem: GitRepo.Commits.QueryBy(fpath) gets slower and slower because we add commits at a rapid rate. It is a GitHub repo that was created from scratch a few months ago. I checked the underlying code in 0.22.1, and it fetches all commits, which in our case is a very large number. It takes 5 minutes to return the data, while we only want the most recent commit in the given folder. Is there another way to do that?

ltoshev commented 6 years ago

@ethomson, can you take a look? Sorry for my delayed answer.

Cobster commented 6 years ago

@ltoshev I recently hit this same issue. It happens whenever one of my repositories has multiple commits with the same Commit.Author.When timestamp. You mentioned that you add commits at a rapid rate, so this seems like it could be the same issue.

I dug into the code and found that Commits.QueryBy(string) automatically sorts by time, and I assume each commit gets placed in a dictionary keyed by timestamp. There must be some code that silently skips a duplicate key instead of throwing an exception, hence the failure later when a commit is looked up and it is missing.

To fix this, just use the topological sorting strategy. At least this works in my case, as my repo only contains a master branch and never has any merge commits.

Workaround:

repo.Commits.QueryBy(path, new CommitFilter { SortBy = CommitSortStrategies.Topological });
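
For the original use case (only the newest commit for one folder), the workaround could be wrapped roughly as follows. This is a minimal sketch, not a tested fix: the class and method names are hypothetical, and it assumes a LibGit2Sharp version whose QueryBy accepts a path together with a CommitFilter, as in the line above.

```csharp
using System;
using System.Linq;
using LibGit2Sharp;

// Hypothetical helper; the names are illustrative, not part of LibGit2Sharp.
public static class LatestCommitSketch
{
    // Returns the committer time (Unix seconds) of the first history entry
    // for relativePath in topological order, or null if there is none.
    public static long? LatestCommitTime(string workingDir, string relativePath)
    {
        // Assumes workingDir is inside a repository, so Discover finds one.
        using (var repo = new Repository(Repository.Discover(workingDir)))
        {
            // Topological order instead of the default time-based order.
            var filter = new CommitFilter { SortBy = CommitSortStrategies.Topological };
            LogEntry newest = repo.Commits.QueryBy(relativePath, filter).FirstOrDefault();
            return newest?.Commit.Committer.When.ToUnixTimeSeconds();
        }
    }
}
```

relativePath is expected to be relative to the repository root and to use forward slashes, like fpath in the original report.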

gukoff commented 4 months ago

This is the same problem as https://github.com/libgit2/libgit2sharp/issues/1410

The logic in FileHistory.FullHistory(...) expects the fetched commits to be topologically sorted, so that each commit is visited only after at least one of its children has been visited.

If any parent commit isn't strictly older than its child, then when you sort by time, you may iterate them in the wrong order and get an exception. When the timestamps of the parent and the child are equal, it's a 50/50 chance.
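
The ordering dependency can be reproduced with a toy model. The sketch below is a simplified illustration, not the library's actual code: ToyCommit, HistoryWalkSketch, and the sample history are hypothetical. It only assumes what is described above, namely that every visited commit registers its parents in a dictionary and that each later commit is looked up there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a commit: an id, a timestamp, and parent ids.
public sealed class ToyCommit
{
    public string Id { get; }
    public long When { get; }
    public string[] Parents { get; }

    public ToyCommit(string id, long when, params string[] parents)
    {
        Id = id;
        When = when;
        Parents = parents;
    }
}

public static class HistoryWalkSketch
{
    // Visits commits in the given order. Each visited commit registers its
    // parents; once the dictionary is non-empty, a commit that no child has
    // registered makes the indexer throw KeyNotFoundException.
    public static List<string> Walk(IEnumerable<ToyCommit> ordered, string path)
    {
        var pathByCommit = new Dictionary<string, string>();
        var visited = new List<string>();
        foreach (var commit in ordered)
        {
            string currentPath = pathByCommit.Count > 0 ? pathByCommit[commit.Id] : path;
            visited.Add(commit.Id);
            foreach (var parent in commit.Parents)
            {
                pathByCommit[parent] = currentPath;
            }
        }
        return visited;
    }

    // Linear history a <- b <- c, where b and c share a timestamp.
    public static ToyCommit[] SampleHistory()
    {
        return new[]
        {
            new ToyCommit("a", 100),
            new ToyCommit("b", 200, "a"),
            new ToyCommit("c", 200, "b"),
        };
    }
}
```

With h = HistoryWalkSketch.SampleHistory(), walking h[2], h[1], h[0] (children first) succeeds, while walking h[1], h[2], h[0] throws KeyNotFoundException at c. The second order is one a time-based sort may produce, because b and c have equal timestamps.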

I believe that even when using topological sorting, if your history is branched enough, you can see this exception.

I suggest closing this issue as a duplicate.