msysgit / git

msysGit-based Git for Windows 1.x is now superseded by Git for Windows 2.x
http://github.com/git-for-windows/git
Other
1.01k stars 316 forks source link

[reopen] Inverted Commit Issues with Git merge --squash #242

Closed hach-que closed 9 years ago

hach-que commented 10 years ago

We are still being impacted by https://github.com/msysgit/git/issues/120 (I can't find any way to reopen this issue).

This on average occurs about once a month in a repo with 30-40 developers actively. This still occurs with at least Git 1.9.0.

The theory posted in the previous issue was "I suspect further, that it is a problem with the file system and could imagine that it is due to the time stamps being written using sub-second precision and subsequent inconsistent comparisons.", but (at least on our side), we have no way to verify if this is the case.

t-b commented 10 years ago

Well without reliable reproduction recipe this is really hard to investigate. Could you give some details about the surrounding involved? OS, filesystem, server side spring to my mind. You also said that this might be related to automated tools calling git. What are these?

In #120 you said you have a script which tries to reproduce the failure. Could you post that, maybe we can further tweak it.

dscho commented 10 years ago

@hach-que since 30-40 developers are affected by it, surely there must be enough interest in a fix to dedicate serious developer time to investigating and fixing the problem. And what I mean by this is: you surely want to make certain that this is addressed, rather than hoping that some volunteers would solve your problem in their free time, correct?

hach-que commented 10 years ago

@dscho We simply don't have the experience with the Git code base to understand how this race condition can occur. If it was a segmentation fault, or some form of crash, then I'd agree with you. But race conditions tend to involve complex interactions within the code, which would take someone unfamiliar with the code base a significant amount of time to understand.

@t-b This is on Windows, on an NTFS filesystem (not a network drive). We're using Arcanist (https://secure.phabricator.com/book/phabricator/article/arcanist/) which invokes several git commands in quick succession as it automates merging, squashing and pushing of code to a remote repository. It doesn't do these in parallel, but it does do them immediately one after another (which is why anyone running the commands manually would not encounter it). Here's a trace of the commands (and other operations) that Arcanist performs:

(I've replaced the domain in this trace log)

$ arc --trace land
libphutil loaded from 'C:\Tools\Arcanist\libphutil\src'.
arcanist loaded from 'C:\Tools\Arcanist\arcanist\src'.
Config: Reading user configuration file "C:\Users\jamesr\AppData\Roaming/.arcrc"...
Config: Did not find system configuration at "C:\ProgramData\Phabricator/Arcanist/config".
Working Copy: Reading .arcconfig from "C:\Repos\TestRepo\test-repo/.arcconfig".
Working Copy: Path "C:\Repos\TestRepo\test-repo" is part of `git` working copy "C:\Repos\TestRepo\test-repo".
Working Copy: Project root is at "C:\Repos\TestRepo\test-repo".
Config: Did not find local configuration at "C:\Repos\TestRepo\test-repo\.git\arc/config".
Loading phutil library from 'C:\Repos\TestRepo\test-repo\Tools\Arcanist'...
>>> [0] <conduit> conduit.connect() <bytes = 448>
>>> [1] <http> https://XXXXXXXXX/api/conduit.connect
<<< [1] <http> 1,179,276 us
<<< [0] <conduit> 1,180,425 us
>>> [2] <conduit> arcanist.projectinfo() <bytes = 184>
>>> [3] <http> https://XXXXXXXXX/api/arcanist.projectinfo
<<< [3] <http> 280,432 us
<<< [2] <conduit> 281,659 us
>>> [4] <exec> $ git branch --no-color
<<< [4] <exec> 39,644 us
Landing current branch 'whatever'.
>>> [5] <exec> $ git branch --no-color
<<< [5] <exec> 35,722 us
>>> [6] <exec> $ git rev-parse --verify "whatever"
<<< [6] <exec> 29,088 us
>>> [7] <exec> $ git diff --no-ext-diff --no-textconv --raw "HEAD" --
>>> [8] <exec> $ git ls-files --others --exclude-standard
<<< [8] <exec> 21,764 us
<<< [7] <exec> 24,041 us
>>> [9] <exec> $ git diff-files --name-only
<<< [9] <exec> 21,583 us
>>> [10] <exec> $ git checkout "master"
<<< [10] <exec> 44,341 us
Switched to branch master. Updating branch...
>>> [11] <exec> $ git pull --ff-only --no-stat
<<< [11] <exec> 2,340,745 us
>>> [12] <exec> $ git log "origin/master".."master"
<<< [12] <exec> 18,831 us
>>> [13] <exec> $ git log --oneline "whatever" "^master" --
<<< [13] <exec> 19,433 us
The following commit(s) will be landed:

3d1a7c2 test commit

>>> [14] <exec> $ git branch --no-color
<<< [14] <exec> 26,113 us
>>> [15] <exec> $ git checkout "whatever"
<<< [15] <exec> 40,932 us
Switched to branch whatever. Identifying and merging...
>>> [16] <exec> $ git rev-parse "HEAD"
<<< [16] <exec> 20,719 us
>>> [17] <exec> $ git merge-base "origin/master" "3d1a7c24f371a2bdc7c39825c8e10eadb38af9ff"
<<< [17] <exec> 20,251 us
>>> [18] <exec> $ git log --first-parent --format=medium "6412175bcdd12e8d05b87c863c690701746ddc33".."3d1a7c24f371a2bdc7c39825c8e10eadb38af9ff"
<<< [18] <exec> 19,081 us
>>> [19] <conduit> differential.query() <bytes = 184>
>>> [20] <http> https://XXXXXXXXX/api/differential.query
<<< [20] <http> 381,312 us
<<< [19] <conduit> 382,238 us

    Revision 'D3497: test commit' has not been accepted. Continue anyway?
    [y/N] y

>>> [21] <conduit> differential.getcommitmessage() <bytes = 192>
>>> [22] <http> https://XXXXXXXXX/api/differential.getcommitmessage
<<< [22] <http> 498,414 us
<<< [21] <conduit> 505,254 us
Landing revision 'D3497: test commit'...
>>> [23] <conduit> harbormaster.querybuildables() <bytes = 260>
>>> [24] <http> https://XXXXXXXXX/api/harbormaster.querybuildables
<<< [24] <http> 298,835 us
<<< [23] <conduit> 301,124 us
Merging whatever into master
>>> [25] <exec> $ git merge --no-stat "master" -m "Automatic merge by 'arc land'
"
Already up-to-date.
<<< [25] <exec> 30,248 us
>>> [26] <exec> $ git checkout "master"
<<< [26] <exec> 40,991 us
>>> [27] <exec> $ git merge --no-stat --squash --ff-only "whatever"
<<< [27] <exec> 52,608 us
>>> [28] <exec> $ git commit -F "C:\Users\jamesr\AppData\Local\Temp\3hipr3k2z4mc0wwo\15762D.tmp"
<<< [28] <exec> 35,560 us
>>> [29] <exec> $ git diff --no-ext-diff --no-textconv --raw "HEAD" --
>>> [30] <exec> $ git ls-files --others --exclude-standard
<<< [30] <exec> 19,015 us
<<< [29] <exec> 20,895 us
>>> [31] <exec> $ git diff-files --name-only
<<< [31] <exec> 19,361 us
>>> [32] <event> land.willPushRevision <listeners = 0>
<<< [32] <event> 175 us
Pushing change...

>>> [33] <exec> $ git push "origin" "master"
Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 377 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
To ssh://git@XXXXXXXXX/diffusion/TST/test-repo.git
   6412175..5503191  master -> master
<<< [33] <exec> 2,360,949 us
>>> [34] <exec> $ git remote show -n origin
<<< [34] <exec> 31,420 us
>>> [35] <conduit> repository.query() <bytes = 281>
>>> [36] <http> https://XXXXXXXXX/api/repository.query
<<< [36] <http> 349,445 us
<<< [35] <conduit> 352,138 us
>>> [37] <conduit> diffusion.looksoon() <bytes = 195>
>>> [38] <http> https://XXXXXXXXX/api/diffusion.looksoon
<<< [38] <http> 392,477 us
<<< [37] <conduit> 393,450 us
>>> [39] <conduit> differential.query() <bytes = 190>
>>> [40] <http> https://XXXXXXXXX/api/differential.query
<<< [40] <http> 595,387 us
<<< [39] <conduit> 596,309 us
>>> [41] <conduit> arcanist.projectinfo() <bytes = 184>
>>> [42] <http> https://XXXXXXXXX/api/arcanist.projectinfo
<<< [42] <http> 380,131 us
<<< [41] <conduit> 381,195 us
>>> [43] <exec> $ git remote show -n origin
<<< [43] <exec> 34,462 us
>>> [44] <conduit> repository.query() <bytes = 281>
>>> [45] <http> https://XXXXXXXXX/api/repository.query
<<< [45] <http> 385,580 us
<<< [44] <conduit> 386,904 us

Cleaning up feature branch...
>>> [46] <exec> $ git rev-parse --verify "whatever"
<<< [46] <exec> 30,860 us
(Use `git checkout -b "whatever" "3d1a7c24f371a2bdc7c39825c8e10eadb38af9ff"` if
you want it back.)
>>> [47] <exec> $ git branch -D "whatever"
<<< [47] <exec> 26,659 us
Done.

The issue arises when git merge --no-stat --squash --ff-only "whatever" is run. Any changes that were made between "master" and "origin/master" when first running Arcanist are reverted, and the final squash contains the changes made in the feature branch and a reversion of all of those changes. When this command has run, master has already been moved up to origin/master and a merge commit has been created, so the working directory should be clean before the squash runs.

t-b commented 10 years ago

@hach-que Don't get @dscho wrong. It is not that nobody wants to deal with this kind of problem. It is just that it takes considerable time and effort to deal with such cases.

So you are running arcanist on a windows machine acting as the server for you all, right? Well, here is what I would try to do:

hach-que commented 10 years ago

Arcanist is running on the developer's machine which is running Windows., where it's squashing and merging. It then pushes to a remote repository running on a Linux machine.

I'll ask everyone to turn on those options and we'll see if it still occurs.

dscho commented 10 years ago

@hach-que since it is happening only rarely, and since the plan is apparently indeed to lean on volunteers doing free work in their free time, I would like to encourage you to read up about the GIT_TRACE environment variable and also to make sure that developers who encounter issues do not simply complain, but write down the exact circumstances. That would include also reading up on reflogs.

Then, the diligent thing would be to try to reproduce the issue on the same machine, using plain Git commands (which you possibly have to find out using another Arcanist run with GIT_TRACE enabled).

As long as you have no way to reproduce reliably, there is no hope. But even when you have that reproduction "recipe", you need to work harder: try to reduce the problem. That means that you will have to rewrite history (only on topic branches, of course, not origin/master!) until you cannot remove/shrink more without breaking the reproduction of the problem. And then you're still not done: you will have to turn that little problem into a shell script that demonstrates the problem. Once you have that shell script, and you still "don't have the experience with the Git code base to understand how this race condition can occur", you can post it here and hope that some competent developer will donate their expertise and time to solve your problem for you.

You see, debugging is a tedious job, and it is really your duty to go and do it. I fully expect this issue to go exactly nowhere unless you step up the efforts on your side. We should stop pretending that talking about the issue will magically make it go away, let alone solve it. It will only be solved if you roll up your sleeves and get down to work. It is your business that is affected, after all.

oblique commented 10 years ago

I believe this issue is solved with 9aa91af and 426ddee commits (merged in git v2.0.1). It is reproducible when you are using automated tools (e.g. Git Extensions) because there is a race condition when git status is called. In the case of Git Extensions, git status is called every time a file is changed in the directory.

dscho commented 9 years ago

@oblique the proof lies in the pudding, as they say. Can you reproduce the issue in question? If so, does it still occur when using a Git built from v2.0.1?

oblique commented 9 years ago

Sorry I didn't manage to reproduce the issue, I actually had similar problem with git rebase (exactly the one that is described in the commit message) and those 2 commits fixed it. Maybe @hach-que should install the latest netinstall and try to reproduce it.

dscho commented 9 years ago

I am just assuming that it is fixed by Git for Windows 2.x -- or that there is not enough interest to pursue a resolution.