Open kaidokert opened 3 years ago
Thanks for filing! Good news: we're actively investigating the performance of Git on Windows and are working with internal teams here at Microsoft and external teams at GitHub to diagnose the problem and find solutions for reducing the performance gaps between Windows and Mac/Linux. I will update this thread as fixes are identified and completed. In the meantime, can you repro the issue on Windows and capture a trace via Feedback Hub? We can use this data to better understand how Git performs in the wild and fuel our investigations.
My test results cloning Chromium locally between two SSDs (Intel 660p and WD SN550):
Windows Terminal, PowerShell Core window:
D:\Temp> Measure-Command { git clone c:\Users\chad\source\repos\chromium-mirror d:\temp\chromium }
Cloning into 'd:\temp\chromium'...
done.
Checking out files: 100% (363000/363000), done.

Days : 0
Hours : 0
Minutes : 4
Seconds : 38
Milliseconds : 220
Ticks : 2782200516
TotalDays : 0.00322013948611111
TotalHours : 0.0772833476666667
TotalMinutes : 4.63700086
TotalSeconds : 278.2200516
TotalMilliseconds : 278220.0516
Windows Terminal, Ubuntu WSL:
/mnt/c/Users/chad/source/repos$ time git clone /mnt/c/Users/chad/source/repos/chromium-mirror/ /mnt/d/Temp/
Cloning into '/mnt/d/Temp'...
done.
Updating files: 100% (363000/363000), done.

real 82m14.742s
user 1m30.634s
sys 9m18.089s
Seems like WSL does NOT help the Git performance in Windows.
@itoleck Are you using WSL2? If so, cross-OS filesystem performance is slow: https://docs.microsoft.com/en-us/windows/wsl/compare-versions#performance-across-os-file-systems
@AvriMSFT I posted a Feedback Hub capture at https://aka.ms/AAcdkct, although I'm not sure how useful it is, since even a plain rm -r directory command is slow.
Ping, the label here still says "Needs Author Feedback", but I'm not sure what other information I could provide.
Hi @kaidokert and @itoleck
This is a bit tangential to the actual issue, but: the next Git release (2.32.0) will come with the "parallel checkout" feature, which allows git checkout, git clone, and other checkout-related commands to parallelize the creation of files in the working tree.
I haven't tested it on WSL, but I'm getting around a 2x speedup when cloning the linux repository with Git for Windows on an SSD.
If you would be interested in testing it out before the release, you'd need to compile Git on your machine. The code for this feature has already landed in the master branch of the upstream Git repository and also on the main branch of the Git-for-Windows repository. Then, to activate parallel checkout, you must set the checkout.workers config to the desired number of parallel processes (one means sequential).
As one of the developers of this feature, I'd be very interested in any feedback you have about it. So if you do run the tests with parallel checkout or have any questions or suggestions about it, please let me know :)
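A minimal sketch of the configuration step described above, assuming a Git build that includes parallel checkout. The scratch repository is hypothetical and only keeps the example self-contained; the worker count of 4 is an arbitrary choice, not a recommendation.

```shell
# Hypothetical scratch repository, so the setting stays local and does not touch global config.
repo="$(mktemp -d)"
git init --quiet "$repo"

# checkout.workers = 1 means sequential checkout (the default).
# Per the Git documentation, a value below 1 asks Git to use one worker per logical core.
git -C "$repo" config checkout.workers 4

# Read the value back to confirm it was stored.
git -C "$repo" config --get checkout.workers
```

For a one-off run, the same setting can presumably be passed on the command line instead, for example git -c checkout.workers=4 clone followed by the source and destination paths.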
It does seem like the majority of the slowness here doesn't actually come from file creation but from file deletion, which of course happens a lot when switching branches.
Hey @kaidokert, I didn't see this called out in the template, but do you have a Windows Defender exclusion enabled for this folder? It would be interesting to see how the clone timing varies with Defender out of the picture.
@asklar Does Windows Defender normally examine files when they're being permanently deleted? Just curious; not too familiar with how antivirus works under the hood 🙂
I don't work on Defender so I don't know for sure, but I wouldn't be surprised 🤷 :)
@asklar Haha, okay, thanks 😄
I have Defender turned off for the entire drive, yes.
Update: The filesystem and performance teams are working on some major improvements to git which we think will directly impact your issue. Once fixes are in, I'll update the thread :).
@itoleck re. https://github.com/microsoft/Windows-Dev-Performance/issues/87#issuecomment-828676597
Note that running git in WSL to clone a repo into the Windows filesystem has known perf issues. Running git in WSL to clone a repo from one Windows folder into another Windows folder will likely double the overhead and is a worst-case scenario.
If you're in Linux/WSL, clone repos locally in the Linux filesystem. If your files are in Windows, run git in Windows for the best perf.
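As a rough illustration of that advice, the sketch below classifies a path as living on a Windows drive or on the Linux filesystem. It assumes the default WSL automount root of /mnt/<drive letter>; a custom root set in /etc/wsl.conf would need a different pattern. The function name classify_path is made up for this example.

```shell
# Print "windows" for paths under a WSL-mounted Windows drive, "linux" otherwise.
# Assumption: drives are mounted at /mnt/c, /mnt/d, and so on (the WSL default).
classify_path() {
    case "$1" in
        /mnt/[a-z]|/mnt/[a-z]/*) echo "windows" ;;
        *) echo "linux" ;;
    esac
}

# Warn before running Git from WSL inside a Windows-backed directory.
if [ "$(classify_path "$PWD")" = "windows" ]; then
    echo "warning: $PWD is on a Windows drive; Git from WSL is likely to be slow here" >&2
fi
```

With this check, the clone on the /mnt/c and /mnt/d paths shown earlier in the thread would trigger the warning, while a clone under the Linux home directory would not.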
Periodic ping: any interesting updates on this issue?
There have already been some updates made to Git, so hopefully the original benchmarks are improved.
More updates coming at Build next month ;)
> so hopefully the original benchmarks are improved.
I'll do a re-test on Azure VMs side by side on Linux and Windows. Any particular base images to recommend for verifying improvements?
I had Terraform deploy 2 identical HW VMs in Azure, gist with config here
Both free tier Standard B1s VMs, same disk config. Windows Server 2022 Azure Datacenter edition. All commands below were run in respective temporary directories, assuming those should default to decently fast out of the box.
I used a smaller and lesser-known repo as a benchmark just to cut down on the wait time. It's ~2 GB.
git clone --mirror https://github.com/youtube/cobalt.git cobalt-mirror
Updated timings; Linux, clone main branch from mirror:
time git clone cobalt-mirror cobalt
real 1m2.444s
Windows:
Measure-Command { git clone cobalt-mirror cobalt }
TotalMinutes : 7.19
About 7x slower
Switch branch, Linux:
time git checkout 19.lts.1+
real 0m41.106s
Windows:
Measure-Command { git checkout 19.lts.1+ }
TotalMinutes : 6.62
About 10x slower
Linux delete directory:
time rm -rf cobalt
real 0m1.600s
Windows:
Measure-Command { rm -r cobalt }
TotalMinutes : 1.40
~50x slower
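As a quick arithmetic check, the slowdown factors can be recomputed from the timings quoted above. This is only a sketch: the Windows values are the TotalMinutes figures multiplied by 60, the Linux values are the "real" times, and the helper name slowdown is made up for the example.

```shell
# Divide Windows seconds ($1) by Linux seconds ($2) and print the ratio to one decimal place.
slowdown() {
    awk -v w="$1" -v l="$2" 'BEGIN { printf "%.1f\n", w / l }'
}

echo "clone:    $(slowdown 431.4 62.444)x"   # 7.19 min vs 1m2.444s
echo "checkout: $(slowdown 397.2 41.106)x"   # 6.62 min vs 0m41.106s
echo "delete:   $(slowdown 84 1.6)x"         # 1.40 min vs 0m1.600s
```

On these numbers the ratios come out to roughly 7x for the clone, 10x for the branch switch, and 52x for the delete.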
If there are any obvious tunings or tweaks that should be done for disk performance, I'd be really happy to know. If there's a faster disk config, I'd also be happy to try it out: I have this in Terraform and can redeploy different VMs/disks with a push. I'm hoping to make the repo publicly available on GitHub.
Of course, maybe B1s aren't the best representative performance because they get throttled. They are free though.
I don't know how much can be done with Windows Server, but certainly for client the best first thing to do is to run on a separate volume (a partition on the same drive is fine, even a mounted VHDX is an improvement, just don't use paths starting with C:\ or whatever your OS is on).
But I'd say going from the 25x+ difference to a 7x difference is about what we'd expect right now. A non-OS disk might be better, but we've been looking more at real client machines than at virtual servers. Antivirus scanners also have a large impact, which we believe we've reduced, though again, it should already have been less significant on Server, so the improvement there will be smaller.
These are certainly better results, yes. Thank you!
As I mentioned, I set this up as a Terraform repo, so I can easily test with Win 11 desktop rather than server (e.g. win11-22h2-pro SKU / Windows-11 MicrosoftWindowsDesktop) as well.
I'll try the tips on structuring the disks better. At the moment I'm simply doing
os_disk {
  storage_account_type = "Standard_LRS"
  caching = "ReadWrite"
}
and using c:\temp from there. But I'll mount a separate drive and see what that does. Thanks for the tips!
@zooba Can you elaborate on why doing Git operations on a non-OS partition would be faster?
> Can you elaborate on why doing Git operations on a non-OS partition would be faster?
In brief (and I believe we have more detailed documentation on this coming), the system drive will have additional file system filter drivers installed in order to do certain tasks, such as OneDrive sync and system file protection. And a filter driver intercepts every file system operation on a volume to see if it needs to do anything. Generally this is quick, but not doing it is even quicker. So on a clean volume, you'll have far fewer file system filters in the way, which means that overhead is reduced.
@zooba, Wow, very interesting! Thanks 🙂
@zooba, I took your recommendation and split my C: drive into two partitions, C: and D:, then moved my source files and NuGet package caches to D:, and wow, the difference is night and day! 😮
Interestingly, copying or deleting large folders via command line on D: doesn't appear to be much faster than doing so on C:, but loading my projects in Visual Studio feels waaaay faster.
Thank you very much for the tip! ^_^
BTW, if anyone else wants to try this, I'd recommend checking out https://learn.microsoft.com/en-us/windows/dev-drive/#what-should-i-put-on-my-dev-drive
Those instructions are for the new DevDrive feature in Windows 11, but I think they also apply well to what one should put on a non-OS partition in Windows 10 🙂
Although DevDrive is only available on Windows 11, ReFS is available on Windows 10, so I figured I'd give it a shot.
@zooba In my tests, copying files via xcopy is actually slower on ReFS than on NTFS. Any ideas? 🙂
xcopy .git ..\.git /s /e /h /v /i /k /r > NULL on C: (NTFS): 50 seconds
xcopy .git ..\.git /s /e /h /v /i /k /r > NULL on D: (ReFS): 65 seconds
I would have figured the copy-on-write mechanism would have made copying on ReFS super fast. Maybe it's because the drive has BitLocker enabled? (BitLocker is enabled on both drives)
OS: Windows 10 Enterprise, Build 19045
> I would have figured the copy-on-write mechanism would have made copying on ReFS super fast.
It does, but I'm not sure it's automatically enabled (and I'd be surprised if xcopy has any special knowledge of ReFS - maybe try robocopy?)
We're not done with perf work yet - getting devs onto ReFS is just the first step - so you can expect future updates to bring more improvements over time. We did also ship a few perf improvements to ReFS specifically with the Dev Drive update, so you won't have those on Win10.
ReFS block cloning support built into the copy engine is enabled in the latest Windows Insider Preview (WIP) Canary-External release.
Windows Build Number
10.0.18363.0
Processor Architecture
AMD64
Memory
200 GB
Storage Type, free / capacity
SSD 200GB/ 1TB
Relevant apps installed
git version 2.31.1.windows.1
Traces collected via Feedback Hub
N/A
Issue description
Checking out large repos even from a local mirror is slow, compared to Linux / Mac.
Even more importantly, so is switching branches / tags.
Steps to reproduce
Let's download a well-known sample repo, about 23 GB:
git clone --mirror https://github.com/chromium/chromium.git chromium-mirror
Now let's check out a source tree from local mirror:
PowerShell, NTFS drive:
Just over 12 minutes.
On Linux, ext4, similar hardware:
About 24 seconds.
Now, let's check out a slightly older tag:
PowerShell:
15 minutes to switch a tag.
On Linux:
Again, about 22 seconds.
Finally, let's delete these experiment directories:
PowerShell:
7 minutes
Linux:
5 seconds
Expected Behavior
Would expect checkout speed on similar disks to be at least on the same order of magnitude.
Actual Behavior
The operations in this example are about 25-30x slower, on almost identical hardware.
Of course, the problem doesn't seem inherent to Git. It's a similar I/O problem when working with large directory trees with many files, e.g. the Node node_modules issues (#21) and others (#17, #27), as evidenced by the fact that rm -r took over 60x longer.