leetaizhu / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

set DEPOT_TOOLS_WIN_TOOLCHAIN=0 causes gyp_libyuv to be slow #438

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

set GYP_DEFINES=target_arch=ia32 libyuv_enable_svn=1
set DEPOT_TOOLS_WIN_TOOLCHAIN=0
call python gyp_libyuv -fninja -G msvs_version=2013 libyuv_test.gyp

What is the expected output? 
gyp_libyuv should be fast

What do you see instead?
gyp_libyuv takes about 30 seconds.

Please use labels and text to provide additional information.
A build without DEPOT_TOOLS_WIN_TOOLCHAIN is fast

set DEPOT_TOOLS_WIN_TOOLCHAIN=0 is required for compilers other than vs2013.

set GYP_DEFINES=target_arch=ia32 libyuv_enable_svn=1
call python gyp_libyuv -fninja -G msvs_version=2013 libyuv_test.gyp

Original issue reported on code.google.com by fbarch...@google.com on 6 May 2015 at 12:33

GoogleCodeExporter commented 9 years ago
I can't reproduce this is on a non-corp machine:

D:\src\libyuv>set GYP_DEFINES=target_arch=ia32 libyuv_enable_svn=1

D:\src\libyuv\trunk>set DEPOT_TOOLS_WIN_TOOLCHAIN=0

D:\src\libyuv\trunk>tim cmd /c call python gyp_libyuv -fninja -G 
msvs_version=2013 libyuv_test.gyp
Updating projects from gyp files...

real: 0m2.954s
qpc: 2951467us

D:\src\libyuv\trunk>set DEPOT_TOOLS_WIN_TOOLCHAIN=

D:\src\libyuv\trunk>tim cmd /c call python gyp_libyuv -fninja -G 
msvs_version=2013 libyuv_test.gyp
Updating projects from gyp files...

real: 0m2.297s
qpc: 2310707us

----------

So I think it must be something to do with either your local machine, or an app 
installed by corp policy.

Original comment by scottmg@chromium.org on 6 May 2015 at 1:00

GoogleCodeExporter commented 9 years ago
Reconfirming gyp is slow on corp machine

d:\src\libyuv\trunk>start vs.exe +new source\row_neon64.cc
d:\src\libyuv\trunk>set GYP_DEFINES=target_arch=ia32 libyuv_enable_svn=1
d:\src\libyuv\trunk>set DEPOT_TOOLS_WIN_TOOLCHAIN=0
d:\src\libyuv\trunk>timex python gyp_libyuv -fninja -G msvs_version=2013 
libyuv_test.gyp
Updating projects from gyp files...
TIMEX 28275.00 ms (28.27 seconds), python gyp_libyuv -fninja -G 
msvs_version=2013 libyuv_test.gyp

d:\src\libyuv\trunk>set DEPOT_TOOLS_WIN_TOOLCHAIN=
d:\src\libyuv\trunk>timex python gyp_libyuv -fninja -G msvs_version=2013 
libyuv_test.gyp
Updating projects from gyp files...
TIMEX 3655.00 ms (3.65 seconds), python gyp_libyuv -fninja -G msvs_version=2013 
libyuv_test.gyp

Original comment by fbarch...@google.com on 7 May 2015 at 5:32

GoogleCodeExporter commented 9 years ago

Original comment by fbarch...@google.com on 7 May 2015 at 5:34

GoogleCodeExporter commented 9 years ago
Consider grabbing two ETW traces, one with and one without 
DEPOT_TOOLS_WIN_TOOLCHAIN=0. Share them on Google drive and I can take a look. 
UIforETW in Tracing to File mode should make this easy.

Original comment by brucedaw...@chromium.org on 7 May 2015 at 6:09

GoogleCodeExporter commented 9 years ago
Testing with a Chromium repo shows that set DEPOT_TOOLS_WIN_TOOLCHAIN=0 adds 
about 20 seconds to the time. Two runs were done with it clear and then two 
runs with it set. GYP_MSVS_VERSION was set to 2013 so that the same compiler 
version would be configured. No builds were done.

The tests without DEPOT_TOOLS_WIN_TOOLCHAIN ran at an extremely reliable 1 min 
0.5 s.
The tests with DEPOT_TOOLS_WIN_TOOLCHAIN=0 ran at an extremely reliable 1 min 
20.5 s.

So, there is a fixed 20 second penalty for having DEPOT_TOOLS_WIN_TOOLCHAIN=0. 
The 33% increase will be difficult to profile, but if Frank gets an ETW profile 
of the ~400% slowdown in libyuv it should be easy to see what the difference 
is, and then decide if we care.

My Chromium test batch file is shown here:

@set starttime=%time%
python build\gyp_chromium
@echo From %starttime% to %time%

Original comment by brucedaw...@chromium.org on 7 May 2015 at 11:36

GoogleCodeExporter commented 9 years ago
When building with DEPOT_TOOLS_WIN_TOOLCHAIN=0 this two batch files are each 
called twice:

"C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools\vsvars32.bat"
"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" amd64

They are called with either "&& set" or "&& for %i in (cl.exe) do @echo 
LOC:%~$PATH:i"

These batch files typically take three to five seconds to run each time, so 
this explains the 20 second regression in generating ninja project files.

We should probably try to run these batch files less frequently, or perhaps 
write our own batch file that just sets the variables directly without the 
heavy levels of checking. This speedup will be noticeable when generating 
projects without DEPOT_TOOLS

Since Scott does not see this slowdown it is probably exacerbated by 
anti-virus/anti-malware software installed on corporate machines.

Original comment by brucedaw...@chromium.org on 15 May 2015 at 10:43

GoogleCodeExporter commented 9 years ago
timing for vcvars.bat with different versions

c:\work>timex "c:\Program Files (x86)\Microsoft Visual Studio 
12.0\VC\bin\vcvars32.bat"
TIMEX 2663.00 ms (2.66 seconds), "c:\Program Files (x86)\Microsoft Visual 
Studio 12.0\VC\bin\vcvars32.bat"

c:\work>timex "c:\Program Files (x86)\Microsoft Visual Studio 
11.0\VC\bin\vcvars32.bat"
TIMEX 2600.00 ms (2.60 seconds), "c:\Program Files (x86)\Microsoft Visual 
Studio 11.0\VC\bin\vcvars32.bat"

c:\work>timex "c:\Program Files (x86)\Microsoft Visual Studio 
10.0\VC\bin\vcvars32.bat"
Setting environment for using Microsoft Visual Studio 2010 x86 tools.
TIMEX 2217.00 ms (2.22 seconds), "c:\Program Files (x86)\Microsoft Visual 
Studio 10.0\VC\bin\vcvars32.bat"

c:\work>timex "c:\Program Files (x86)\Microsoft Visual Studio 
10.0\VC\bin\vcvars32.bat"
Setting environment for using Microsoft Visual Studio 2010 x86 tools.
TIMEX 2078.00 ms (2.08 seconds), "c:\Program Files (x86)\Microsoft Visual 
Studio 10.0\VC\bin\vcvars32.bat"

Original comment by fbarch...@chromium.org on 15 May 2015 at 10:53

GoogleCodeExporter commented 9 years ago
I do see that for other batch files and sometimes things like mingw utilities 
that are used infrequently on my corp machine. I don't think we should add 
extra hacks (caching) for corp building as it's generally futile for building 
Chrome anyway.

We should file an internal ticket first at least, as it's a clear set of repro 
steps for winops to investigate/improve.

FTR:
d:\src\crashpad\crashpad>timavg cmd /c "c:\Program Files (x86)\Microsoft Visual 
Studio 12.0\VC\bin\vcvars32.bat"

real: 0m0.250s
qpc: 240113us

real: 0m0.250s
qpc: 257249us

real: 0m0.250s
qpc: 241413us

real: 0m0.250s
qpc: 252833us

real: 0m0.250s
qpc: 244257us

real: 0m0.250s
qpc: 252511us

real: 0m0.235s
qpc: 243547us
avg = 0m0.247s
      246912us

Original comment by scottmg@chromium.org on 16 May 2015 at 1:00

GoogleCodeExporter commented 9 years ago
This must be a Google corporate image problem. On my home machine I see that 
vcvarsall.bat takes ~0.5 - 0.8 s. This is a bit slower than what Scott is 
seeing, but an order of magnitude faster than what Frank reports.

On my (extremely beefy) corporate image developer machine I see times around 
~2.5 s. This is the test batch file that I am rumming:

@setlocal
@set starttime=%time%
call "%VS120COMNTOOLS%..\..\VC\vcvarsall.bat"
@echo Ran from %starttime% to %time%

So, Microsoft's batch file is inefficient (it shouldn't take .25 to .8 s to 
run) and Google's corporate image makes it run much slower.

A simplified version that just sets the environment variables that we want 
might still be worth creating. This would take the problem off the table 
entirely.

Original comment by brucedaw...@chromium.org on 18 May 2015 at 6:25

GoogleCodeExporter commented 9 years ago
I verified from looking at Frank's trace that bit9 is the problem. Over a 5.13 
s period where one of these batch files was run I see approximately 4.432 s of 
CPU time inside Parity.sys. Seems pretty conclusive to me. The remaining 
discrepancy between Scott's machine and Frank's is probably tracing overhead on 
the many many context switches.

Original comment by brucedaw...@chromium.org on 18 May 2015 at 7:52

GoogleCodeExporter commented 9 years ago

Original comment by fbarch...@chromium.org on 18 May 2015 at 10:51

GoogleCodeExporter commented 9 years ago
bit9
TIMEX 9676.00 ms (9.68 seconds), gclient config 
http://libyuv.googlecode.com/svn/trunk
TIMEX 3040729.00 ms (3040.73 seconds) (50.68 minutes), gclient sync
TIMEX 21776.00 ms (21.78 seconds), call python gyp_libyuv -fninja 
-Gconfig=Release -Gmsvs_version=2013 libyuv_test.gyp
TIMEX 13105.00 ms (13.11 seconds), ninja -v -C out\Release

no bit9
TIMEX 8957.00 ms (8.96 seconds), gclient config 
http://libyuv.googlecode.com/svn/trunk
TIMEX 2138673.00 ms (2138.67 seconds) (35.64 minutes), gclient sync
TIMEX 3622.00 ms (3.62 seconds), python gyp_libyuv -fninja -G msvs_version=2013 
--depth=. libyuv_test.gyp
TIMEX 13308.00 ms (13.31 seconds), ninja -v -C out\Release

for 'no bit9' I used c:\src\ folder and set DEPOT_TOOLS_WIN_TOOLCHAIN=

Original comment by fbarch...@chromium.org on 21 May 2015 at 6:39

GoogleCodeExporter commented 9 years ago
There's a few less batch invocations and registry queries in the depot_tools 
one, as it just runs a minimal generated on in 
depot_tools\win_toolchain\vs2013_files\win8sdk\bin\SetEnv.cmd.

But in any case, the 21s vs. 3s seems to point to bit9 sucking.

Original comment by scottmg@chromium.org on 21 May 2015 at 6:52

GoogleCodeExporter commented 9 years ago
Yep, the 18-20 s slowdown is definitely bit9. Profiling proved that. There are 
three possible courses of action:

1) Wait for bit9 to be more efficient.
2) Use some variation of SetEnv.cmd in the non-depot-tools case (!!!)
3) Ignore the problem, don't generate ninja files so frequently.

I haven't looked at the ninja generation enough to say for sure but it seems 
that #2 might be easy. Somebody who is most affected by this would probably 
have to make the change, which 
probably means Frank. Or me if I get sufficiently annoyed.

Original comment by brucedaw...@chromium.org on 21 May 2015 at 9:43