ninja-build / ninja

a small build system with a focus on speed
https://ninja-build.org/
Apache License 2.0
11.24k stars 1.6k forks source link

-d stats overhead is a bit too high, due to canonicalize str and friends #1998

Open randomascii opened 3 years ago

randomascii commented 3 years ago

On a typical Chrome NOP build the ninja -d stats output looks something like this:

metric count avg (us) total (ms) .ninja parse 6580 3823.8 25160.6 canonicalize str 6240450 0.0 36.9 canonicalize path 6251390 0.0 25.7 lookup node 6339406 0.0 29.3 .ninja_log load 1 119672.0 119.7 .ninja_deps load 1 564744.0 564.7 node stat 169000 9.9 1679.1 depfile load 327 331.5 108.4

While looking at an ETW trace of this NOP build I noticed that RtlQueryPerformanceCounter accounted for 707 of 13,494 samples, or more than 5%. Testing without -d stats showed that this entry went away entirely.

It seems that when -d stats is specified that ninja probably calls QueryPerformanceCounter twice for each event. ~19 million of the events - roughly 99% - are for canonicalize str, canonicalize path, and lookup node. The total cost of all of these events is roughly 90 ms. That means that the overhead of measuring the cost of those three event types is roughly seven times higher than the cost of those three event types.

In order to ensure that -d stats is not a burden we should remove those three measurements. In addition to improving performance (easily measured by timing ninja with and without) it will simplify the output and make the non-zero averages more visible.

randomascii commented 3 years ago

Another issue with the -d stats flag is that TimerToMicros is called for each sample. This is not as expensive as RtlQueryPerformanceCounter but it is easily avoidable - the conversion only needs to be done a dozen times, when printing the results, instead of 20 million times. The total cost on a profiling run with ninja symbols was ~310 ms, according to the ETW sampling profiler, which adds another ~25-30% to the cost of the profiling.

randomascii commented 2 years ago

Now that canonicalize str, canonicalize path, and lookup node are not being measured by -d stats the cost of -d stats is dramatically lower, so optimization is not needed. Greater precision and portability of the -d stats code would still be nice to have.