adammurdoch opened 6 years ago
There are several issues with the current implementation of compilation for C/C++.
I plan to spike a solution where the two tasks are merged back together. When out of date, the task would perform the dependency analysis to collect the header dependencies, then delegate to the cache services to look for a matching cache entry, or run the task action if no match is present.
I did a quick spike which basically just moves the body of the `Depend` task to an `@InputFiles` property getter on the compile task, so that the dependency analysis is done while preparing the inputs of the compile task. The spike also removes the depend task from the task graph.
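To make the shape of that concrete, here is a minimal sketch of such a task, assuming a hypothetical `HeaderDependenciesResolver` collaborator; the names are illustrative only and the real code is in the spike branch:

```java
import org.gradle.api.DefaultTask;
import org.gradle.api.file.ConfigurableFileCollection;
import org.gradle.api.file.FileCollection;
import org.gradle.api.tasks.InputFiles;
import org.gradle.api.tasks.TaskAction;

import java.io.File;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.Callable;

// Sketch only: the header analysis is exposed as an @InputFiles property on
// the compile task instead of running in a separate Depend task.
public class SketchNativeCompileTask extends DefaultTask {
    private final ConfigurableFileCollection source = getProject().files();
    private final ConfigurableFileCollection includeDirs = getProject().files();
    private final HeaderDependenciesResolver headerResolver = new HeaderDependenciesResolver();

    @InputFiles
    public FileCollection getSource() {
        return source;
    }

    @InputFiles
    public FileCollection getHeaderDependencies() {
        // Evaluated while Gradle snapshots the task inputs, so the dependency
        // analysis runs as part of the up-to-date check rather than in a
        // separate task scheduled before this one.
        return getProject().files((Callable<Set<File>>) () ->
                headerResolver.resolveHeaders(source.getFiles(), includeDirs.getFiles()));
    }

    @TaskAction
    public void compile() {
        // Look up the build cache or invoke the compiler, as before.
    }

    // Hypothetical stand-in for the dependency analysis that used to live in
    // the Depend task: parses #include directives and resolves them against
    // the include path.
    public static class HeaderDependenciesResolver {
        public Set<File> resolveHeaders(Set<File> sourceFiles, Set<File> includePath) {
            return Collections.emptySet();
        }
    }
}
```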
On a test build with 20 projects, 20 source files and 100 header files per project (so a large number of headers given the number of source files), this gives a modest improvement:
Scenario | spike | master | 4.3 | 4.2 |
---|---|---|---|---|
up-to-date :linkDebug :linkRelease parallel | 2.27s | 1.33s | 1.07s | 0.876s |
clean :linkDebug :linkRelease parallel | 57.1s | 59.3s | 50.7s | 46.8s |
The up-to-date case is slower as the dependency analysis is run to calculate the header input files. This regression is somewhat intentional for this spike. The goal was to make clean build faster. I plan to take a look at some quick spikes to make the up-to-date checks faster.
Hacking both 4.2 and my spike so that neither traverses any `#include` directives gives these results on the "many headers" test build above:
Scenario | spike with hack | 4.2 with hack |
---|---|---|
up-to-date :linkDebug :linkRelease parallel | 0.97s | 0.92s |
clean :linkDebug :linkRelease parallel | 50.3s | 50.2s |
The results are basically the same. While this hack is not something we would actually ship, it does confirm that the performance issues are somewhere in dealing with the header dependency graph: it could be resolving the includes, finding header files on the longer include path, or just dealing with many more edges and header files in the dependency graph, or all of the above. The profiler seems to say the cost is in finding the files (plus there are more files to find), but I want to confirm this.
On Windows, we shouldn't actually have a longer include path since 4.2, as we have always discovered the system header directories. This says that either the regression is already fixed (it was only due to multiple tasks doing the same work), or there's a regression which isn't due to a longer include path. I guess I'm going to have to run these builds on Windows...
> On Windows, we shouldn't actually have a longer include path since 4.2 as we have always discovered the system header directories.
I think that may not be true. Isn't this where we added the system includes? https://github.com/gradle/gradle/blob/REL_4.2.1/subprojects/platform-native/src/main/java/org/gradle/nativeplatform/toolchain/internal/msvcpp/VisualCppPlatformToolProvider.java#L143-L145
I think this isn't seen by the compile task/incremental build because we only add these includes once we get into `NativeCompiler` and friends.
You're right, the header discovery in 4.2 never saw those system include directories.
I've merged some improvements to master, including merging the `Depend` task into `AbstractNativeCompileTask` to avoid the cost of a second task for each binary (for example managing the same sets of input files or persisting the same state for each task), and changes to how header dependencies are tracked as `File` instances, which exposes inefficiencies in the input file analysis code.

The changes are here: https://github.com/gradle/gradle/compare/42958134b46b814ac54a740b784dcd67afae282c...6ae577fa149b030904c8a978c632bb1c100bb0ff
Some results from a build with 20 projects, 100 source files and 50 header files per project on a Linux machine:
Scenario | master | 4.4 | 4.3 | 4.2 |
---|---|---|---|---|
ABI change :assemble | 3.40s | 3.55s | 3.54s | 3.24s |
no change :assemble | 0.42s | 0.36s | 0.37s | 0.33s |
clean :assemble | 55.97s | 58.91s | 58.77s | 53.74s |
non-ABI change :assemble | 1.97s | 2.11s | 2.10s | 1.70s |
This test build does not use any macro includes to separate that issue out.
The changes bring us closer to 4.2 performance, and there are further improvements that can be made to avoid probing for header files many times per build and more efficiently deal with large or many include file dependency graphs.
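As one example of what "avoid probing for header files many times per build" could look like, here is a rough sketch of a per-build cache of include-path lookups; the names are illustrative and this is not the implementation that was merged:

```java
import java.io.File;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a per-build cache that avoids hitting the file system every time
// the same include name is resolved against the same include path.
public class IncludeFileLocator {
    // Key: include name plus include path; value: the resolved file, or empty
    // if the header was not found on the path.
    private final Map<String, Optional<File>> cache = new ConcurrentHashMap<>();

    public Optional<File> locate(String includeName, List<File> includePath) {
        String key = includeName + '\n' + includePath;
        return cache.computeIfAbsent(key, k -> probe(includeName, includePath));
    }

    private Optional<File> probe(String includeName, List<File> includePath) {
        for (File dir : includePath) {
            File candidate = new File(dir, includeName);
            if (candidate.isFile()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```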
When this is improved, ping https://github.com/gradle/gradle-native/issues/502
I started experimenting with changes. My assessment so far shows that the include analysis is a big part of the bottleneck. Each time we encounter a macro include or an unresolved macro include, we don't cache the result. Since the project contains a very similar include graph multiple times, we re-analyze the files and get mostly the same result. I think we could still cache those results by basing the cache on the visible macros. Basically, we would be saying: given a set of macros X, analyzing file Y should always produce result Z. For files without macro includes, we can base the caching on the file alone: given file Y, we should always get result Z.
The visited file details are also not shared between tasks and projects. We could look at sharing those around.
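To make the proposed keying concrete, here is a rough sketch; `IncludeAnalysis`, `Macro` and `Analyzer` are placeholders for whatever the real analyzer produces, not Gradle types:

```java
import java.io.File;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the two-level caching idea: files whose includes never go through
// a macro are cached by file alone; files with macro includes are cached by
// file plus the set of macros visible at the point of inclusion.
public class IncludeAnalysisCache {
    private final Map<File, IncludeAnalysis> byFile = new ConcurrentHashMap<>();
    private final Map<File, Map<Set<Macro>, IncludeAnalysis>> byFileAndMacros = new ConcurrentHashMap<>();

    public IncludeAnalysis analyze(File header, Set<Macro> visibleMacros, Analyzer analyzer) {
        // Given file Y, always result Z.
        IncludeAnalysis cached = byFile.get(header);
        if (cached != null) {
            return cached;
        }
        // Given macros X and file Y, always result Z.
        return byFileAndMacros
                .computeIfAbsent(header, f -> new ConcurrentHashMap<>())
                .computeIfAbsent(visibleMacros, macros -> {
                    IncludeAnalysis result = analyzer.analyze(header, macros);
                    if (!result.usesMacroIncludes()) {
                        // The visible macros turned out to be irrelevant, so the
                        // cheaper file-only cache can serve this file from now on.
                        byFile.put(header, result);
                    }
                    return result;
                });
    }

    // Placeholder collaborator types.
    public interface Analyzer {
        IncludeAnalysis analyze(File header, Set<Macro> visibleMacros);
    }

    public interface IncludeAnalysis {
        boolean usesMacroIncludes();
    }

    public interface Macro {
    }
}
```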
A quick POC of the previous idea ~shows an 88% decrease in `visitFile` samples according to JFR~ (the number of iterations didn't match between both runs, so the sampling is useless) and a 69% decrease in wall clock compilation time (4.2.1: 512 ms vs. nightly: 14561 ms vs. POC: 4503 ms).
The next couple of updates will try to decompose those changes into more measurable pieces.
Sharing the include analysis cache in the POC brings the wall clock build time to 13852 ms vs. 14816 ms with the nightly (here is the flame graph).
Caching the include analysis details according to the visible macros in the POC brings the wall clock build time to 3538 ms vs. 13865 ms with the nightly (here is the new flame graph).
I forgot to force macro collection before caching the result, so it's not as impressive: the wall clock time of the previous POC is 5085 ms. I'm sure we can improve the implementation, as no effort was made to make it efficient.
The next bottleneck seems to be in the resolution of macros, specifically around preprocessor libraries such as Boost.Preprocessor. I naively tried arranging the macro lookup as a hash table; however, this slowed down the build by a factor of about 10.
I will try to see if we can create the macro hash table during parsing instead of during the analysis.
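Roughly, the idea is to index macros and macro functions by name while `#define` directives are being parsed, so that macro resolution during include analysis becomes a map lookup rather than a scan over all definitions; a sketch with placeholder types:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: Macro and MacroFunction are stand-ins for the parser's
// own types. The table is filled as #define directives are parsed, so the
// analysis phase never has to scan a flat list of definitions.
public class MacroTable {
    private final Map<String, List<Macro>> macrosByName = new HashMap<>();
    private final Map<String, List<MacroFunction>> functionsByName = new HashMap<>();

    // Called from the parser for each "#define NAME ..." it encounters.
    public void addMacro(Macro macro) {
        macrosByName.computeIfAbsent(macro.getName(), n -> new ArrayList<>()).add(macro);
    }

    // Called from the parser for each "#define NAME(args) ..." it encounters.
    public void addMacroFunction(MacroFunction function) {
        functionsByName.computeIfAbsent(function.getName(), n -> new ArrayList<>()).add(function);
    }

    // Used during macro resolution: a map hit instead of a linear scan.
    public Collection<Macro> getMacros(String name) {
        return macrosByName.getOrDefault(name, Collections.emptyList());
    }

    public Collection<MacroFunction> getMacroFunctions(String name) {
        return functionsByName.getOrDefault(name, Collections.emptyList());
    }

    // Hypothetical parser-side types.
    public interface Macro {
        String getName();
    }

    public interface MacroFunction {
        String getName();
    }
}
```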
@lacasseio Is it worth reviewing what you already have so we can get that into master ASAP?
I'm getting the numbers for what I just commented on. Spoiler alert: it's another win! I will polish that and get everything ready for review.
By creating a hash table of the macros and macro functions during parsing, we get the following numbers. We are now faster on a clean `assemble` than 4.2.1, but still a bit slower than 4.2.1 for the `assemble` scenario (POC: 1617 ms vs. 4.2.1: 633 ms).
I think it would be good to stop further experiments and get whatever improvements we've already discovered into master, rather than piling experiments on top of each other, to lock in some gains. We can then keep repeating the process (measure, spike one change, measure, implement, merge, goto 1) until we're happy with performance.
Thanks @adammurdoch for the words of wisdom about splitting the change into measurable units. I did just that and created PR https://github.com/gradle/gradle/pull/4616 containing the most impactful change.
PR https://github.com/gradle/gradle/pull/4617 is based on the previous PR and shows the additional changes that allow Gradle to come within ~500ms of the Gradle 4.2.1 performance.
There's a bunch of new test coverage for C++ incremental compile here: https://github.com/gradle/gradle/pull/4652
These can be used to make sure any caching we do isn't making assumptions it shouldn't.
I closed the unmerged PR, as my experimentation so far has only produced incorrect results or regressions.
Instead of looking at caching more results, I decided to look at the root cause of why some headers are flagged with `HasMacroInclude`. Disclaimer: improving the following scenarios may have no effect on performance, as it only takes one bad header to invalidate an entire chain of includes and prevent them from being cached. Here are my findings, sorted from most common to least common:
These are headers that are flagged as having macros only because they happen to include a problematic header somewhere down the chain.
These are headers that contain macro includes like this:
#ifdef BOOST_HAS_ABI_HEADERS
# include BOOST_ABI_PREFIX
#endif
//...
#ifdef BOOST_HAS_ABI_HEADERS
# include BOOST_ABI_SUFFIX
#endif
This pattern is usually used when developers want to dynamically configure the namespace of the code in question. In this case, the headers contain templates and are most likely used for some kind of metaprogramming.
These are quite tricky as they heavily rely on the preprocessor behavior used during the compilation:
# define BOOST_PP_ITERATION_PARAMS_1 (3,(0,BOOST_FUNCTION_MAX_ARGS,<boost/function/detail/function_iterate.hpp>))
# include BOOST_PP_ITERATE()
# undef BOOST_PP_ITERATION_PARAMS_1
These are headers that try to be smarter than the build tool by creating a level of indirection to fool the analyzer:
// Since this preprocessor path is almost never taken, we hide these header
// dependencies so that build tools don't find them.
//
#define BOOST_REGEX_H1 <boost/thread/once.hpp>
#define BOOST_REGEX_H2 <boost/thread/recursive_mutex.hpp>
#define BOOST_REGEX_H3 <boost/thread/lock_types.hpp>
#include BOOST_REGEX_H1
#include BOOST_REGEX_H2
#include BOOST_REGEX_H3
#undef BOOST_REGEX_H1
#undef BOOST_REGEX_H2
#undef BOOST_REGEX_H3
These are headers that include a dynamic header only if the corresponding macro is defined:
#if defined(BOOST_ASIO_CUSTOM_HANDLER_TRACKING)
# include BOOST_ASIO_CUSTOM_HANDLER_TRACKING
#elif defined(BOOST_ASIO_ENABLE_HANDLER_TRACKING)
# include <boost/system/error_code.hpp>
# include <boost/asio/detail/cstdint.hpp>
# include <boost/asio/detail/static_mutex.hpp>
# include <boost/asio/detail/tss_ptr.hpp>
#endif // defined(BOOST_ASIO_ENABLE_HANDLER_TRACKING)
These are headers that are part of a preprocessor header library; see Boost.Preprocessor.
These are headers that can optionally override the included headers via a macro. If the override macro isn't defined, then a default, declared locally, is used:
# ifndef BOOST_REGEX_USER_CONFIG
# define BOOST_REGEX_USER_CONFIG <boost/regex/user.hpp>
# endif
# include BOOST_REGEX_USER_CONFIG
This doesn't seem to be too common; there is only a single unique hit in our sample project.
#if BOOST_WORKAROUND(__IBMCPP__, BOOST_TESTED_AT(700))
# define AUX778076_INCLUDE_STRING BOOST_PP_STRINGIZE(boost/mpl/aux_/preprocessed/AUX778076_PREPROCESSED_HEADER)
# include AUX778076_INCLUDE_STRING
# undef AUX778076_INCLUDE_STRING
#else
# include BOOST_PP_STRINGIZE(boost/mpl/aux_/preprocessed/AUX778076_PREPROCESSED_HEADER)
#endif
I'm unsure how these are handled by our resolver code. Some configuration headers use conditionals to include platform headers, and some of those headers may be absent on certain platforms:
// Clang / libc++ detection.
#if defined(__clang__)
# if (__cplusplus >= 201103)
# if __has_include(<__config>)
# include <__config>
# if defined(_LIBCPP_VERSION)
# define BOOST_ASIO_HAS_CLANG_LIBCXX 1
# endif // defined(_LIBCPP_VERSION)
# endif // __has_include(<__config>)
# endif // (__cplusplus >= 201103)
#endif // defined(__clang__)
// Android platform detection.
#if defined(__ANDROID__)
# include <android/api-level.h>
#endif // defined(__ANDROID__)
The performance for 4.2.1 with the updated code on https://github.com/lacasseio/boost-asio is
I think `__has_include` needs to be included in this. It doesn't seem to support incremental compilation either, and since it's standardized, it's extremely awkward to not support it.
@ThadHouse Is what you are seeing that Gradle snapshots all the header files, with an info log saying it's reverting to taking all the headers into account?
Split out of #279 as an additional piece of work. The performance of the C/C++ dependency analysis is crucial to a good developer experience. It is invoked whenever some source or header file changes and is used not just to determine what to recompile but also which build cache entries to use.
There is plenty of low-hanging fruit in this area and we should invest in making this faster.