mosra / corrade

C++11 multiplatform utility library
https://magnum.graphics/corrade/

Optimizing compilation time for the test suite -- an analysis #167

Open vittorioromeo opened 1 year ago

vittorioromeo commented 1 year ago

I wanted to play around with corrade and magnum, but got sidetracked and ended up doing a compilation benchmark of corrade (with tests enabled) using ClangBuildAnalyzer for fun. These are the results:

ClangBuildAnalyzer output

```
Analyzing build trace from 'out.txt'...
**** Time summary:
Compilation (547 times):
  Parsing (frontend):        238.8 s
  Codegen & opts (backend):   38.0 s

**** Files that took longest to parse (compiler frontend):
  3557 ms: src/Corrade/Utility/Test/CMakeFiles/UtilityJsonTest.dir/JsonTest.cpp.obj
  2953 ms: src/Corrade/Utility/Test/CMakeFiles/UtilityAlgorithmsTest.dir/AlgorithmsTest.cpp.obj
  2912 ms: src/Corrade/Containers/Test/CMakeFiles/ContainersStridedArrayViewTest.dir/StridedArrayViewTest.cpp.obj
  2894 ms: src/Corrade/Utility/CMakeFiles/corrade-rc.dir/Arguments.cpp.obj
  2759 ms: src/Corrade/Utility/CMakeFiles/CorradeUtilityTestLib.dir/Arguments.cpp.obj
  2756 ms: src/Corrade/Utility/Test/CMakeFiles/UtilityPathTest.dir/PathTest.cpp.obj
  2625 ms: src/Corrade/Utility/CMakeFiles/CorradeUtility.dir/Path.cpp.obj
  2625 ms: src/Corrade/Utility/CMakeFiles/CorradeUtilityTestLib.dir/Path.cpp.obj
  2605 ms: src/Corrade/TestSuite/CMakeFiles/CorradeTestSuiteObjects.dir/Tester.cpp.obj
  2565 ms: src/Corrade/Utility/CMakeFiles/CorradeUtility.dir/Arguments.cpp.obj

**** Files that took longest to codegen (compiler backend):
  1686 ms: src/Corrade/Utility/Test/CMakeFiles/UtilityJsonTest.dir/JsonTest.cpp.obj
  1319 ms: src/Corrade/Containers/Test/CMakeFiles/ContainersGrowableArrayTest.dir/GrowableArrayTest.cpp.obj
  1138 ms: src/Corrade/Containers/Test/CMakeFiles/ContainersStridedArrayViewTest.dir/StridedArrayViewTest.cpp.obj
  1034 ms: src/Corrade/Containers/Test/CMakeFiles/ContainersStringTest.dir/StringTest.cpp.obj
   917 ms: src/Corrade/Utility/Test/CMakeFiles/UtilityPathTest.dir/PathTest.cpp.obj
   804 ms: src/Corrade/Utility/Test/CMakeFiles/UtilityAlgorithmsTest.dir/AlgorithmsTest.cpp.obj
   723 ms: src/Corrade/Utility/Test/CMakeFiles/UtilityArgumentsTest.dir/ArgumentsTest.cpp.obj
   712 ms: src/Corrade/Containers/Test/CMakeFiles/ContainersStridedBitArrayViewTest.dir/StridedBitArrayViewTest.cpp.obj
   698 ms: src/Corrade/PluginManager/Test/CMakeFiles/PluginManagerManagerTest.dir/ManagerTest.cpp.obj
   666 ms: src/Corrade/Utility/Test/CMakeFiles/UtilityDirectoryTest.dir/DirectoryTest.cpp.obj

**** Templates that took longest to instantiate:
   997 ms: Corrade::TestSuite::Tester::compareWith (101 times, avg 9 ms)
   844 ms: Corrade::TestSuite::Implementation::printMessage>::printMessage (73 times, avg 11 ms)
   834 ms: Corrade::TestSuite::Tester::compareWith (80 times, avg 10 ms)
   831 ms: Corrade::Containers::BasicStringView::BasicStringView (249 times, avg 3 ms)
   809 ms: Corrade::TestSuite::Implementation::printMessage::printMessage (82 times, avg 9 ms)
   726 ms: Corrade::TestSuite::Tester::compareWith, wchar_t> (138 times, avg 3 ms)
   521 ms: Corrade::TestSuite::Tester::compare (60 times, avg 8 ms)
   516 ms: Corrade::TestSuite::Tester::compareAs (60 times, avg 8 ms)
   455 ms: Corrade::TestSuite::Tester::compare (42 times, avg 10 ms)
   428 ms: Corrade::TestSuite::Tester::compareAs (42 times, avg 10 ms)
   422 ms: std::pair (145 times, avg 2 ms)
   421 ms: std::basic_string (138 times, avg 3 ms)
   416 ms: Corrade::TestSuite::Tester::compareWith (32 times, avg 13 ms)
   413 ms: Corrade::TestSuite::Tester::compareWith (249 times, avg 1 ms)
   404 ms: Corrade::TestSuite::Tester::compareWith, std... (40 times, avg 10 ms)
   396 ms: std::basic_stringbuf::str (92 times, avg 4 ms)
   391 ms: std::basic_string::basic_string (138 times, avg 2 ms)
   391 ms: Corrade::Containers::BasicStringView::slice (45 times, avg 8 ms)
   390 ms: Corrade::TestSuite::Tester::compareWith (138 times, avg 2 ms)
   374 ms: Corrade::TestSuite::Tester::compareAs, char> (138 times, avg 2 ms)
   364 ms: Corrade::TestSuite::Tester::compareAs (138 times, avg 2 ms)
   343 ms: std::basic_ostringstream::str (76 times, avg 4 ms)

**** Template sets that took longest to instantiate:
 11310 ms: Corrade::TestSuite::Tester::compareAs<$> (1377 times, avg 8 ms)
 10927 ms: Corrade::TestSuite::Tester::compareWith<$> (1336 times, avg 8 ms)
  8234 ms: Corrade::TestSuite::Comparator<$>::printMessage (542 times, avg 15 ms)
  8023 ms: Corrade::TestSuite::Implementation::printMessage<$> (532 times, avg 15 ms)
  7766 ms: Corrade::TestSuite::Tester::compare<$> (1163 times, avg 6 ms)
  1449 ms: std::basic_string<$> (552 times, avg 2 ms)
  1445 ms: std::__and_<$> (1112 times, avg 1 ms)
  1205 ms: __gnu_cxx::__stoa<$> (1693 times, avg 0 ms)
  1036 ms: Corrade::Containers::BasicStringView<$>::BasicStringView (292 times, avg 3 ms)
   968 ms: std::is_convertible<$> (609 times, avg 1 ms)
   937 ms: Corrade::Utility::Implementation::HasOstreamOutput<$> (1282 times, avg 0 ms)
   891 ms: __gnu_cxx::__to_xstring<$> (276 times, avg 3 ms)
   750 ms: std::pair<$> (263 times, avg 2 ms)
   734 ms: Corrade::Containers::BasicStringView<$> (498 times, avg 1 ms)
   696 ms: std::decay<$> (300 times, avg 2 ms)
   694 ms: Corrade::Containers::StridedArrayView<$>::StridedArrayView (76 times, avg 9 ms)
   628 ms: std::basic_string<$>::basic_string<$> (287 times, avg 2 ms)
   582 ms: std::__or_<$> (498 times, avg 1 ms)
   580 ms: Corrade::Containers::StridedArrayView<$> (324 times, avg 1 ms)
   572 ms: std::basic_string<$>::_M_construct<$> (411 times, avg 1 ms)
   563 ms: Corrade::Containers::arrayCast<$> (104 times, avg 5 ms)
   557 ms: Corrade::Containers::arrayAppend<$> (100 times, avg 5 ms)
   556 ms: Corrade::Containers::BasicStringView<$>::slice (62 times, avg 8 ms)
   487 ms: Corrade::TestSuite::Implementation::diagnosticSaver<$> (438 times, avg 1 ms)
   475 ms: std::__is_convertible_helper<$> (239 times, avg 1 ms)
   448 ms: std::is_constructible<$> (237 times, avg 1 ms)
   427 ms: Corrade::Containers::Implementation::arrayGrowBy<$> (77 times, avg 5 ms)
   408 ms: Corrade::Utility::Implementation::HasMemberBegin<$> (104 times, avg 3 ms)
   400 ms: Corrade::TestSuite::Comparator<$>::saveDiagnostic (415 times, avg 0 ms)
   396 ms: std::basic_stringbuf<$>::str (92 times, avg 4 ms)

**** Functions that took longest to compile:
    98 ms: Corrade::TestSuite::Tester::exec(Corrade::TestSuite::Tester*, std::o... (C:/OHWorkspace/corrade/src/Corrade/TestSuite/Tester.cpp)
    96 ms: Corrade::Utility::Arguments::tryParse(int, char const* const*) (C:/OHWorkspace/corrade/src/Corrade/Utility/Arguments.cpp)
    95 ms: Corrade::Utility::Test::(anonymous namespace)::AssertTest::AssertTes... (C:/OHWorkspace/corrade/src/Corrade/Utility/Test/AssertTest.cpp)
    56 ms: Corrade::Utility::Arguments::help[abi:cxx11]() const (C:/OHWorkspace/corrade/src/Corrade/Utility/Arguments.cpp)
    44 ms: Corrade::Utility::Test::(anonymous namespace)::JsonTest::nested() (C:/OHWorkspace/corrade/src/Corrade/Utility/Test/JsonTest.cpp)
    39 ms: Corrade::Utility::Unicode::utf32(std::__cxx11::basic_string,... (C:/OHWorkspace/corrade/src/Corrade/Utility/Test/DebugTest.cpp)
    22 ms: Corrade::Utility::Test::(anonymous namespace)::ConfigurationTest::eo... (C:/OHWorkspace/corrade/src/Corrade/Utility/Test/ConfigurationTest.cpp)
    22 ms: Corrade::Containers::Test::(anonymous namespace)::ArrayTupleTest::co... (C:/OHWorkspace/corrade/src/Corrade/Containers/Test/ArrayTupleTest.cpp)

**** Function sets that took longest to compile / optimize:
   391 ms: Corrade::TestSuite::Comparator<$>::printMessage(Corrade::Containers:... (411 times, avg 0 ms)
   273 ms: Corrade::TestSuite::Comparator<$>::saveDiagnostic(Corrade::Container... (405 times, avg 0 ms)
   136 ms: Corrade::TestSuite::Comparator<$>::printMessage(Corrade::Containers:... (82 times, avg 1 ms)
    95 ms: Corrade::Containers::BasicStringView<$>::BasicStringView(char const*... (61 times, avg 1 ms)
    85 ms: Corrade::Containers::BasicStringView::BasicStringView(ch... (61 times, avg 1 ms)
    63 ms: __gnu_cxx::__enable_if<__is_char::__value, bool>::__type std::... (71 times, avg 0 ms)
    60 ms: Corrade::Containers::BasicStringView::slice(unsigned lon... (17 times, avg 3 ms)
    51 ms: Corrade::Containers::ArrayView<$>::ArrayView<$>(Corrade::Containers:... (76 times, avg 0 ms)
    49 ms: Corrade::Utility::TweakableParser<$>::parse(Corrade::Containers::Bas... (11 times, avg 4 ms)
    48 ms: Corrade::Containers::StridedArrayView<$> Corrade::Containers::Implem... (14 times, avg 3 ms)
    46 ms: Corrade::Containers::BasicStringView::BasicStringView(ch... (58 times, avg 0 ms)
    42 ms: void Corrade::TestSuite::Tester::addRepeatedTests<$>(std::initialize... (63 times, avg 0 ms)
    40 ms: void Corrade::Containers::Test::(anonymous namespace)::GrowableArray... (6 times, avg 6 ms)
    40 ms: Corrade::Containers::BasicStringView::BasicStringView(char*, u... (29 times, avg 1 ms)
    39 ms: Corrade::Utility::Unicode::utf32(std::__cxx11::basic_string<$> const&) (1 times, avg 39 ms)
    38 ms: void Corrade::Containers::Test::(anonymous namespace)::GrowableArray... (8 times, avg 4 ms)
    37 ms: std::vector<$>::vector(std::vector<$> const&) (5 times, avg 7 ms)
    37 ms: Corrade::Containers::Array<$>::~Array() (45 times, avg 0 ms)
    36 ms: std::_Vector_base<$>::~_Vector_base() (47 times, avg 0 ms)
    36 ms: Corrade::PluginManager::AbstractManager::AbstractManager(Corrade::Co... (1 times, avg 36 ms)
    36 ms: Corrade::Containers::BasicStringView::operator[](unsigne... (15 times, avg 2 ms)
    35 ms: Corrade::Utility::Json::tokenize(Corrade::Containers::BasicStringVie... (1 times, avg 35 ms)
    33 ms: void Corrade::TestSuite::Tester::compareWith<$>(Corrade::TestSuite::... (6 times, avg 5 ms)
    33 ms: Corrade::Containers::StridedArrayView<$>::StridedArrayView<$>(Corrad... (33 times, avg 1 ms)
    33 ms: Corrade::PluginManager::AbstractManager::loadInternal(Corrade::Plugi... (1 times, avg 33 ms)
    33 ms: Corrade::Utility::Arguments::addArrayArgument(std::__cxx11::basic_st... (1 times, avg 33 ms)
    32 ms: void Corrade::Containers::Test::(anonymous namespace)::StringTest::f... (2 times, avg 16 ms)
    30 ms: void Corrade::Utility::Test::(anonymous namespace)::ConfigurationVal... (3 times, avg 10 ms)
    29 ms: Corrade::Utility::Implementation::(anonymous namespace)::resourceCom... (2 times, avg 14 ms)
    28 ms: Corrade::Containers::BasicStringView<$>::BasicStringView<$>(Corrade:... (27 times, avg 1 ms)

*** Expensive headers:
19787 ms: C:/OHWorkspace/corrade/src/Corrade/Containers/StringView.h (included 249 times, avg 79 ms), included via:
  StringView.cpp.obj (293 ms)
  StringView.cpp.obj (292 ms)
  CppStandardTest.cpp.obj (274 ms)
  StringView.cpp.obj (268 ms)
  String.cpp.obj String.h (237 ms)
  CppStandardTest.cpp.obj (216 ms)
  ...

18668 ms: C:/msys64/mingw64/include/windows.h (included 20 times, avg 933 ms), included via:
  Arguments.cpp.obj (1254 ms)
  Arguments.cpp.obj (1156 ms)
  Path.cpp.obj wtypes.h rpc.h (1043 ms)
  WindowsWeakSymbol.cpp.obj (1016 ms)
  Unicode.cpp.obj (978 ms)
  CorradeMain.cpp.obj (938 ms)
  ...

18276 ms: C:/OHWorkspace/corrade/src/Corrade/Containers/String.h (included 242 times, avg 75 ms), included via:
  String.cpp.obj (273 ms)
  String.cpp.obj (258 ms)
  String.cpp.obj (245 ms)
  BundledFilesTest.cpp.obj (240 ms)
  FormatStlStringViewTest.cpp.obj (219 ms)
  ArgumentsTest.cpp.obj StringStl.h (215 ms)
  ...

17717 ms: C:/msys64/mingw64/include/c++/12.2.0/bits/ios_base.h (included 93 times, avg 190 ms), included via:
  sstream istream ios (387 ms)
  sstream istream ios (317 ms)
  iostream ostream ios (312 ms)
  sstream istream ios (311 ms)
  iomanip (309 ms)
  sstream istream ios (296 ms)
  ...

16577 ms: C:/OHWorkspace/corrade/src/Corrade/TestSuite/Tester.h (included 145 times, avg 114 ms), included via:
  MacrosCpp14Test.cpp.obj (334 ms)
  RawForwardListTest.cpp.obj (320 ms)
  MacrosCpp20Test.cpp.obj (302 ms)
  MathTest.cpp.obj (294 ms)
  AssertTest.cpp.obj (278 ms)
  MacrosCpp17Test.cpp.obj (276 ms)
  ...

13218 ms: C:/OHWorkspace/corrade/src/Corrade/Containers/ArrayView.h (included 175 times, avg 75 ms), included via:
  StaticArrayStlSpanTest.cpp.obj StaticArray.h (215 ms)
  StaticArrayViewStlSpanTest.cpp.obj ArrayViewStlSpan.h (191 ms)
  testsuite-instanced.cpp.obj (159 ms)
  Tweakable.cpp.obj Tweakable.h (154 ms)
  WrongMetadata.cpp.obj WrongMetadata.h Array.h (154 ms)
  ArrayViewStlSpanTest.cpp.obj ArrayViewStlSpan.h (153 ms)
  ...

13218 ms: C:/OHWorkspace/corrade/src/Corrade/Containers/StringStl.h (included 225 times, avg 58 ms), included via:
  ArgumentsTest.cpp.obj (216 ms)
  resource_ResourceTestData.cpp.obj Resource.h (211 ms)
  Resource.cpp.obj Resource.h (192 ms)
  resource_ResourceTestNothingData.cpp.obj Resource.h (179 ms)
  resource_ResourceTestNullTerminatedLastFileData.cpp.obj Resource.h (177 ms)
  Unicode.cpp.obj Unicode.h (157 ms)
  ...

13165 ms: C:/msys64/mingw64/include/c++/12.2.0/bits/locale_classes.h (included 93 times, avg 141 ms), included via:
  sstream istream ios ios_base.h (338 ms)
  sstream istream ios ios_base.h (269 ms)
  sstream istream ios ios_base.h (264 ms)
  iostream ostream ios ios_base.h (249 ms)
  sstream istream ios ios_base.h (231 ms)
  iostream ostream ios ios_base.h (229 ms)
  ...

13039 ms: C:/OHWorkspace/corrade/src/Corrade/Utility/Debug.h (included 269 times, avg 48 ms), included via:
  Debug.cpp.obj (165 ms)
  Debug.cpp.obj (164 ms)
  MathTest.cpp.obj Tester.h Pointer.h DebugAssert.h Assert.h (136 ms)
  MacrosCpp17Test.cpp.obj Tester.h Pointer.h DebugAssert.h Assert.h (131 ms)
  CppStandardTest.cpp.obj StringView.h DebugAssert.h Assert.h (119 ms)
  LibraryTest.cpp.obj EmitterLibrary.h Emitter.h Connection.h Reference.h (116 ms)
  ...

9515 ms: C:/OHWorkspace/corrade/src/Corrade/Containers/Pointer.h (included 192 times, avg 49 ms), included via:
  MacrosCpp20Test.cpp.obj Tester.h (209 ms)
  HotDog.cpp.obj AbstractFood.h AbstractPlugin.h (197 ms)
  MacrosCpp17Test.cpp.obj Tester.h (186 ms)
  MathTest.cpp.obj Tester.h (184 ms)
  RawForwardListTest.cpp.obj Tester.h (166 ms)
  Bulldog.cpp.obj AbstractAnimal.h AbstractPlugin.h (165 ms)
  ...

done in 0.1s.
```

Notable things:

  1. There are some very expensive instantiations in the Tester class. I tried reducing some of them by controlling instantiation explicitly, but it didn't do much. The whole CORRADE_COMPARE_AS mechanism seems to be very heavy on compilation time; I wonder if it can be redesigned to be "nicer".

  2. There are some commonly used heavy headers in corrade, including Tester.h, StringView.h, and Pointer.h. I enabled PCH via CMake after some pain, but surprisingly it didn't make much of a difference. I am not sure why, but it's probably worth investigating. Precompiling those headers would be a massive time saver.

  3. There are a bunch of inclusions of windows.h without WIN32_LEAN_AND_MEAN defined. Not sure if intentional.

  4. There are some std::* utilities and traits that could be replaced with a manual version to avoid instantiation. Even better, variable templates (available since C++14) could be used instead of class templates; they are faster to instantiate in every major compiler.
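
To make point 4 concrete, here is a minimal sketch of one hand-written trait in both forms. The names `IsSameClass` and `isSame` are made up for this example and are not part of Corrade, and the variable template form assumes the compiler is in C++14 mode or newer.

```cpp
#include <type_traits>

// Hypothetical names for illustration only; not part of Corrade.
namespace Sketch {

// Class-template trait: every distinct use instantiates a class
// specialization with a nested ::value member.
template<class T, class U> struct IsSameClass {
    static constexpr bool value = false;
};
template<class T> struct IsSameClass<T, T> {
    static constexpr bool value = true;
};

// Variable-template trait (C++14): the result is the variable itself,
// so no class specialization is instantiated to get at it.
template<class T, class U> constexpr bool isSame = false;
template<class T> constexpr bool isSame<T, T> = true;

}

// Both hand-written forms agree with std::is_same on these inputs.
static_assert(Sketch::IsSameClass<int, int>::value == std::is_same<int, int>::value, "");
static_assert(Sketch::isSame<int, int>, "");
static_assert(!Sketch::isSame<int, const int>, "");
```

Whether this gives a measurable win in a given codebase would have to be checked with a profile like the one above.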

Unfortunately I don't have anything concrete, but I hope that the analysis helps!

mosra commented 1 year ago

Hi, thank you, useful analysis!

Sidetracked with compilation times, haha, I know that feeling very well :)

  1. I have some work-in-progress changes for the TestSuite macros in #140. Got some results there but nothing significant enough to make that patch worth merging alone, still need to look into that more. OTOH the test code makes up maybe 90% of the codebase, the test files are heavy on their own, often being several thousand lines long with the CORRADE_COMPARE* macros used on almost every second line, so those showing up high in the profiler output isn't unexpected (I'd be surprised if those didn't show up high).

    Unfortunately I don't have a comparison to how gtest or other popular testing frameworks would behave at the same scale -- maybe I'm completely fine compared to those -- nevertheless any improvement is welcome, rebuild time of the 6700-line GltfImporterTest.cpp is almost 5 seconds and is enough to get me impatient :D

  2. Question is whether those headers actually are heavy, or if it's just that they're used a lot -- they don't contain that much on their own (look at Pointer.h, there isn't much left to remove, apart from micro-optimizing some type traits), and all headers are parsed in about 50-80 ms which I think is fine. Which is also probably why PCHs didn't help you that much -- there just isn't any widely used header that would be heavy on its own for PCHs to make a difference.

    As far as I'm aware, the most significant overhead comes from STL <sstream> usage, which is used in almost all tests for historical reasons, and which I haven't gotten to replacing yet because it involves reinventing/replacing many wheels, including float printing (tracking issue here: https://github.com/mosra/magnum/issues/293).

    Finally, not sure if you had a debug or release build, but on a release build a lot of debug-only assertions from headers are compiled out, which has a considerable impact. Oh, and the on-by-default CORRADE_BUILD_DEPRECATED option also currently adds quite a few compatibility STL includes in many places, including Tester.h, to aid people with porting from APIs that used std::string to the new StringView. Disabling it can make quite a difference -- if you had it enabled before, can you try again with it disabled?

  3. Thanks! I'll cherry-pick those from your commit. I'm on Linux, so I have to admit I don't get immediate feedback when windows.h accidentally escapes the cage somewhere.

  4. Yup, I reached a similar conclusion recently. Can't really make C++14 a minimum yet (need GCC 4.8 compatibility for certain users), but maybe I could opt into variable templates if C++14 is available... or use compiler builtins instead. But there are downsides, I don't feel like maintaining my own variant of <type_traits> with all its compiler-specific magic :)

    Possibly related is that I use std::enable_if "wrong" in quite a few places, by having it in the function signature and thus participating in overload resolution. Need to clean that up to be on the return type instead.
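
The "opt into variable templates if C++14 is available... or use compiler builtins" idea from point 4 could look roughly like the sketch below. Everything named here (`Sketch`, `IsBaseOf`, `isBaseOf`, `SKETCH_IS_BASE_OF`) is hypothetical and not Corrade API. It assumes the `__is_base_of` builtin, which GCC, Clang and MSVC provide, and the standard `__cpp_variable_templates` feature-test macro, which older compilers such as GCC 4.8 don't define.

```cpp
// Hypothetical names for illustration only; not part of Corrade.
namespace Sketch {

// C++11-compatible class form. __is_base_of is a compiler builtin in GCC,
// Clang and MSVC, so <type_traits> isn't needed for it.
template<class Base, class Derived> struct IsBaseOf {
    static constexpr bool value = __is_base_of(Base, Derived);
};

#ifdef __cpp_variable_templates
// The compiler reports variable template support: use the cheaper form.
template<class Base, class Derived>
constexpr bool isBaseOf = __is_base_of(Base, Derived);
#define SKETCH_IS_BASE_OF(...) ::Sketch::isBaseOf<__VA_ARGS__>
#else
// No variable templates (for example GCC 4.8): fall back to the class form.
#define SKETCH_IS_BASE_OF(...) ::Sketch::IsBaseOf<__VA_ARGS__>::value
#endif

struct Animal {};
struct Dog: Animal {};
struct Rock {};

}

// Call sites stay the same regardless of which branch was picked.
static_assert(SKETCH_IS_BASE_OF(Sketch::Animal, Sketch::Dog), "");
static_assert(!SKETCH_IS_BASE_OF(Sketch::Animal, Sketch::Rock), "");
```

The cost is the one already mentioned: two code paths per trait, each tied to compiler-specific behavior that then has to be tested and maintained.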

vittorioromeo commented 1 year ago

@mosra: been playing around with some possible improvements here -- https://github.com/vittorioromeo/corrade/tree/wip_type_traits

I do get a minor compilation time speedup on my machine, I wonder if it's measurable on yours (or on CI).

mosra commented 1 year ago

Hey, that's great, thank you! :)

I have to think about the compilation time vs maintenance overhead (e.g., "what all do I need to change if a new compiler would implement a builtin of the same name but with different semantics", or the amount of additional test coverage that would be needed for each of these, or how hard it will then be for a third party to contribute anything to this codebase given not even the standard type traits would be used anymore), but quite a lot of these seem quite simple and free of compiler magic.

Something similar is done in bgfx, and with success, so that's probably a path forward for me as well. Once I fix the bigger fish, like <sstream> :sweat_smile: