[libc++] Extreme preprocessed size of core headers (`vector`, `string`, etc.) #80196

Open wjakob opened 7 months ago

wjakob commented 7 months ago

Dear libc++ team,

I'm really concerned about the growth in preprocessed file size of core STL header files. For example, preprocessing a one-liner like the following with -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES

#include <vector>

expands to 2 077 167 bytes (~2 megabytes)! Without the transitive-include define, it is even larger: 2 467 776 bytes (~2.4 megabytes). The problem is the cost this imposes on every compilation unit that uses something as simple as a std::vector.

Of those two megabytes of preprocessed code, only a tiny portion is ultimately concerned with std::vector.

Here are some of the things that get pulled in, for unclear reasons:

In file included from test.cpp:1:
In file included from /usr/include/c++/v1/vector:321:
In file included from /usr/include/c++/v1/__format/formatter_bool.h:17:
In file included from /usr/include/c++/v1/__format/concepts.h:17:
In file included from /usr/include/c++/v1/__format/format_parse_context.h:16:
In file included from /usr/include/c++/v1/string_view:246:
In file included from /usr/include/c++/v1/compare:145:
In file included from /usr/include/c++/v1/__compare/compare_partial_order_fallback.h:13:
In file included from /usr/include/c++/v1/__compare/partial_order.h:14:
In file included from /usr/include/c++/v1/__compare/weak_order.h:14:
In file included from /usr/include/c++/v1/__compare/strong_order.h:20:
In file included from /usr/include/c++/v1/cmath....

So using vector pulls in string_view (why?), which pulls in __compare/strong_order.h, which pulls in the entire C & C++ math library, which is huge.

Major parts of tuple, locale, atomic, mutex, ctime, typeinfo, and memory are included as well.

Much of this growth seems related to C++20 features that have started to affect C++17 builds. For example, all of the code in partial_order, weak_order, strong_order, etc. is actually #ifdef-ed out when compiling in C++17 mode. But that is only true for the body: all of the dependent header files are still included. Here is an example from __compare/strong_order.h. Basically all of the body code is disabled by #if _LIBCPP_STD_VER >= 20, but the #include directives at the top aren't:

//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef _LIBCPP___COMPARE_STRONG_ORDER
#define _LIBCPP___COMPARE_STRONG_ORDER

#include <__bit/bit_cast.h>
#include <__compare/compare_three_way.h>
#include <__compare/ordering.h>
#include <__config>
#include <__type_traits/conditional.h>
#include <__type_traits/decay.h>
#include <__utility/forward.h>
#include <__utility/priority_tag.h>
#include <cmath>
#include <cstdint>
#include <limits>

#ifndef _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER
#  pragma GCC system_header
#endif

_LIBCPP_PUSH_MACROS
#include <__undef_macros>

_LIBCPP_BEGIN_NAMESPACE_STD

#if _LIBCPP_STD_VER >= 20
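
(Purely to illustrate the point: a hypothetical per-header fix would gate the C++20-only dependencies the same way the body is gated, e.g.

#if _LIBCPP_STD_VER >= 20
#  include <cmath>    // the heavyweight dependency; used only by the C++20-only body
#  include <cstdint>
#  include <limits>
#endif

Whether such per-header gating is maintainable is discussed below.)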

This issue is a plea for attention to this problem. Would it make sense to benchmark libc++ header growth for central headers needed by many projects, and to start working more actively against regressions? For projects that want to avoid the C++20+ header growth, it would be great to trim these headers to what is truly needed in C++11, 14, or 17 builds.

Here are a few more examples, again all compiled in C++17 mode:

| Header | Preprocessed size |
| --- | --- |
| vector | 2'046'574 bytes |
| iostream | 1'906'216 bytes |
| algorithm | 1'163'944 bytes |
| string | 1'056'382 bytes |
| list | 1'021'302 bytes |
| forward_list | 1'003'060 bytes |
| iterator | 965'992 bytes |
| array | 739'743 bytes |
| variant | 769'429 bytes |
| tuple | 495'462 bytes |
| utility | 427'718 bytes |

philnik777 commented 7 months ago

We are aware that header sizes are increasing continuously, and we are working on reducing them - that's the whole point of _LIBCPP_REMOVE_TRANSITIVE_INCLUDES. Feel free to analyze the dependencies and submit patches to reduce the sizes. Including headers based on the standard version is a non-starter though, since our dependency graphs are already quite complex, and doing it on a per-version basis would just result in bugs. FWIW, avoiding <math.h> is WIP. I don't know exactly why <vector> has to include formatting features. @mordante should be able to explain that.

This isn't a huge priority though, since it takes quite a lot of time to work out where we can avoid includes, and we already have a lot to do with actually implementing new features.

frederick-vs-ja commented 7 months ago

> I don't know exactly why <vector> has to include formatting features. @mordante should be able to explain that.

This is needed for formatter<vector<bool>::reference> (which is currently required to be provided in <vector> as per [vector.syn]).
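
Concretely, the requirement is that a C++23 program like the following is well-formed (a minimal sketch; std::format itself still lives in <format>):

#include <format>
#include <string>
#include <vector>

int main() {
    std::vector<bool> v{true, false};
    // v[0] has type std::vector<bool>::reference; [vector.syn] requires
    // <vector> itself to provide the formatter specialization for it.
    std::string s = std::format("{}", v[0]); // "true"
    return s == "true" ? 0 : 1;
}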

I tried to extract a small part of <format> into an internal header for formatter<vector<bool>::reference> in MSVC STL (microsoft/STL#4133), which currently works. But I'm not sure whether this is still fine when the whole P2286R8 + P2585R1 get implemented.

wjakob commented 7 months ago

Thank you for the quick response. What I take away from these messages is that:

* C++11-C++17 projects should expect to pay an implicit compile-time price for the presence of new features at the C++20 and C++23 level.

* Fixing this issue is hard because of complex inter-header dependencies and not a priority of the libc++ project. Rather, given the emphasis on adding new features, header file sizes are likely to grow in the future. No tracking of header file sizes (e.g. as part of a CI) is done at the moment.

I am spread too thin with my existing open source commitments to take on libc++ as an extra responsibility. But I must say that getting this under control seems more important than any shiny new C++23 feature could ever be. My hope with this issue was to draw attention to this problem.

philnik777 commented 7 months ago

> Thank you for the quick response. What I take away from these messages is that:
>
> * C++11-C++17 projects should expect to pay an implicit compile-time price for the presence of new features at the C++20 and C++23 level.
>
> * Fixing this issue is hard because of complex inter-header dependencies and not a priority of the libc++ project. Rather, given the emphasis on adding new features, header file sizes are likely to grow in the future. No tracking of header file sizes (e.g. as part of a CI) is done at the moment.

I wouldn't even know how to track header file sizes in the CI without having a bunch of noise all the time.

> I am spread too thin with my existing open source commitments to take on libc++ as an extra responsibility. But I must say that getting this under control seems more important than any shiny new C++23 feature could ever be. My hope with this issue was to draw attention to this problem.

Most of the contributions are from volunteers, and they tend to care more about the new features they want than about include times. We also expect people to move to newer standards, so spending a lot of effort on include times for C++11-C++17 is a high cost that still wouldn't solve the problem in the end. People also complained when switching from C++14 to C++17, but nobody stepped up to try to improve things. Also, somewhat related: if you have problems with include times, you should consider enabling clang modules (https://clang.llvm.org/docs/Modules.html).
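
For reference, with libc++ this is typically just an extra compiler flag (a sketch; module caches and further options are configurable):

clang++ -std=c++17 -stdlib=libc++ -fmodules -fsyntax-only test.cpp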

philnik777 commented 7 months ago

Ok, I didn't realize how bad it was (the numbers posted don't really mean much to me). The last time I checked, including <vector> took ~200ms (just from memory, so probably not super exact), and now it's 468ms on my system. This is a significant regression, and indeed very problematic. #80282 reduces the include time to 367ms. Not what it was before, but a significant step towards the old include times.
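
For reference, a rough way to reproduce such a timing (assuming test.cpp contains just the single #include; clang's -ftime-trace gives a more detailed breakdown):

time clang++ -std=c++17 -stdlib=libc++ -fsyntax-only test.cpp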

ldionne commented 7 months ago

@wjakob Out of curiosity, have you come across headers other than <vector> that had such unreasonable sizes? We are aware of the <vector> -> <format> issue and the problem is that we can't remove that dependency while being standards conforming. IMO this dependency is simply a huge design failure at the WG21 level, but we discussed it and there was no desire to make a change.

I just spoke with @philnik777 and I think we may have found a solution that would really improve this issue while being easy to maintain and unlikely to cause bugs. Roughly speaking, we currently have three types of headers in the library:

  1. Implementation-detail headers like <__memory_resource/polymorphic_allocator.h>
  2. Public headers that contain an actual implementation, like <streambuf>, <vector> or <string>
  3. Umbrella headers, which are public headers that only include other implementation-detail headers (e.g. <functional> and <algorithm>)

From a header that contains actual code (categories 1 and 2 above), we don't want to conditionally include dependencies because that's unmaintainable and will lead to bugs. However, conditionally including implementation-detail headers from umbrella headers is straightforward. For example, <algorithm> could look like:

/*
    algorithm synopsis
    ...
*/

#include <__config>
#include <version>

#include <__algorithm/adjacent_find.h>
#include <__algorithm/all_of.h>
#include <__algorithm/any_of.h>
...
#include <__algorithm/unwrap_iter.h>
#include <__algorithm/upper_bound.h>

#if _LIBCPP_STD_VER >= 20
#  include <__algorithm/ranges_adjacent_find.h>
...
#  include <__algorithm/ranges_upper_bound.h>
#endif

If we did the same for e.g. <format>, we'd essentially end up with the <format> header being empty in C++ < 20. I think this would basically make this issue moot, and it's definitely feasible.
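
Under that scheme, a version-gated <format> umbrella could look roughly like this (a sketch using the detail headers from the include trace above, not the full list):

#include <__config>
#include <version>

#if _LIBCPP_STD_VER >= 20
#  include <__format/concepts.h>
#  include <__format/format_parse_context.h>
#  include <__format/formatter_bool.h>
...
#endif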

We may find issues with this approach, but it's worth trying. In particular, I think this will cause a massive change in our transitive includes, which could be downright prohibitive for adoption reasons, but we should at least try it out with a patch on a large codebase to see if it's a possibility.

wjakob commented 7 months ago

@ldionne I've run a few more. GCC with libstdc++ does a lot better on many of these, with output roughly 3-5x smaller.

| Header | Preprocessed size |
| --- | --- |
| vector | 2'046'574 bytes |
| iostream | 1'906'216 bytes |
| algorithm | 1'163'944 bytes |
| string | 1'056'382 bytes |
| list | 1'021'302 bytes |
| forward_list | 1'003'060 bytes |
| iterator | 965'992 bytes |
| array | 739'743 bytes |
| variant | 769'429 bytes |
| tuple | 495'462 bytes |
| utility | 427'718 bytes |

By the way, sorting by size is maybe not the best way to approach this problem. For example, the fact that algorithm and iostream are large is perhaps not too surprising. But vector, string, tuple, and utility (which should be little more than std::pair and a few minor things like std::move) are, I feel, unreasonably large for what are supposed to be really simple containers and core components that almost no C++ project can do without.

@philnik777 Might I suggest that preprocessed file size is exactly the thing that should be tracked? The commits you referenced report milliseconds of compilation time, which is an arbitrary and difficult-to-reproduce quantity. Preprocessed file size, by contrast, measures the raw source code size with all the comments etc. stripped out. It is roughly proportional to the number of tokens, which is what actually creates work for the lexer and parser. This will have much less noise than other quantities, and you can nicely plot it over time. Nikita Popov does something conceptually similar by tracking the number of CPU instructions retired in user space instead of the raw time, which would be noisy. You can see an example here, including his commentary on a significant drop in compilation performance caused by C++17-related additions: https://www.npopov.com/2022/12/20/This-year-in-LLVM-2022.html. In any case, tracking preprocessed file size isolates the libc++-specific header size concern from other changes in LLVM/Clang itself that might affect raw compilation performance.

This is how I run these benchmarks:

clang++-17 -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES test.cpp -stdlib=libc++ -std=c++17 -E -o- | wc -c

philnik777 commented 7 months ago

> @philnik777 Might I suggest that preprocessed file size is exactly the thing that should be tracked? The commits you referenced report milliseconds of compilation time, which is an arbitrary and difficult-to-reproduce quantity. Preprocessed file size, by contrast, measures the raw source code size with all the comments etc. stripped out. It is roughly proportional to the number of tokens, which is what actually creates work for the lexer and parser. This will have much less noise than other quantities, and you can nicely plot it over time. Nikita Popov does something conceptually similar by tracking the number of CPU instructions retired in user space instead of the raw time, which would be noisy. You can see an example here, including his commentary on a significant drop in compilation performance caused by C++17-related additions: https://www.npopov.com/2022/12/20/This-year-in-LLVM-2022.html. In any case, tracking preprocessed file size isolates the libc++-specific header size concern from other changes in LLVM/Clang itself that might affect raw compilation performance.
>
> This is how I run these benchmarks:
>
> clang++-17 -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES test.cpp -stdlib=libc++ -std=c++17 -E -o- | wc -c

Your metrics are actually quite arbitrary too. They include things like line information that don't take significant time to process; e.g., a file containing just // preprocesses to 149 bytes on godbolt: https://godbolt.org/z/adE614cPn. When doing the same in my test folder locally, I get 204 bytes. To make that a lot less problematic, you have to add -P to your command line arguments. With that you only get the newline that should have been there: https://godbolt.org/z/h4TEfv1Gj. Another problem is that we have dependencies, especially on the libc. With every system having a different version of the libc, and different platforms having entirely different ones, this can have a very significant impact on the actual preprocessed size without us being able to change anything about it. But most importantly: how would you want to track it? You can't just save the number somewhere, since that would result in noise on every single patch and would make it impossible to land multiple patches in parallel.

wjakob commented 7 months ago

I'm not suggesting that this would be a benchmark to run on the user's end. It could be done just like here for compile time: https://llvm-compile-time-tracker.com/graphs.php?startDate=2021-12-01&interval=100&relative=on

In other words: a reference machine runs clang++ -E (plus other useful flags, e.g. the -P you suggested) every day on the main git branch to produce a byte-size plot with time on the x axis. There could be one plot per header considered optimization-worthy. Perhaps this would be easy to do with the tooling that @nikic already has in place.
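
For example, the measurement command from earlier, adjusted with -P, is a sketch of what such a tracker could run for each header:

clang++-17 -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES test.cpp -stdlib=libc++ -std=c++17 -E -P -o- | wc -c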

mordante commented 7 months ago

> @wjakob Out of curiosity, have you come across headers other than <vector> that had such unreasonable sizes? We are aware of the <vector> -> <format> issue and the problem is that we can't remove that dependency while being standards conforming. IMO this dependency is simply a huge design failure at the WG21 level, but we discussed it and there was no desire to make a change.

I wouldn't call it a design failure. WG21 has created a solution for the growing header sizes: modules. So I'm not too surprised that header size is not considered a huge issue. (I'm well aware of the status of modules in implementations.)
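
For illustration, where the C++23 standard library module is implemented, the header problem disappears at the source level entirely (a sketch; build-system support for the std module still varies by toolchain):

import std;  // replaces #include <vector>, <string>, ... wholesale

int main() {
    std::vector<int> v{1, 2, 3};
    return static_cast<int>(v.size());
}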

> I just spoke with @philnik777 and I think we may have found a solution that would really improve this issue while being easy to maintain and unlikely to cause bugs. Roughly speaking, we currently have three types of headers in the library:
>
> 1. Implementation-detail headers like `<__memory_resource/polymorphic_allocator.h>`
> 2. Public headers that contain an actual implementation, like `<streambuf>`, `<vector>` or `<string>`
> 3. Umbrella headers, which are public headers that only include other implementation-detail headers (e.g. `<functional>` and `<algorithm>`)
>
> From a header that contains actual code (categories 1 and 2 above), we don't want to conditionally include dependencies because that's unmaintainable and will lead to bugs.

Care to elaborate on why that would be unmaintainable? I think it would work. I created a patch with that approach a while ago: https://reviews.llvm.org/D157298