microsoft / STL

MSVC's implementation of the C++ Standard Library.

`/std:c++latest` makes headers much slower to include, up to 10 times #3599

Open Kojoley opened 1 year ago

Kojoley commented 1 year ago

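Times (in seconds) are as reported by the compiler front end via the /Bt flag. Only headers whose difference exceeds both 5 ms and 1% are shown, and each value is the minimum of 10 runs. /permissive- was used with every /std mode to exclude preprocessor/parser differences: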

default c++17 c++20 c++latest slowdown header
0.121 0.275 1.248 1.308 981% <chrono>
0.033 0.037 0.182 0.186 464% <cmath>
0.166 0.193 0.291 0.537 223% <queue>
0.137 0.162 0.210 0.438 220% <stack>
..... 0.581 1.394 1.474 154% <filesystem>
0.079 0.124 0.169 0.174 120% <numeric>
0.053 0.061 0.111 0.114 115% <utility>
0.081 0.092 0.169 0.172 112% <array>
0.062 0.070 0.124 0.128 106% <tuple>
0.056 0.064 0.114 0.114 104% <typeindex>
0.143 0.159 0.263 0.287 101% <algorithm>
..... 0.397 0.559 0.758 91% <execution>
0.149 0.214 0.269 0.284 91% <functional>
0.147 0.159 0.249 0.259 76% <memory>
0.171 0.209 0.281 0.291 70% <stdexcept>
0.172 0.210 0.280 0.292 70% <bitset>
0.188 0.227 0.302 0.312 66% <string>
0.199 0.239 0.320 0.330 66% <system_error>
0.391 0.450 0.611 0.645 65% <regex>
0.109 0.120 0.174 0.178 63% <iterator>
0.192 0.204 0.303 0.312 62% <thread>
0.138 0.160 0.209 0.223 62% <list>
0.318 0.378 0.498 0.510 60% <sstream>
0.406 0.458 0.618 0.646 59% <random>
0.148 0.165 0.226 0.235 59% <valarray>
0.259 0.298 0.397 0.411 59% <streambuf>
0.287 0.332 0.442 0.455 59% <ostream>
0.282 0.325 0.438 0.447 59% <ios>
0.531 0.604 0.801 0.841 58% <future>
0.292 0.339 0.449 0.461 58% <iostream>
0.134 0.158 0.205 0.211 57% <deque>
0.135 0.157 0.205 0.212 57% <forward_list>
0.271 0.304 0.414 0.425 57% <mutex>
0.298 0.342 0.454 0.465 56% <locale>
0.140 0.166 0.212 0.218 56% <vector>
0.307 0.353 0.465 0.478 56% <strstream>
0.301 0.342 0.459 0.468 55% <fstream>
0.279 0.310 0.423 0.433 55% <shared_mutex>
0.304 0.349 0.456 0.470 55% <codecvt>
0.279 0.307 0.418 0.431 54% <condition_variable>
0.307 0.371 0.464 0.471 53% <iomanip>
..... 0.166 0.246 0.252 52% <charconv>
0.145 0.171 0.210 0.220 52% <set>
0.334 0.383 0.494 0.505 51% <complex>
0.137 0.153 0.200 0.206 50% <scoped_allocator>
0.175 0.199 0.254 0.263 50% <unordered_map>
0.308 0.355 0.452 0.460 49% <istream>
0.176 0.198 0.255 0.262 49% <unordered_set>
0.148 0.165 0.212 0.218 47% <map>
..... 0.150 0.201 0.215 43% <any>
..... 0.342 0.473 0.490 43% <memory_resource>
..... 0.149 0.206 0.213 43% <optional>
..... 0.206 0.279 0.293 42% <string_view>
..... 0.213 0.273 0.277 30% <variant>
0.072 0.080 0.090 0.090 25% <atomic>
0.053 0.062 0.066 0.066 25% <new>
0.048 0.055 0.058 0.059 23% <type_traits>
0.054 0.062 0.065 0.065 20% <exception>
0.056 0.063 0.067 0.067 20% <typeinfo>
0.052 0.059 0.062 0.062 19% <ratio>
..... ..... 0.385 0.418 9% <ranges>
..... ..... 0.826 0.867 5% <format>
..... ..... 0.229 0.237 3% <stop_token>
..... ..... 0.514 0.527 3% <syncstream>
..... ..... 0.221 0.226 2% <barrier>

>cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.35.32216.1 for x64
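
For reference, the slowdown column can be reproduced by hand. The sketch below assumes, as the repro script further down does, that slowdown is the last mode's time divided by the first usable mode's time, minus one, and that modes timed under 0.030 s count as "header not available". The function name `slowdown` and the use of `None` for the "....." cells are illustrative choices; the numbers are copied from the table above.

```python
def slowdown(timings, considered_disabled=0.030):
    """Relative slowdown of the last mode versus the first mode where the header exists.

    `timings` is ordered default, c++17, c++20, c++latest; None marks a '.....' cell.
    """
    usable = [t for t in timings if t is not None]
    # Skip leading modes where the header is effectively empty (too fast to be real).
    while usable and usable[0] < considered_disabled:
        usable = usable[1:]
    if not usable:
        return None
    return usable[-1] / usable[0] - 1


chrono = [0.121, 0.275, 1.248, 1.308]
filesystem = [None, 0.581, 1.394, 1.474]

print(f'<chrono>     {slowdown(chrono):.0%}')      # 981%
print(f'<filesystem> {slowdown(filesystem):.0%}')  # 154%
```

So the "up to 10 times" in the title corresponds to <chrono> going from 0.121 s to 1.308 s.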

Repro:

import subprocess
import json
import os
import sys
import colorama
from colorama import Fore, Back, Style
from collections import defaultdict
colorama.init()

def parse_msvc_wall(output):
    i = output.find(b'c1xx.dll)=')
    if i == -1: return None
    j = output.find(b's', i)
    return float(output[i+10:j])

def msvc_get_parsing_time(*fnames, std=None):
    if len(fnames) == 1 and isinstance(fnames[0], list):
        fnames = fnames[0]

    cmd = ['cl', '/nologo', '/Bt', '/Zs', '/TP', '/w', '.empty.tmp']

    if std is not None:
        cmd.append(f'/std:c++{std}')

    cmd += [f'/FI{fn[1:-1]}' for fn in fnames]
    #print(' '.join(cmd))

    try:
        output = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True).stdout
    except subprocess.CalledProcessError as e:
        if b'C1083' in e.stdout:
            return None
        time = parse_msvc_wall(e.stderr)
        if time is not None:
            return time
        print(f'stderr={e.stderr}')
        print(f'stdout={e.stdout}')
        raise

    return parse_msvc_wall(output)

# https://eel.is/c++draft/headers
headers = set('''
<algorithm>
<flat_set>
<mutex>
<stdexcept>
<any>
<format>
<new>
<stdfloat>
<array>
<forward_list>
<numbers>
<stop_token>
<atomic>
<fstream>
<numeric>
<streambuf>
<barrier>
<functional>
<optional>
<string>
<bit>
<future>
<ostream>
<string_view>
<bitset>
<generator>
<print>
<strstream>
<charconv>
<initializer_list>
<queue>
<syncstream>
<chrono>
<iomanip>
<random>
<system_error>
<codecvt>
<ios>
<ranges>
<thread>
🔗
<compare>
<iosfwd>
<ratio>
<tuple>
🔗
<complex>
<iostream>
<regex>
<type_traits>
🔗
<concepts>
<istream>
<scoped_allocator>
<typeindex>
🔗
<condition_variable>
<iterator>
<semaphore>
<typeinfo>
🔗
<coroutine>
<latch>
<set>
<unordered_map>
🔗
<deque>
<limits>
<shared_mutex>
<unordered_set>
🔗
<exception>
<list>
<source_location>
<utility>
🔗
<execution>
<locale>
<span>
<valarray>
🔗
<expected>
<map>
<spanstream>
<variant>
🔗
<filesystem>
<mdspan>
<sstream>
<vector>
🔗
<flat_map>
<memory>
<stack>
<version>
🔗
<memory_resource>
<stacktrace>
<cassert>
<cfenv>
<climits>
<csetjmp>
<cstddef>
<cstdlib>
<cuchar>
🔗
<cctype>
<cfloat>
<clocale>
<csignal>
<cstdint>
<cstring>
<cwchar>
🔗
<cerrno>
<cinttypes>
<cmath>
<cstdarg>
<cstdio>
<ctime>
<cwctype>
'''.strip().splitlines()) - {'🔗'}
#headers = ['<string_view>', '<chrono>']
max_header_name_len = max(map(len, headers))

def time_to_color(t):
    if t < 0.010: return Fore.BLACK, Style.BRIGHT
    if t < 0.020: return Fore.WHITE, Style.DIM
    if t < 0.040: return '', ''
    if t < 0.060: return Fore.CYAN, Style.BRIGHT
    if t < 0.080: return Fore.WHITE, Style.BRIGHT
    if t < 0.100: return Fore.YELLOW, Style.BRIGHT
    if t < 0.150: return Fore.RED, Style.BRIGHT
    if t < 0.200: return Fore.RED, Style.DIM
    if t < 0.300: return Fore.MAGENTA, Style.BRIGHT
    return Fore.MAGENTA, Style.DIM

reset_colors = Style.RESET_ALL + Fore.RESET

stds = [None, '17', '20', 'latest']

def ttc(time):
    f, s = time_to_color(time)
    return f + s

def info(timings, expected_count=len(stds), reverse=False, delim=' ', fill=' .....'):
    a = [f'{ttc(time)}{time:>6.3f}{reset_colors}' for time in timings]
    b = [fill] * (expected_count - len(timings))
    return delim.join(b + a if reverse else a + b )

def print_slowdown(all_timings, considered_disabled=0.030, min_diff=0.005, min_diff_rel=0.01):
    s = ' | '.join(f'c++{std}' if std else 'default' for std in stds)
    print(f'{s} | slowdown | header')
    print(('-' * 6 + ':|') * (len(stds) + 1) + '-' * max_header_name_len)

    results = []
    for header, timings in all_timings.items():
        timings = timings
        while len(timings) and timings[0] < considered_disabled:
            timings = timings[1:]

        if len(timings) == 0: continue

        slowdown = timings[-1] / timings[0] - 1
        if abs(timings[-1] - timings[0]) > min_diff and abs(slowdown) > min_diff_rel:
            results.append((slowdown, header, timings))

    for slowdown, header, timings in sorted(results, reverse=True):
        print(f'{info(timings, reverse=True, delim=" |")} | {slowdown:>5.0%} | `{header}`')

def get_timings():
    with open('.empty.tmp', 'w+') as f:
        pass
    all_timings = defaultdict(list)
    print(f'Timing standard library headers:')
    for header in headers:
        #print(f'working on {header}...', end='')
        print(f'{info([])} {header}', end='')

        for std in stds:
            timings = []
            for _ in range(10):
                time = msvc_get_parsing_time(header, std=std)
                if time is None:
                    timings.append(float('nan'))
                    break
                timings.append(time)

            time = min(timings)
            all_timings[header].append(time)
            print(f'\r{info(all_timings[header])} {header}', end='')
        print(f'\r{info(all_timings[header])} {header}')
    return all_timings

if __name__ == "__main__":
    fn = 'bench_syshdrs.json'
    if os.path.exists(fn) and '-r' not in sys.argv:
        with open(fn) as f:
            timings = json.load(f)
    else:
        timings = get_timings()
        with open(fn, 'w+') as f:
            json.dump(timings, f)
        print('\n' * 3)
    print_slowdown(timings)

StephanTLavavej commented 1 year ago

Thanks for the report and analysis.

It's a difficult problem because C++ keeps adding features to existing headers. In some cases we can improve throughput by reducing header dependencies (fighting against the natural tendency of every header to want to include every other header), but it's a lot of work for often little reward.

Ultimately this is a moot point in the world of Standard Library Modules. We may simply need to conclude, "Want throughput? Use modules." That said, refactorings to improve throughput are welcome as long as they aren't deeply invasive or widely source-breaking. (Every such refactoring has the potential to break source where users assumed X would always drag in Y; we can accept some amount of breakage in every release but it is a cost for us and users.)

frederick-vs-ja commented 1 year ago

There's already a separate issue for <chrono> (#2003). Unfortunately, it's quite difficult to optimize <chrono>, because it's difficult to implement some operator<< overloads conformingly without std::format. Note that even if <chrono> gets rid of <format>, it is still significantly heavier in C++20.

Optimization of other headers seems easier to me.

Kojoley commented 1 year ago

> Thanks for the report and analysis.
>
> It's a difficult problem because C++ keeps adding features to existing headers. In some cases we can improve throughput by reducing header dependencies (fighting against the natural tendency of every header to want to include every other header), but it's a lot of work for often little reward.

I totally understand; I'm the author of #355 :-), and I have also spent years trying to untangle Boost and make it include less, even though those changes are not always welcome and are sometimes simply undone a few commits down the line. But on the other hand, the parsing speed of the standard headers affects most C++ users. Is it a small 'reward'? Yes, if you look at a single compile, but the downstream impact is substantial. There are probably people who use their huge build times as an excuse, but there are definitely people who care about compile times, so much so that they do these:

From my experience: Boost.Spirit CI times have doubled on 14.3 compared to 14.0 (tripled for X3, which does not use precompiled headers). And this is considering that the main reasons for the huge Spirit V2 compile times are overzealous instantiation, frontend-to-backend emission, and backend inlining of that stuff.

> Ultimately this is a moot point in the world of Standard Library Modules. We may simply need to conclude, "Want throughput? Use modules."

I have tried to use modules. It was like VS doesn't want me to use them, passionately. First I just googled what modules standard library has and the standard itself seems to not mandate these names? Right away I found the tutorial https://learn.microsoft.com/en-us/cpp/cpp/tutorial-import-stl-named-module?view=msvc-170 which says: The feature is subject to change between preview releases. You shouldn't use preview features in production code.

I continued. The tutorial talks about import std and std.compat:

>cl /std:c++latest /Zs import.std.cpp
error C2230: could not find module 'std'
>cl /std:c++latest /Zs import.std.compat.cpp
error C2230: could not find module 'std.compat'

Okay, googling more, I found the other list of names here https://learn.microsoft.com/en-us/cpp/cpp/modules-cpp?view=msvc-170#consume-c-standard-library-as-modules-experimental, it is import std.core:

>cl /std:c++latest /Zs import.std.core.cpp
fatal error C1011: cannot locate standard module interface. Did you install the library part of the C++ modules feature in VS setup?

What, I should've installed them separately? Ehmm... Went to the installer, it warns me:

[image: screenshot of the installer warning]

Trying again:

>cl /std:c++latest /Zs import.std.core.cpp
fatal error C1011: cannot locate standard module interface. Did you install the library part of the C++ modules feature in VS setup?

Ugh, what else do I need? Finally found the /experimental:module flag and it worked, though once again I got a welcoming note:

>cl /std:c++latest /Zs import.std.core.cpp /experimental:module
Experimental features are provided as a preview of proposed language features,
and we're eager to hear about bugs and suggestions for improvements. However,
note that these experimental features are non-standard, provided as-is without
support, and subject to breaking changes or removal without notice. See
http://go.microsoft.com/fwlink/?LinkID=691081 for details.

Btw, I'm not sure that the link http://go.microsoft.com/fwlink/?LinkID=691081 is pointing to the right location, looks like a landing page?

Kojoley commented 1 year ago

I updated the table because I had forgotten about the C headers. Now <cmath> is second from the top.

StephanTLavavej commented 1 year ago

@Kojoley

> I have tried to use modules. It was like VS doesn't want me to use them, passionately.

It's a rough experience right now, sorry about that. While the compiler and library support is present in VS 2022 17.5, there are significant compiler bugs (tracked by #1694 where they affect the STL, and they should be significantly improved in 17.6), and the end-to-end story (including the build system, and IntelliSense) is incomplete (again, improvements scheduled for 17.6 and beyond).

> First I just googled what modules standard library has and the standard itself seems to not mandate these names?

C++23 provides the std and std.compat modules. See WG21-N4944 16.4.2.4 [std.modules]. That is the exhaustive list (ignoring header units).

> The feature is subject to change between preview releases. You shouldn't use preview features in production code.

That is technically true for all /std:c++latest features, although in the STL we try to ship everything at production quality anyways. (The main caveat is that we do reserve the right to break ABI until the implementation is complete and the /std:c++23 switch has been added.) Modules are unusually affected by compiler issues, so the experience is much rougher, and improving more gradually, than the usual "not supported => supported flawlessly" transition.

> error C2230: could not find module 'std'

You must build the shipped std.ixx file to produce std.ifc and std.obj. We do not (and will not) ship prebuilt artifacts for the Standard Library Modules, as they depend on all of the chosen compiler options.

std and std.compat require only /std:c++latest. They do not require /experimental:module.

> Okay, googling more, I found the other list of names here https://learn.microsoft.com/en-us/cpp/cpp/modules-cpp?view=msvc-170#consume-c-standard-library-as-modules-experimental, it is import std.core:

This confusion is our fault. The std.core etc. modules are an early experiment. They are non-Standard, they required /experimental:module and installation as an optional component in the VS Installer, and they should be considered as superseded by std and std.compat.


Here's the proper Hello World for modules (until automatic build system support is available):

C:\Temp>cl /EHsc /nologo /W4 /std:c++latest /MTd /Od /c "%VCToolsInstallDir%\modules\std.ixx"
std.ixx

C:\Temp>dir std.*
[...]
03/30/2023  10:21 PM        32,565,163 std.ifc
03/30/2023  10:21 PM         3,183,735 std.obj
[...]

C:\Temp>type meow.cpp
import std;

int main() {
    std::cout << "Hello, modules";
    std::printf(" world!\n");
}

C:\Temp>cl /EHsc /nologo /W4 /std:c++latest /MTd /Od meow.cpp std.obj
meow.cpp

C:\Temp>meow
Hello, modules world!
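
The two steps above can be sketched as a tiny driver that derives both command lines from one shared option list, so that the options used to produce std.ifc/std.obj cannot drift from those used to consume them. This is only an illustration: the helper name `std_module_commands` is hypothetical, the flags are the ones from the commands above, and nothing is actually executed.

```python
import os

# Options shared by module production and consumption (copied from the commands above).
COMMON = ['/EHsc', '/nologo', '/W4', '/std:c++latest', '/MTd', '/Od']


def std_module_commands(source, tools_dir):
    """Return (build_std, build_user) command lines as argument lists."""
    std_ixx = os.path.join(tools_dir, 'modules', 'std.ixx')
    build_std = ['cl', *COMMON, '/c', std_ixx]       # step 1: produces std.ifc and std.obj
    build_user = ['cl', *COMMON, source, 'std.obj']  # step 2: consumes them
    return build_std, build_user


if __name__ == '__main__':
    # VCToolsInstallDir is the variable used in the commands above;
    # the fallback path is a placeholder for illustration only.
    tools = os.environ.get('VCToolsInstallDir', r'C:\VCTools')
    for cmd in std_module_commands('meow.cpp', tools):
        print(' '.join(cmd))
```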

Kojoley commented 1 year ago

> C:\Temp>cl /EHsc /nologo /W4 /std:c++latest /MTd /Od /c "%VCToolsInstallDir%\modules\std.ixx"
> C:\Temp>cl /EHsc /nologo /W4 /std:c++latest /MTd /Od meow.cpp std.obj

So with modules the Hello World needs TWO very compiler specific command lines to build? Cannot import std embed information that std.obj will be needed? That looks like the progress going backwards :-(

> We do not (and will not) ship prebuilt artifacts for the Standard Library Modules, as they depend on all of the chosen compiler options.

Even on warning flags? There is no way to selectively disable false-positive warnings from standard library in sources which encounter them and has to be globally disabled with modular std (because I guess it is UB to mix 'different' standard modules)?

Sorry for abusing the thread.

StephanTLavavej commented 1 year ago

It's your own issue and this is tangentially related, so it's fine :joy_cat: but you may want to join the STL Discord's channel if you want to discuss modules more. (There's also the Discussions tab in this repo, although I'll admit we don't check it frequently, whereas the Discord is highly active.)

> So with modules the Hello World needs TWO very compiler specific command lines to build?

Modules fundamentally change how we consume libraries, so the sequence of compiler incantations needs to change too.

> Cannot import std embed information that std.obj will be needed?

Other implementations may do things differently, but in MSVC we're trying to avoid having the compiler become a build system. The compiler has enough to worry about already. So from a strictly compiler perspective, it needs to be told to build the Standard Library Modules before consuming them.

Now, the build system (MSBuild or CMake/Ninja) will be able to automate this, since it will be informed by the toolset of the location of std.ixx and std.compat.ixx and their dependency relationship. Once that work is in place, most users won't need to worry about this, the same way that they don't need to worry about separate commands to build and consume PCHes. (While precompiled headers are a very different technology from modules, they are superficially similar in that they require the generation of artifacts that need to be consumed later.)

The good news is that building the entire Standard Library as a module is extremely fast (I measure it taking 3-5 seconds) and it can be reused until the compiler options change, or the toolset itself is upgraded.
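
One way a build script could decide when the prebuilt module can be reused is to derive a key from exactly the two things named above: the compiler options and the toolset version. The following is a rough sketch, not how MSBuild or CMake do it; `module_cache_key` is a made-up helper, and the version string is simply the one reported by `cl` earlier in this thread.

```python
import hashlib


def module_cache_key(options, toolset_version):
    """Hash the option list and toolset version; a changed key means std.ixx must be rebuilt."""
    h = hashlib.sha256()
    h.update(toolset_version.encode())
    for opt in options:
        # Order is kept on purpose: reordered options are treated as a different build.
        h.update(b'\0' + opt.encode())
    return h.hexdigest()[:16]


debug = ['/EHsc', '/W4', '/std:c++latest', '/MTd', '/Od']
release = ['/EHsc', '/W4', '/std:c++latest', '/MT', '/O2']
version = '19.35.32216.1'

print(module_cache_key(debug, version) == module_cache_key(debug, version))    # True: reuse
print(module_cache_key(debug, version) == module_cache_key(release, version))  # False: rebuild
```

Being deliberately conservative here (any difference in options forces a rebuild) matches the advice in this thread not to change options between producing and consuming the module.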

> Even on warning flags?

Changing compiler options between module production and consumption is fraught with peril, since you're asking the compiler to persist its understanding of C++ library sources with one set of switches, and then reload it with a different set. We can't support that in general (e.g. changing Standard modes or anything truly invasive). Maybe you can get away with it for warning options, but maybe not - warnings can rely on a delicate dance between the compiler front-end and back-end. I would strongly discourage attempting to do this.

> There is no way to selectively disable false-positive warnings from standard library in sources which encounter them and has to be globally disabled with modular std (because I guess it is UB to mix 'different' standard modules)?

The ODR rules for modules are essentially the same as for classic headers.

What's really bad, as I mentioned immediately above, is attempting to change options between generating std.ifc/std.obj and consuming it. That would be equivalent to changing options between #include <vector> and int user_func() {} which is currently not possible. (If you attempt to do that with PCHes, the compiler will/should absolutely reject that - PCHes are compiler memory dumps and cannot possibly be robust to such an attempt.)

If you built different flavors of std.ifc/std.obj (with different options) to be consumed by separate user1.cpp and user2.cpp, and then linked them together, that is essentially equivalent to building user1.cpp and user2.cpp with different options and classic includes - in both cases, you're building the Standard Library sources independently with different options, then asking the linker to smash them together. In this case, varying warning options is fairly safe. It's less efficient to build, though, since you have to build the modules twice. (Classic includes build the library separately for every TU.) Avoiding that would be good.

For targeted warning suppressions, you can often pragma push-disable-pop around the instantiation of a library template. Sometimes you can vary the types you're giving to the library (e.g. to avoid sign/truncation warnings). And we provide extensive facilities to globally suppress warnings in library code only, without also suppressing them in user code; that could be applied when building and consuming the modules, just as with classic headers (this is one of the reasons why we need std.ixx to be built on the user machine).

Kojoley commented 1 year ago

Thanks for the detailed answer, I'm really borrowing your time, but I cannot not comment on it.

> Modules fundamentally change how we consume libraries, so the sequence of compiler incantations needs to change too.

Yeah... we now need a build system to even compile Hello World =/ What previously was as simple as -Ipath/to/library now, with modules, requires using a build system and expecting the library to support that build system.

> Other implementations may do things differently, but in MSVC we're trying to avoid having the compiler become a build system. The compiler has enough to worry about already. So from a strictly compiler perspective, it needs to be told to build the Standard Library Modules before consuming them.

Time has shown again and again that such things are much more easily solved by the toolchains themselves than by build systems. Off the top of my head: putting the standard library header path on the include search path(!), linking the standard library(!), linking sanitizer libraries, linking the atomic library, linking the filesystem library, linking pthread, #pragma comment(lib, "Ws2_32.lib"), and most likely I missed some other basic things that the toolchain does for us.

> The good news is that building the entire Standard Library as a module is extremely fast (I measure it taking 3-5 seconds) and it can be reused until the compiler options change, or the toolset itself is upgraded.

It is for now, but two standard releases further it might get much worse :-)

mode default c++17 c++20 c++latest
#include every std header 0.643 0.992 2.216 2.329

> The ODR rules for modules are essentially the same as for classic headers.

I cannot imagine how I would know for sure what flags a third party library built the standard library with.

> It's less efficient to build, though, since you have to build the modules twice. (Classic includes build the library separately for every TU.) Avoiding that would be good.

That is probably another headache build systems need to solve, because otherwise they will link two different std.obj files and get a duplicate-symbols linker error.

Kojoley commented 1 year ago

I forgot that /std:c++20 and /std:c++latest enable /permissive-, which switches the preprocessor and parser to different implementations, and it turns out that, for example, half of the slowdown of <variant> can be attributed to the new preprocessor/parser.
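
As a quick check of the "half" figure, here is the arithmetic for <variant>, using numbers from the tables in this comment: 0.139 s for /std:c++17, 0.213 s for /std:c++17 with /permissive-, and 0.285 s for /std:c++latest. Treating the /permissive- delta measured at /std:c++17 as the preprocessor/parser share of the total is a simplifying assumption made for this illustration, and the variable names are illustrative.

```python
cxx17 = 0.139             # <variant>, /std:c++17
cxx17_permissive = 0.213  # <variant>, /std:c++17 /permissive-
latest = 0.285            # <variant>, /std:c++latest (enables /permissive-)

total = latest - cxx17              # full slowdown from c++17 to c++latest
parser = cxx17_permissive - cxx17   # part that appears from /permissive- alone
share = parser / total              # comes out at roughly one half

print(f'total +{total:.3f}s, /permissive- +{parser:.3f}s, share {share:.0%}')
```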

default /permissive- slowdown header
0.269 0.391 45% <regex>
0.224 0.308 38% <istream>
0.225 0.307 36% <iomanip>
0.223 0.304 36% <codecvt>
0.122 0.166 36% <queue>
0.130 0.176 35% <unordered_set>
0.236 0.318 35% <sstream>
0.222 0.298 34% <locale>
0.131 0.175 34% <unordered_map>
0.304 0.406 34% <random>
0.220 0.292 33% <iostream>
0.252 0.334 33% <complex>
0.233 0.307 32% <strstream>
0.212 0.279 32% <condition_variable>
0.219 0.287 31% <ostream>
0.111 0.145 31% <set>
0.214 0.279 30% <shared_mutex>
0.149 0.192 29% <thread>
0.219 0.282 29% <ios>
0.414 0.531 28% <future>
0.112 0.143 28% <algorithm>
0.117 0.148 26% <valarray>
0.206 0.259 26% <streambuf>
0.109 0.137 26% <stack>
0.117 0.147 26% <memory>
0.216 0.271 25% <mutex>
0.110 0.138 25% <list>
0.118 0.148 25% <map>
0.121 0.149 23% <functional>
0.109 0.134 23% <deque>
0.154 0.188 22% <string>
0.111 0.135 22% <forward_list>
0.248 0.301 21% <fstream>
0.090 0.109 21% <iterator>
0.067 0.081 21% <array>
0.142 0.171 20% <stdexcept>
0.066 0.079 20% <numeric>
0.144 0.172 19% <bitset>
0.167 0.199 19% <system_error>
0.118 0.140 19% <vector>
0.119 0.137 15% <scoped_allocator>
0.064 0.072 12% <atomic>
0.056 0.062 11% <tuple>
0.110 0.121 10% <chrono>
0.049 0.042 -14% <cwchar>
0.039 0.031 -21% <cstdio>
0.025 0.019 -24% <cstring>
0.029 0.022 -24% <cstdlib>
0.022 0.015 -32% <initializer_list>
0.022 0.015 -32% <ctime>
0.022 0.015 -32% <cstddef>
0.018 0.012 -33% <climits>
0.021 0.014 -33% <cinttypes>
0.021 0.014 -33% <cfenv>
0.021 0.014 -33% <cctype>
0.020 0.013 -35% <cuchar>
0.020 0.013 -35% <csignal>
0.020 0.013 -35% <clocale>
0.020 0.013 -35% <cfloat>
0.020 0.013 -35% <cassert>
0.021 0.013 -38% <cwctype>
0.021 0.013 -38% <cerrno>
0.018 0.011 -39% <cstdarg>
0.019 0.011 -42% <version>
0.019 0.011 -42% <cstdint>
0.019 0.011 -42% <csetjmp>

/std:c++17 + /permissive- slowdown header
0.139 0.213 53% <variant>
0.259 0.371 43% <iomanip>
0.318 0.450 42% <regex>
0.281 0.397 41% <execution>
0.272 0.378 39% <sstream>
0.259 0.355 37% <istream>
0.141 0.193 37% <queue>
0.157 0.214 36% <functional>
0.257 0.349 36% <codecvt>
0.146 0.198 36% <unordered_set>
0.147 0.199 35% <unordered_map>
0.340 0.458 35% <random>
0.128 0.171 34% <set>
0.254 0.339 33% <iostream>
0.257 0.342 33% <locale>
0.288 0.383 33% <complex>
0.268 0.353 32% <strstream>
0.253 0.332 31% <ostream>
0.239 0.310 30% <shared_mutex>
0.251 0.325 29% <ios>
0.468 0.604 29% <future>
0.238 0.307 29% <condition_variable>
0.451 0.581 29% <filesystem>
0.126 0.162 29% <stack>
0.159 0.204 28% <thread>
0.124 0.159 28% <algorithm>
0.129 0.165 28% <map>
0.126 0.160 27% <list>
0.130 0.165 27% <valarray>
0.125 0.158 26% <deque>
0.241 0.304 26% <mutex>
0.237 0.298 26% <streambuf>
0.127 0.159 25% <memory>
0.165 0.206 25% <string_view>
0.074 0.092 24% <array>
0.183 0.227 24% <string>
0.134 0.166 24% <charconv>
0.194 0.239 23% <system_error>
0.121 0.149 23% <optional>
0.170 0.209 23% <stdexcept>
0.128 0.157 23% <forward_list>
0.225 0.275 22% <chrono>
0.099 0.120 21% <iterator>
0.124 0.150 21% <any>
0.174 0.210 21% <bitset>
0.138 0.166 20% <vector>
0.285 0.342 20% <fstream>
0.104 0.124 19% <numeric>
0.293 0.342 17% <memory_resource>
0.133 0.153 15% <scoped_allocator>
0.071 0.080 13% <atomic>
0.063 0.070 11% <tuple>
0.050 0.043 -14% <cwchar>
0.039 0.033 -15% <cstdio>
0.029 0.023 -21% <cstdlib>
0.022 0.016 -27% <cstddef>
0.027 0.019 -30% <cstring>
0.023 0.016 -30% <initializer_list>
0.022 0.015 -32% <ctime>
0.020 0.013 -35% <cuchar>
0.020 0.013 -35% <csignal>
0.020 0.013 -35% <clocale>
0.020 0.013 -35% <cfloat>
0.022 0.014 -36% <cfenv>
0.022 0.014 -36% <cctype>
0.021 0.013 -38% <cwctype>
0.021 0.013 -38% <cinttypes>
0.021 0.013 -38% <cerrno>
0.018 0.011 -39% <cstdarg>
0.020 0.012 -40% <cassert>
0.019 0.011 -42% <version>
0.019 0.011 -42% <cstdint>
0.019 0.011 -42% <csetjmp>
0.019 0.011 -42% <climits>

default c++17 c++20 c++latest slowdown header
0.110 0.225 1.276 1.339 1117% <chrono>
0.121 0.275 1.248 1.308 981% <chrono> /permissive-
0.033 0.037 0.182 0.186 464% <cmath> /permissive-
0.037 0.040 0.179 0.183 395% <cmath>
0.122 0.141 0.288 0.521 327% <queue>
0.109 0.126 0.202 0.421 286% <stack>
..... 0.451 1.404 1.471 226% <filesystem>
0.166 0.193 0.291 0.537 223% <queue> /permissive-
0.137 0.162 0.210 0.438 220% <stack> /permissive-
0.066 0.104 0.167 0.171 159% <numeric>
0.067 0.074 0.166 0.170 154% <array>
..... 0.581 1.394 1.474 154% <filesystem> /permissive-
0.112 0.124 0.261 0.283 153% <algorithm>
..... 0.281 0.545 0.708 152% <execution>
0.269 0.318 0.599 0.633 135% <regex>
0.121 0.157 0.265 0.279 131% <functional>
0.049 0.056 0.110 0.112 129% <utility>
0.056 0.063 0.123 0.126 125% <tuple>
0.079 0.124 0.169 0.174 120% <numeric> /permissive-
0.117 0.127 0.245 0.253 116% <memory>
0.053 0.061 0.111 0.114 115% <utility> /permissive-
0.081 0.092 0.169 0.172 112% <array> /permissive-
0.055 0.062 0.114 0.116 111% <typeindex>
0.304 0.340 0.608 0.639 110% <random>
0.144 0.174 0.288 0.301 109% <bitset>
0.225 0.259 0.456 0.467 108% <iomanip>
0.223 0.257 0.450 0.461 107% <codecvt>
0.062 0.070 0.124 0.128 106% <tuple> /permissive-
0.222 0.257 0.445 0.457 106% <locale>
0.142 0.170 0.280 0.292 106% <stdexcept>
..... 0.139 0.275 0.285 105% <variant>
0.220 0.254 0.439 0.451 105% <iostream>
0.224 0.259 0.448 0.459 105% <istream>
0.149 0.159 0.298 0.305 105% <thread>
0.219 0.253 0.433 0.447 104% <ostream>
0.056 0.064 0.114 0.114 104% <typeindex> /permissive-
0.219 0.251 0.428 0.445 103% <ios>
0.236 0.272 0.463 0.475 101% <sstream>
0.143 0.159 0.263 0.287 101% <algorithm> /permissive-
0.233 0.268 0.453 0.467 100% <strstream>
0.212 0.238 0.411 0.423 100% <condition_variable>
0.414 0.468 0.783 0.825 99% <future>
0.154 0.183 0.295 0.306 99% <string>
0.117 0.130 0.224 0.232 98% <valarray>
0.167 0.194 0.317 0.331 98% <system_error>
0.214 0.239 0.414 0.424 98% <shared_mutex>
0.252 0.288 0.486 0.497 97% <complex>
0.131 0.147 0.248 0.258 97% <unordered_map>
0.130 0.146 0.247 0.256 97% <unordered_set>
0.216 0.241 0.411 0.422 95% <mutex>
0.206 0.237 0.390 0.402 95% <streambuf>
0.090 0.099 0.170 0.174 93% <iterator>
0.111 0.128 0.207 0.213 92% <set>
..... 0.397 0.559 0.758 91% <execution> /permissive-
0.149 0.214 0.269 0.284 91% <functional> /permissive-
0.110 0.126 0.201 0.209 90% <list>
0.109 0.125 0.200 0.207 90% <deque>
0.118 0.138 0.219 0.224 90% <vector>
0.111 0.128 0.203 0.210 89% <forward_list>
0.248 0.285 0.452 0.463 87% <fstream>
..... 0.134 0.243 0.247 84% <charconv>
..... 0.124 0.208 0.228 84% <any>
0.118 0.129 0.208 0.216 83% <map>
0.119 0.133 0.200 0.214 80% <scoped_allocator>
0.147 0.159 0.249 0.259 76% <memory> /permissive-
..... 0.165 0.273 0.285 73% <string_view>
..... 0.121 0.202 0.208 72% <optional>
0.171 0.209 0.281 0.291 70% <stdexcept> /permissive-
0.172 0.210 0.280 0.292 70% <bitset> /permissive-
0.188 0.227 0.302 0.312 66% <string> /permissive-
0.199 0.239 0.320 0.330 66% <system_error> /permissive-
0.391 0.450 0.611 0.645 65% <regex> /permissive-
0.109 0.120 0.174 0.178 63% <iterator> /permissive-
0.192 0.204 0.303 0.312 62% <thread> /permissive-
..... 0.293 0.461 0.476 62% <memory_resource>
0.138 0.160 0.209 0.223 62% <list> /permissive-
0.318 0.378 0.498 0.510 60% <sstream> /permissive-
0.406 0.458 0.618 0.646 59% <random> /permissive-
0.148 0.165 0.226 0.235 59% <valarray> /permissive-
0.259 0.298 0.397 0.411 59% <streambuf> /permissive-
0.287 0.332 0.442 0.455 59% <ostream> /permissive-
0.282 0.325 0.438 0.447 59% <ios> /permissive-
0.531 0.604 0.801 0.841 58% <future> /permissive-
0.292 0.339 0.449 0.461 58% <iostream> /permissive-
0.134 0.158 0.205 0.211 57% <deque> /permissive-
0.135 0.157 0.205 0.212 57% <forward_list> /permissive-
0.271 0.304 0.414 0.425 57% <mutex> /permissive-
0.298 0.342 0.454 0.465 56% <locale> /permissive-
0.140 0.166 0.212 0.218 56% <vector> /permissive-
0.307 0.353 0.465 0.478 56% <strstream> /permissive-
0.301 0.342 0.459 0.468 55% <fstream> /permissive-
0.279 0.310 0.423 0.433 55% <shared_mutex> /permissive-
0.304 0.349 0.456 0.470 55% <codecvt> /permissive-
0.279 0.307 0.418 0.431 54% <condition_variable> /permissive-
0.307 0.371 0.464 0.471 53% <iomanip> /permissive-
..... 0.166 0.246 0.252 52% <charconv> /permissive-
0.145 0.171 0.210 0.220 52% <set> /permissive-
0.334 0.383 0.494 0.505 51% <complex> /permissive-
0.137 0.153 0.200 0.206 50% <scoped_allocator> /permissive-
0.175 0.199 0.254 0.263 50% <unordered_map> /permissive-
0.308 0.355 0.452 0.460 49% <istream> /permissive-
0.176 0.198 0.255 0.262 49% <unordered_set> /permissive-
0.148 0.165 0.212 0.218 47% <map> /permissive-
..... 0.150 0.201 0.215 43% <any> /permissive-
..... 0.342 0.473 0.490 43% <memory_resource> /permissive-
..... 0.149 0.206 0.213 43% <optional> /permissive-
..... 0.206 0.279 0.293 42% <string_view> /permissive-
0.064 0.071 0.089 0.089 39% <atomic>
0.049 0.058 0.062 0.064 31% <ratio>
..... 0.213 0.273 0.277 30% <variant> /permissive-
0.046 0.052 0.058 0.058 26% <type_traits>
0.052 0.058 0.065 0.065 25% <new>
0.072 0.080 0.090 0.090 25% <atomic> /permissive-
0.053 0.062 0.066 0.066 25% <new> /permissive-
0.052 0.059 0.065 0.064 23% <exception>
0.048 0.055 0.058 0.059 23% <type_traits> /permissive-
0.054 0.060 0.066 0.066 22% <typeinfo>
0.054 0.062 0.065 0.065 20% <exception> /permissive-
0.056 0.063 0.067 0.067 20% <typeinfo> /permissive-
0.052 0.059 0.062 0.062 19% <ratio> /permissive-
..... ..... 0.385 0.418 9% <ranges> /permissive-
..... ..... 0.382 0.411 8% <ranges>
..... ..... 0.826 0.867 5% <format> /permissive-
..... ..... 0.816 0.853 5% <format>
..... ..... 0.229 0.237 3% <stop_token> /permissive-
..... ..... 0.225 0.232 3% <stop_token>
..... ..... 0.166 0.171 3% <span>
..... ..... 0.216 0.222 3% <barrier>
..... ..... 0.514 0.527 3% <syncstream> /permissive-
..... ..... 0.504 0.516 2% <syncstream>
..... ..... 0.221 0.226 2% <barrier> /permissive-
..... ..... 0.114 0.109 -4% <compare>
0.025 0.027 0.019 0.020 -20% <cstring>
0.029 0.029 0.023 0.023 -21% <cstdlib>
0.022 0.023 0.016 0.016 -27% <initializer_list>
0.022 0.022 0.016 0.016 -27% <cstddef>
0.022 0.022 0.015 0.015 -32% <ctime>
0.021 0.022 0.014 0.014 -33% <cfenv>
0.021 0.022 0.015 0.014 -33% <cctype>
0.020 0.020 0.013 0.013 -35% <cuchar>
0.020 0.020 0.013 0.013 -35% <csignal>
0.020 0.020 0.013 0.013 -35% <clocale>
0.020 0.020 0.013 0.013 -35% <cfloat>
0.020 0.020 0.013 0.013 -35% <cassert>
0.019 0.019 0.011 0.012 -37% <csetjmp>
0.021 0.021 0.014 0.013 -38% <cwctype>
0.021 0.021 0.013 0.013 -38% <cinttypes>
0.021 0.021 0.013 0.013 -38% <cerrno>
0.018 0.018 0.011 0.011 -39% <cstdarg>
0.018 0.019 0.011 0.011 -39% <climits>
0.019 0.019 0.011 0.011 -42% <version>
0.019 0.019 0.011 0.011 -42% <cstdint>
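For reference, the slowdown column above appears consistent with comparing the newest mode against the oldest mode that has a measurement for that header. The sketch below reproduces that arithmetic; the function name `slowdown_percent` and the use of `std::optional` for the `.....` cells are assumptions for illustration, not part of the original measurement script. Because the printed times are rounded to milliseconds, recomputing from the table can differ from a listed value by a point.

```cpp
#include <cmath>
#include <optional>
#include <vector>

// Hypothetical helper: `times` holds seconds per language mode, oldest first.
// std::nullopt stands for a "....." cell (header not available in that mode).
// Returns the percent change from the first available time to the last one,
// or std::nullopt when there is no usable baseline or no final measurement.
inline std::optional<long> slowdown_percent(
    const std::vector<std::optional<double>>& times) {
    std::optional<double> baseline;
    for (const auto& t : times) {
        if (t) {
            baseline = t;
            break;
        }
    }
    if (!baseline || *baseline <= 0.0 || !times.back()) {
        return std::nullopt;
    }
    // Round to the nearest whole percent, as the tables do.
    return std::lround((*times.back() - *baseline) / *baseline * 100.0);
}
```

For the `<chrono>` row of the first post (0.121 to 1.308) this yields 981, and for `<filesystem>` (first measured at c++17, 0.581 to 1.474) it yields 154.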
ADKaster commented 1 year ago

Given the impact that /permissive- had on your results, is it possible that your measurements are more indicative of MSVC compiler performance than STL complexity? Running the script through clang-cl.exe as well might help isolate the STL itself as the culprit for slowdowns, and point to whether it's a quality-of-implementation issue in the library or possible over-inclusion (whether mandated by the standard or not).

Kojoley commented 1 year ago

Given the impact that /permissive- had on your results, is it possible that your measurements are more indicative of MSVC compiler performance than STL complexity?

In my previous message I included results both with and without /permissive-, and I also updated the first post with results from the /permissive- run.

Running the script through clang-cl.exe as well might help isolate the STL itself as the culprit for slowdowns, and point to whether it's a quality-of-implementation issue in the library or possible over-inclusion (whether mandated by the standard or not).

More is included -> more has to be parsed -> slower compilation.

>clang --version
clang version 15.0.7
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
c++14 c++17 c++20 c++2b slowdown header
0.134 0.258 1.015 1.045 679% <chrono>
0.035 0.041 0.212 0.214 504% <cmath>
0.158 0.183 0.321 0.868 451% <queue>
0.139 0.159 0.225 0.748 438% <stack>
..... 0.337 0.556 1.027 205% <execution>
0.075 0.131 0.193 0.197 161% <numeric>
0.049 0.056 0.126 0.127 160% <utility>
0.078 0.087 0.192 0.196 153% <array>
0.367 0.405 0.866 0.917 150% <random>
0.057 0.064 0.137 0.140 148% <tuple>
0.053 0.061 0.127 0.128 141% <typeindex>
0.445 0.494 0.999 1.041 134% <future>
..... 0.524 1.180 1.222 133% <filesystem>
0.141 0.155 0.289 0.316 125% <algorithm>
0.151 0.203 0.291 0.307 102% <functional>
0.332 0.375 0.589 0.626 89% <regex>
0.169 0.196 0.301 0.314 86% <bitset>
0.169 0.197 0.302 0.313 85% <stdexcept>
0.149 0.158 0.264 0.274 84% <memory>
0.141 0.164 0.255 0.259 83% <vector>
0.186 0.216 0.321 0.332 79% <string>
0.197 0.225 0.338 0.348 77% <system_error>
0.116 0.126 0.201 0.205 77% <iterator>
0.270 0.305 0.447 0.461 71% <ios>
0.280 0.316 0.460 0.477 71% <iomanip>
0.271 0.304 0.448 0.462 70% <ostream>
0.274 0.309 0.454 0.464 70% <istream>
0.170 0.192 0.277 0.289 70% <unordered_map>
0.275 0.310 0.452 0.466 69% <locale>
0.273 0.307 0.452 0.463 69% <iostream>
0.140 0.161 0.228 0.236 69% <set>
0.152 0.167 0.250 0.257 69% <valarray>
0.276 0.310 0.455 0.467 69% <codecvt>
0.170 0.190 0.276 0.287 69% <unordered_set>
..... 0.170 0.282 0.287 69% <charconv>
0.249 0.279 0.405 0.419 68% <streambuf>
0.138 0.159 0.225 0.232 68% <list>
0.279 0.314 0.458 0.469 68% <fstream>
0.137 0.157 0.224 0.230 68% <forward_list>
0.138 0.158 0.223 0.231 67% <deque>
0.287 0.323 0.465 0.478 67% <strstream>
0.290 0.327 0.472 0.484 67% <sstream>
0.261 0.279 0.418 0.431 65% <condition_variable>
0.260 0.277 0.414 0.427 64% <mutex>
0.265 0.283 0.422 0.434 64% <shared_mutex>
0.196 0.202 0.311 0.320 64% <thread>
0.310 0.346 0.489 0.502 62% <complex>
0.148 0.162 0.229 0.237 60% <map>
..... 0.194 0.299 0.310 60% <string_view>
0.144 0.154 0.220 0.228 58% <scoped_allocator>
..... 0.309 0.467 0.482 56% <memory_resource>
..... 0.148 0.223 0.230 56% <any>
..... 0.152 0.227 0.236 55% <optional>
..... 0.211 0.290 0.296 40% <variant>
0.075 0.084 0.092 0.092 22% <atomic>
0.045 0.052 0.054 0.055 21% <type_traits>
0.051 0.059 0.062 0.062 21% <new>
0.047 0.054 0.057 0.057 21% <ratio>
0.053 0.060 0.063 0.063 20% <typeinfo>
0.051 0.058 0.061 0.061 19% <exception>
..... ..... 0.687 0.733 7% <ranges>
..... ..... 0.242 0.249 3% <barrier>
..... ..... 0.511 0.525 3% <syncstream>
..... ..... 0.252 0.259 3% <stop_token>
..... ..... 0.642 0.654 2% <format>
MikeGitb commented 1 year ago

Have you benchmarked compilation times of full projects with different c++-versions?

I mean, the numbers are really not great (a full second of compilation time just because chrono is included is nuts), but it would be interesting to see if and when those differences actually matter for the overall compilation times of projects and not just microbenchmarks.

E.g. my observation with most projects I've worked on is that most compilation units directly or indirectly include a large part of the STL anyway (usually stuff like filesystem is isolated to only a few compilation units, but memory, chrono etc. are all over the place). So if an individual header is slower because it includes another STL header, that wouldn't matter to me, because I most likely include both headers anyway.

On the other hand, there is probably little the maintainers can do w.r.t. slowdowns caused by new members that have been added to a particular header.

Kojoley commented 1 year ago

From my understanding, a lot of headers became heavy because of the chain utility -> compare -> bit, where bit is a heavy header (very unfortunately!) and the only thing used from it is bit_cast. Any ideas whether bit could be made lighter?

Have you benchmarked compilation times of full projects with different c++-versions?

Yes, my times are doubled on 14.3 compared to 14.0, look here https://ci.appveyor.com/project/Kojoley/spirit/builds/46658394.

StephanTLavavej commented 1 year ago

<bit> doesn't appear to be inherently expensive - it just appears to be the first one dragging in <type_traits>, from a quick inspection. (Its dependency on <limits> could be removed quite easily.)

Kojoley commented 1 year ago

it just appears to be the first one dragging in <type_traits>

Hmm, <utility> itself unconditionally includes <type_traits>, so it couldn't be the source of the parse time doubling (<type_traits> itself slowed by only 3ms between c++17 and c++20). I previously prototyped a tool that analyzes the self-inclusion cost of headers; I could probably find it and pull more information with it.

frederick-vs-ja commented 1 year ago

From my understanding, a lot of headers became heavy because of the chain utility -> compare -> bit, where bit is a heavy header (very unfortunately!) and the only thing used from it is bit_cast. Any ideas whether bit could be made lighter?

I don't think this is a major reason though... I think we can just use __builtin_bit_cast in <compare> to reduce the inclusion dependency, and this perhaps won't damage maintainability a lot since bit_cast is only used 4 times now in <compare>.
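A minimal sketch of that idea, assuming a compiler that provides the `__builtin_bit_cast` intrinsic (recent MSVC, Clang, and GCC releases all do). The helper name `bit_cast_lite` is hypothetical and is not what <compare> actually contains; the point is only that the cast needs <type_traits> at most, not <bit>.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for std::bit_cast that avoids including <bit>.
// Assumption: the compiler supports the __builtin_bit_cast intrinsic.
template <class To, class From>
constexpr To bit_cast_lite(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination must have the same size");
    static_assert(std::is_trivially_copyable<To>::value &&
                      std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    return __builtin_bit_cast(To, from);
}

// Compile-time check: the IEEE 754 single-precision encoding of 1.0f.
static_assert(bit_cast_lite<std::uint32_t>(1.0f) == 0x3F800000u,
              "unexpected float representation");
```

The trade-off is that the constraints `std::bit_cast` enforces have to be restated by hand wherever the intrinsic is used directly.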

Kojoley commented 1 year ago

I previously prototyped a tool that analyzes the self-inclusion cost of headers; I could probably find it and pull more information with it.

The tool takes the inclusion graph and calculates, for each edge, how its removal would affect the compile time:

Output for <utility>:
cost % header -> include
0.063 51% <utility> -> <compare>
0.052 43% <compare> -> <bit>
0.051 41% <bit> -> <limits>
0.041 34% <limits> -> <cwchar>
0.016 13% <xstddef> -> <cstdlib>
0.012 10% <cwchar> -> <cstdio>
0.007 6% <cstdlib> -> <math.h>
0.004 3% <wchar.h> -> <corecrt_wstring.h>

<cwchar> -> <cstdio> seems to be a known thing: https://github.com/microsoft/STL/blob/9231abe46d466a0f262db234fd3bd4de1170ee45/stl/inc/cwchar#L12

Output for <chrono>:
cost % header -> include
0.137 16% <chrono> -> <format>
0.086 10% <format> -> <charconv>
0.075 9% <chrono> -> <algorithm>
0.029 3% <chrono> -> <sstream>
0.025 3% <xiosbase> -> <xlocale>
0.020 2% <sstream> -> <string>
0.019 2% <chrono> -> <vector>
0.013 1% <charconv> -> <xcharconv_ryu.h>
0.007 1% <locale> -> <xlocbuf>
0.007 1% <format> -> <locale>
0.006 1% <format> -> <mutex>
0.005 1% <format> -> <__msvc_format_ucd_tables.hpp>

It shows that removing <format> from <chrono> won't make things dramatically better.
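The edge-removal idea can be sketched as follows. This is not the actual tool, only one plausible model of it: every header has an assumed "self" parse cost, each header is counted once because include guards make repeat inclusions nearly free, and the saving for an edge is the cost of whatever becomes unreachable from the root once that edge is dropped. All names (`IncludeGraph`, `reachable_cost`, `edge_removal_savings`) are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an inclusion graph.
struct IncludeGraph {
    std::map<std::string, double> self_cost;                   // seconds per header
    std::map<std::string, std::vector<std::string>> includes;  // header -> its includes
};

// Sum of self costs of all headers reachable from `root`, each counted once.
// If `skip` is non-null, that single (from, to) edge is treated as removed.
inline double reachable_cost(
    const IncludeGraph& g, const std::string& root,
    const std::pair<std::string, std::string>* skip = nullptr) {
    std::set<std::string> seen{root};
    std::vector<std::string> pending{root};
    double total = 0.0;
    while (!pending.empty()) {
        const std::string header = pending.back();
        pending.pop_back();
        const auto cost = g.self_cost.find(header);
        if (cost != g.self_cost.end()) total += cost->second;
        const auto out = g.includes.find(header);
        if (out == g.includes.end()) continue;
        for (const std::string& next : out->second) {
            if (skip && skip->first == header && skip->second == next) continue;
            if (seen.insert(next).second) pending.push_back(next);
        }
    }
    return total;
}

// Estimated time saved for `root` if the include `from` -> `to` were removed.
inline double edge_removal_savings(const IncludeGraph& g, const std::string& root,
                                   const std::string& from, const std::string& to) {
    const std::pair<std::string, std::string> edge{from, to};
    return reachable_cost(g, root) - reachable_cost(g, root, &edge);
}
```

Under this model an edge saves nothing if its target is still reachable through another path, which would explain why cutting a single include such as <chrono> -> <format> recovers only a fraction of the total.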

philnik777 commented 1 year ago

FWIW libc++ has worked a lot on granularizing headers, and our C++23 CI job is 3 minutes faster than our C++20 job (C++23 is ~9min), mostly because we've removed a lot of transitive includes in C++23.

Kojoley commented 1 year ago

@philnik777 IIUC it was done by splitting headers like type_traits and bit into a lot of small single-function headers? Does your CI include Windows? The filesystem on Windows is much slower than on *nix, so overdoing the splitting might end up being worse on this OS.

philnik777 commented 1 year ago

You can always run the tests against libc++ you have run here, right? If splitting up the headers is so bad on windows, it should be obvious by having much worse times than the STL. It's also quite possible that there is a sweet spot between what libc++ is doing and what you are doing here.

Kojoley commented 1 year ago

You can always run the tests against libc++ you have run here, right? If splitting up the headers is so bad on windows, it should be obvious by having much worse times than the STL. It's also quite possible that there is a sweet spot between what libc++ is doing and what you are doing here.

It is hard to make a fair comparison. It seems that there is no easy way to test libc++ without building it, and even then it seems not to work when clang targets the MSVC ABI. Here are results from what is shipped with MSYS2 CLANG64, using its clang version 16.0.4, Target: x86_64-w64-windows-gnu.

libc++ seems to be doing much better than MSSTL: <chrono> is only 57% slower :-), though I wouldn't call +60% for <utility> a mild increase.

Same kind of table as in the first post. clang -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES:

c++14 c++17 c++20 c++2b diff header
0.068 0.072 0.107 0.108 +60% <utility>
0.237 0.228 0.363 0.373 +57% <chrono>
0.084 0.089 0.119 0.123 +47% <tuple>
0.154 0.177 0.227 0.225 +46% <iterator>
0.191 0.194 0.272 0.275 +44% <algorithm>
0.098 0.105 0.135 0.139 +42% <scoped_allocator>
..... 0.188 0.270 0.267 +42% <string_view>
0.004 0.005 0.006 0.006 +40% <cstddef>
0.182 0.206 0.251 0.255 +40% <set>
0.119 0.122 0.165 0.166 +40% <array>
0.187 0.210 0.256 0.261 +39% <map>
0.189 0.213 0.258 0.263 +39% <unordered_set>
0.194 0.219 0.265 0.269 +39% <unordered_map>
0.004 0.006 0.006 0.006 +33% <initializer_list>
0.176 0.184 0.228 0.235 +33% <list>
0.277 0.275 0.365 0.368 +33% <string>
0.198 0.210 0.257 0.263 +33% <vector>
0.282 0.278 0.369 0.375 +33% <system_error>
0.174 0.182 0.226 0.232 +33% <forward_list>
0.297 0.294 0.386 0.394 +33% <bitset>
0.399 0.394 0.518 0.526 +32% <random>
0.071 0.074 0.092 0.093 +30% <typeindex>
0.192 0.200 0.245 0.249 +30% <deque>
0.194 0.202 0.248 0.252 +30% <stack>
0.334 0.363 0.434 0.433 +30% <functional>
0.486 0.486 0.621 0.629 +29% <regex>
0.322 0.317 0.408 0.416 +29% <shared_mutex>
0.233 0.244 0.293 0.301 +29% <queue>
..... 0.127 0.158 0.163 +28% <optional>
0.337 0.332 0.424 0.432 +28% <thread>
0.232 0.240 0.295 0.297 +28% <memory>
0.164 0.171 0.208 0.209 +28% <valarray>
..... 0.558 0.683 0.712 +28% <filesystem>
..... 0.372 0.469 0.474 +27% <memory_resource>
0.395 0.393 0.489 0.501 +27% <ios>
..... 0.132 0.163 0.167 +27% <variant>
0.373 0.371 0.465 0.471 +26% <condition_variable>
0.571 0.563 0.692 0.721 +26% <fstream>
0.391 0.386 0.490 0.493 +26% <codecvt>
0.372 0.369 0.465 0.467 +26% <mutex>
0.493 0.490 0.613 0.619 +26% <complex>
0.398 0.394 0.493 0.500 +25% <streambuf>
0.442 0.439 0.548 0.555 +25% <locale>
0.395 0.392 0.492 0.495 +25% <future>
0.477 0.472 0.584 0.594 +25% <istream>
0.470 0.468 0.574 0.584 +24% <ostream>
0.480 0.479 0.587 0.597 +24% <iomanip>
0.482 0.478 0.589 0.599 +24% <sstream>
..... 0.122 0.150 0.152 +24% <charconv>
0.484 0.478 0.589 0.597 +23% <strstream>
0.480 0.472 0.582 0.592 +23% <iostream>
0.031 0.035 0.037 0.038 +23% <type_traits>
0.060 0.066 0.073 0.074 +23% <numeric>
..... 0.125 0.147 0.148 +18% <any>
0.042 0.045 0.048 0.048 +14% <typeinfo>
0.018 0.020 0.020 0.021 +14% <exception>
0.020 0.022 0.022 0.022 +11% <new>
0.039 0.041 0.043 0.043 +10% <cmath>
0.086 0.085 0.095 0.094 +9% <atomic>
0.036 0.037 0.038 0.038 +6% <stdexcept>
..... ..... 0.318 0.331 +4% <ranges>
..... ..... 0.179 0.182 +2% <barrier>
..... ..... 0.043 0.027 -36% <concepts>

clang -std=c++20 w/o and w/ -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES:

w/o w/ diff header
0.050 0.006 -88% <numbers>
0.607 0.077 -87% <numeric>
0.043 0.009 -79% <ratio>
0.046 0.012 -75% <limits>
0.625 0.187 -70% <span>
0.635 0.219 -66% <valarray>
0.078 0.028 -64% <bit>
0.488 0.183 -63% <array>
0.614 0.237 -61% <forward_list>
0.362 0.142 -61% <scoped_allocator>
0.638 0.252 -61% <list>
0.686 0.279 -59% <string_view>
0.049 0.021 -58% <exception>
0.630 0.268 -57% <set>
0.622 0.266 -57% <deque>
0.363 0.156 -57% <any>
0.614 0.264 -57% <stack>
0.630 0.274 -56% <map>
0.051 0.023 -55% <new>
0.631 0.283 -55% <unordered_set>
0.373 0.168 -55% <optional>
0.376 0.188 -50% <barrier>
0.672 0.336 -50% <ranges>
0.780 0.393 -50% <system_error>
0.620 0.313 -50% <queue>
0.761 0.384 -50% <chrono>
0.753 0.387 -49% <string>
0.533 0.282 -47% <unordered_map>
0.527 0.279 -47% <vector>
0.780 0.415 -47% <bitset>
0.791 0.446 -44% <shared_mutex>
0.069 0.040 -42% <stdexcept>
0.076 0.045 -41% <cmath>
0.831 0.498 -40% <thread>
0.169 0.102 -39% <atomic>
0.167 0.104 -38% <latch>
0.171 0.107 -37% <semaphore>
0.466 0.293 -37% <algorithm>
0.806 0.513 -36% <codecvt>
0.779 0.496 -36% <mutex>
0.778 0.497 -36% <condition_variable>
0.816 0.522 -36% <memory_resource>
0.872 0.558 -36% <random>
0.823 0.531 -36% <ios>
0.813 0.525 -35% <future>
0.815 0.526 -35% <streambuf>
0.056 0.037 -35% <expected>
0.152 0.102 -33% <typeindex>
0.866 0.584 -33% <locale>
0.980 0.671 -32% <complex>
0.904 0.619 -32% <ostream>
0.906 0.622 -31% <sstream>
0.903 0.623 -31% <iostream>
0.900 0.627 -30% <istream>
0.917 0.641 -30% <strstream>
0.897 0.632 -30% <iomanip>
1.011 0.741 -27% <fstream>
1.002 0.737 -26% <filesystem>
0.918 0.677 -26% <regex>
0.092 0.069 -25% <compare>
0.610 0.459 -25% <functional>
0.165 0.129 -22% <tuple>
0.143 0.114 -21% <utility>
0.156 0.129 -18% <coroutine>
0.203 0.169 -17% <variant>
0.363 0.315 -13% <memory>
0.186 0.170 -9% <charconv>
0.251 0.246 -2% <iterator>
0.052 0.053 +3% <typeinfo>
0.019 0.021 +8% <cstdio>

At least I learned that I need -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES if I want to develop on libc++. It seems too conservative a decision to make this macro opt-in instead of opt-out.

philnik777 commented 1 year ago

You can always run the tests against libc++ you have run here, right? If splitting up the headers is so bad on windows, it should be obvious by having much worse times than the STL. It's also quite possible that there is a sweet spot between what libc++ is doing and what you are doing here.

It is hard to make a fair comparison. It seems that there is no easy way to test libc++ without building it, and even then it seems not to work when clang targets the MSVC ABI. Here are results from what is shipped with MSYS2 CLANG64, using its clang version 16.0.4, Target: x86_64-w64-windows-gnu.

libc++ seems to be doing much better than MSSTL: <chrono> is only 57% slower :-), though I wouldn't call +60% for <utility> a mild increase.

That's not a mild increase, but it's at least within the same order of magnitude. The C++ stdlib has grown a lot, especially with ranges and format, and it looks to me like splitting the headers up pays off.

At least I learned that I need -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES if I want to develop on libc++. It seems too conservative a decision to make this macro opt-in instead of opt-out.

We decided to remove all the transitive includes in C++23, since that breaks almost nobody right now, and people would simply have to add the respective headers as they migrate to C++23. But it feels to me like the 3min difference in our CI made it quite obvious that we should make the call and remove the transitive includes (even though that will break a lot of users). So yeah, I hope we remove the transitive includes soon.
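In practice, coping with removed transitive includes means directly including every standard header a file uses. A small sketch of what that looks like; `sorted_names` is a made-up function, and which transitive includes actually disappear depends on the libc++ version and the language mode.

```cpp
// Every facility used below is paired with the header that declares it,
// so the file does not rely on one standard header pulling in another.
#include <algorithm>  // std::sort
#include <string>     // std::string
#include <utility>    // std::move
#include <vector>     // std::vector

// Hypothetical example function: returns the names in ascending order.
inline std::vector<std::string> sorted_names(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    return names;
}

// Hypothetical example function: appends one name and returns the sorted list.
inline std::vector<std::string> add_and_sort(std::vector<std::string> names,
                                             std::string extra) {
    names.push_back(std::move(extra));
    return sorted_names(std::move(names));
}
```

A file written this way should build the same with or without -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES, which makes the macro a cheap way to find missing includes before they break on a library upgrade.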

Kojoley commented 1 year ago

and it looks to me like splitting the headers up pays off.

It looks like it does. It could be a direction for MSSTL, along with an opt-in/out macro.

@StephanTLavavej do you think https://github.com/microsoft/STL/pull/3631 could land with an opt-in/out macro?

Kojoley commented 1 year ago

@philnik777 though I am a little bit worried that clang might report wrong timings, because by default it leaks memory and doesn't close resources, though the OS might not sync on close when the file hasn't been modified.

frederick-vs-ja commented 1 year ago

libc++ seems to be doing much better than MSSTL: <chrono> is only 57% slower :-), though I wouldn't call +60% for <utility> a mild increase.

+60% for <utility> looks reasonable and unavoidable to me, because a lot of stuff was added to <utility> in the C++17, C++20, and C++23 standard revisions.

frederick-vs-ja commented 1 year ago

@StephanTLavavej do you think #3631 could land with an opt-in/out macro?

Given that the mentioned breakages mainly came from containers (https://github.com/microsoft/STL/pull/3631#issuecomment-1499904029), I guess we can move the inclusion of <cwchar> (along with <cstdio>) to <xmemory>, which is already sufficiently heavy.