gsl::span performance is a bottleneck for adoption

microsoft / GSL

Guidelines Support Library

Other

6.21k stars 740 forks source link

gsl::span performance is a bottleneck for adoption #1165

Open tiagomacarios opened 1 week ago

tiagomacarios commented 1 week ago

Office is trying to adopt gsl::span more broadly, but we are seeing some unexpected performance regressions when compiling with clang: A - range-for loops B - std algorithms

On a mail thread @StephanTLavavej suggested:

I'd recommend reaching out to the libc++ devs on either the LLVM Discord server or their Discourse forum, about whether they have an "unwrapping" system similar to what MSVC's STL has. (Presumably you'd want to do the same for libstdc++.)

We use libc++'s portable subset of tests, and we're aligned on a few implementation things (backporting import std; to C++20, names of compiler builtins, etc.), but our internal mechanisms are wildly different - there is no commonality between our unwrapping mechanisms.

In C++20 we can detect arbitrary contiguous iterators, which lets us extract raw pointers, but in C++14/17 that's not physically possible so iterators really need to opt in to unwrapping.

Could one of the maintainers please follow up?

carsonRadtke commented 1 week ago

@tiagomacarios Thank you for reporting this issue.

After a brief look into libcxx's std::span, it looks like they call "special" iterators when Safe Buffers are enabled - these seem to make std::span analogous to gsl::span, but the bounds checking gets optimized away with -O2. I will start a discussion with clang folks to determine their thoughts on the matter and see if there is a protocol that gsl::span can use.

In the meantime, could you share what version of clang you are using?

tiagomacarios commented 1 week ago

We use 3 different version of clang:

general tooling - the latest release (19.1.0)
android ndk r27 LTS - 3c92011b600bdf70424e2547594dd461fe411a41 which seems to be 17.x plus some arbitrary commits
apple xcode 16.0-,16.0,-1021.4) which maps to 17.0.6

We usually update compilers as soon as new LTS versions are available.

tiagomacarios commented 1 week ago

From the mail thread I see that this has been fixed. Would you be against having performance tests? I wonder if we could use @galenelias tests and have the perf tests fail if they are not within a threshold from each other.

carsonRadtke commented 6 days ago

From the mail thread I see that this has been fixed. Would you be against having performance tests? I wonder if we could use @galenelias tests and have the perf tests fail if they are not within a threshold from each other.

Performance tests would be great. I'd be happy to either review a PR with Galen's benchmark or I can create the PR if that is more convenient. I'd rather not add a new dimension to the test matrix because we are already pushing 100 runners per PR commit. If we could add one test that compares to the latest standard library implementation for each Clang, GCC, and MSVC that'd be my preference.