Open davidben opened 9 months ago
We looked at the disassembly for std::ranges::copy today and saw the same thing there. While a call to memcpy(x, y, sizeof(int)) optimizes nicely, the equivalent call to std::ranges::copy() (where the source has a compile-time-known size) is left as a function call and not inlined.
Edit: This happens with -Oz if your iterators are complex enough (such as the checked contiguous iterators in //base in Chromium). We observe that std::copy() with iterators also optimizes poorly, but with pointers it does well in -Oz.
std::span doesn't provide span-based methods for memcpy, memset, etc. I gather this is because we already have more general iterator- or range-based versions of these operations, so there's no need to duplicate the API surface when std::fill, etc., can just be specialized on contiguous iterators.

This works out nicely with -O2, where Clang and libc++ are able to dissolve the abstractions effectively. However, with -Oz, this doesn't happen and we get larger code than with -O2! https://godbolt.org/z/r1snebavx

I assume this is because -Oz's inlining heuristics stop too early and don't break down the abstractions they're meant to break. Not sure if this should be fixed in libc++ or -Oz but, one way or another, this combination isn't great. Moving codebases to something like spans and ranges is a nice safety win, particularly combined with libc++ hardening mode, but the standard way to express this doesn't seem to work well with this combination.

CC @danakj