llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Reduction loop with std::vector not vectorized due to cost model #47206

Open davidbolvansky opened 4 years ago

davidbolvansky commented 4 years ago
Bugzilla Link: 47862
Version: trunk
OS: Linux
CC: @RKSimon, @rotateright

Extended Description

#include <cstddef>
#include <vector>

std::size_t stdvecsum(std::vector<std::vector<float>> const & v) {
    std::size_t ret = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
      ret += v[i].size();
    return ret;
}

ICC with -O3 -mavx2 vectorizes this loop. Clang does not vectorize it because its cost model judges vectorization unprofitable.

https://godbolt.org/z/Peds78
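One way to confirm that the cost model, rather than a legality problem, is what blocks vectorization is to ask Clang to vectorize the loop explicitly and compare the output. The sketch below uses Clang's `#pragma clang loop vectorize(enable)` hint. The function name `stdvecsum_forced` and the hoisted bound `n` are only illustrative, and whether a given Clang version actually vectorizes the loop with this hint is not guaranteed.

```cpp
#include <cstddef>
#include <vector>

// Same reduction as in the report, with an explicit vectorization hint.
// stdvecsum_forced is a hypothetical name used only for this sketch.
std::size_t stdvecsum_forced(std::vector<std::vector<float>> const & v) {
    std::size_t ret = 0;
    std::size_t const n = v.size();  // hoist the loop bound
#if defined(__clang__)
// Hint only: asks the loop vectorizer to vectorize the following loop.
#pragma clang loop vectorize(enable)
#endif
    for (std::size_t i = 0; i < n; ++i)
      ret += v[i].size();
    return ret;
}
```

Compiling with `-Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize` should make Clang report whether the loop was vectorized and, if not, why.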

davidbolvansky commented 2 years ago

cc @fhahn as well

RKSimon commented 2 years ago

Are you sure it actually makes sense to vectorize this? The vector code appears to rely on a lot of gather load patterns.

RKSimon commented 2 years ago

i686 codegen: https://godbolt.org/z/ceeE9K364 (slightly easier to see what's going on)

I think the cost model is correct to say that the 64-bit gather loads are not worth it without an AVX512-capable target CPU.

What's interesting is that we don't really need gather patterns at all: this is really an interleaved2 load pattern, as the data inside each std::vector object are consecutive begin/end pointers.
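To make the begin/end observation concrete, here is a minimal sketch of what the loop effectively computes once `size()` is inlined. It assumes the three-pointer object layout used by common standard library implementations (begin, end, end of storage); the C++ standard does not mandate this layout, and `FakeVec` and `fakevecsum` are hypothetical names for illustration only.

```cpp
#include <cstddef>

// Hypothetical stand-in for the in-memory layout of a std::vector<float>
// object, assuming three consecutive pointers (not required by the standard).
struct FakeVec {
    float *begin;  // first element
    float *end;    // one past the last element
    float *cap;    // one past the end of the allocated storage
};

// Equivalent of the reported loop after inlining size(): each iteration
// reads two adjacent pointers at a fixed stride of sizeof(FakeVec), so the
// addresses are known up front and no gather is inherently required.
std::size_t fakevecsum(const FakeVec *v, std::size_t n) {
    std::size_t ret = 0;
    for (std::size_t i = 0; i < n; ++i)
        ret += static_cast<std::size_t>(v[i].end - v[i].begin);
    return ret;
}
```

Under this assumed layout, the begin and end loads for consecutive `i` form a regular strided access over one contiguous array, which is what makes an interleaved load strategy plausible in place of gathers.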