Open davidbolvansky opened 4 years ago
cc @fhahn as well
Are you sure it actually makes sense to vectorize this? The vector code appears to rely on a lot of gather load patterns.
i686 codegen: https://godbolt.org/z/ceeE9K364 (slightly easier to see what's going on)
I think the cost model is correct to say that the 64-bit gather loads are not worth it without an AVX512-capable target CPU.
What's interesting is that we don't really need gather patterns at all: this is an interleaved2 load pattern, since the std::vector[] data are consecutive begin/end pointers.
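To make the interleaved2 point concrete, here is a minimal sketch of the kind of access pattern being described. It is not the loop from the godbolt links: `Range` and `total_size` are hypothetical names, and `Range` is an assumed two-pointer struct standing in for the container header.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a container header holding just two pointers.
// This layout is an assumption made for illustration.
struct Range {
    const std::int64_t* begin;
    const std::int64_t* end;
};

// Sums the element counts of n adjacent ranges.
// Iteration i loads ranges[i].begin and ranges[i].end. Across iterations the
// loaded addresses are begin0, end0, begin1, end1, ... which is one contiguous
// stream, so it can in principle be read with wide loads plus a de-interleaving
// shuffle instead of per-lane gathers.
std::ptrdiff_t total_size(const Range* ranges, std::size_t n) {
    std::ptrdiff_t total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += ranges[i].end - ranges[i].begin;
    return total;
}
```

Under that assumption, every pointer the loop needs sits in one contiguous block, which is why it looks like an interleaved load group of factor 2 rather than a gather.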
Extended Description
ICC -O3 -mavx2 vectorizes this loop. Clang does not vectorize it because of its cost model.
https://godbolt.org/z/Peds78