microsoft / STL

MSVC's implementation of the C++ Standard Library.
Other
10.19k stars 1.5k forks source link

Should `ranges::minmax` be auto-vectorized instead of manual vectorization. #4453

Closed AlexGuteniev closed 3 weeks ago

AlexGuteniev commented 8 months ago

I now see that manually-vectorizing ranges::minmax in #4384 was not necessarily a good idea. Original implementation was implemented in terms of minmax_element and was not vectorized. But if it was implemented directly, it would have been vectorized. Consider the following:

unsigned int* max_element(unsigned int*b, unsigned int* e){
    unsigned int m = *b;
    unsigned int* r = b;
    for (++b; b != e; ++b){
        if (m < *b) {
            m = *b;
            r = b;
        }
    }
    return b;
}

unsigned int max1(unsigned int*b, unsigned int* e){
    return *max_element(b, e);
}

unsigned int max2(unsigned int *b, unsigned int* e){
    unsigned int m = *b;
    for (++b; b != e; ++b){
        if (m < *b) {
            m = *b;
        }
    }
    return m;
}

max1 is not vectorized, but max2 is. see https://godbolt.org/z/fG6Yzfz4v

So we could just reimplement ranges::max_element in headers and enjoy vectorization with more context.

Question how should we proceed:

StephanTLavavej commented 8 months ago

We talked about this at the weekly maintainer meeting and we're in favor of implementing the ranges::minmax value-based family with dedicated code in the core header <__msvc_minmax.hpp> (currently only providing types), having <algorithm> use that, and also having vector_algorithms.cpp use that to implement _Minmax (because the import lib can never shrink until vNext).

We should check the benchmark to make sure that this doesn't regress performance significantly.

AlexGuteniev commented 4 months ago

From #4734 there's a conclusion that floating point types without /fp:fast option should be vectorized with _element approach. This has to be kept.

The compiler doesn't generate the _element apporach, doesn't try to vectorize floating min/max at all, and currently even fails to apply the scalar approach properly, see DevCom-10686775.

StephanTLavavej commented 3 weeks ago

As mentioned in #5010, it looks like keeping manual vectorization is best for now.