microsoft / STL

MSVC's implementation of the C++ Standard Library.

Auto-vectorize `count_if`, `count` #4653

Closed AlexGuteniev closed 4 months ago

AlexGuteniev commented 4 months ago

This is a rephrasing of #4456, incorporating all progress made so far.

`count` and `count_if` can be auto-vectorized as follows:

For `count_if`, this would be the only feasible way to vectorize: predicates cannot be passed to the separately compiled implementation, and we don't want complex manual vectorization with intrinsics in headers, for compiler throughput reasons.

For `count`, this can still be an alternative to manual vectorization. When compiling with `/arch:AVX2`, auto-vectorization seems to perform not much worse than the existing manual vectorization on large ranges, but significantly worse on small ranges with large tails (auto-vectorization doesn't do the mask thing for the tail). So we can: