Closed wermos closed 2 years ago
I think we should run benchmarks and see how / whether this hits performancs before making the implementation more complex
One other thing that can be done if you do not wish to move away from switches is to add an 'unreachable' marker to the default case on compilers that support such a construct, e.g MSVC's __assume(0)
or gcc/clang's __builtin_unreachable()
.
should be changed to using some std::enable_if type thing
@wermos in this particular situation I suspect they'd be better off using some internal templated helper type and specializing it based on the register type, since they're already using std::enable_if
to select for integral inputs and those conditionals would necessarily have to end up much more complex to filter the overload set. Something like:
namespace detail
{
template <typename, std::size_t>
struct simd_add
{
static_assert(false, "unsupported arch/op combination");
}
template <>
struct simd_add<__m128i, 1>
{
static constexpr auto& func = ::_mm_add_epi8;
}
template <>
struct simd_add<__m128i, 2>
{
static constexpr auto& func = ::_mm_add_epi16;
}
template <>
struct simd_add<__m128i, 4>
{
static constexpr auto& func = ::_mm_add_epi32;
}
template <>
struct simd_add<__m128i, 8>
{
static constexpr auto& func = ::_mm_add_epi64;
}
}
template <class A, class T, class = typename std::enable_if<std::is_integral<T>::value, void>::type>
inline batch<T, A> add(batch<T, A> const& self, batch<T, A> const& other, requires_arch<sse2>) noexcept
{
return detail::simd_add<batch<T, A>::register_type, T>::func(self, other);
}
We already have a similar system for neon intrinsic, and from a maintenance perspective, a single switch is much easier to maintain / read... I'm personally not in favor of trading ease of maintenance with debug build speed.
One thing we could do, which I think would be very nice, is use if constexpr
if __cplusplus
>= 201703L
, and otherwise, stick to the switch-case statement.
if constexpr
would solve this problem at almost zero cost to ease-of-maintenance.
I don't mind that, provided we can write a nice compat macro that performs either a switch or a cascade of constexpr if.
@serge-sans-paille what are your thoughts on replacing the existing switch statements with an if-clause? Granted, the readability of the code will be hurt a little, but doing so would make writing a macro for switching between if
and if constexpr
very easy.
Moreover, from what I could tell, it doesn't look like there is really a big difference performance-wise between switch
and a cascade of if
s for such a small number of cases.
is use if constexpr if __cplusplus >= 201703L
@wermos Testing the whole C++ version isn't necessary; compilers should define __cpp_if_constexpr
with a value >= 201606 when if constexpr
is supported.
is use
if constexpr
if__cplusplus
>= 201703L@wermos Testing the whole C++ version isn't necessary; compilers should define
__cpp_if_constexpr
with a value>= 201606
whenif constexpr
is supported.
The problem with these feature test macros is that they were added in C++20, so it is not neccesarily defined if, for example, a compiler is fully C++17 compliant (meaning they have if constexpr
) but have no C++20 support yet (meaning these feature test macros don't exist).
The problem with these feature test macros is that they were added in C++20
Are you certain? I've been using the compiler ones for a long time. You're conflating a few separate things, I believe.
__cpp_
) ~have always been around~ predate C++14 at least.__cpp_lib_
) have been around for ages, too. You had to know which header they were defined in, though.<version>
was added for the standard-library ones so you could just include that rather than play dumb/non-portable games with __has_include
and friends.More of both were ratified in C++20, of course, but I don't believe it's somehow retroactive. They've always been there.
GCC7, Clang 6, Visual studio 2017: https://godbolt.org/z/e8E59eK6h
The problem with these feature test macros is that they were added in C++20
Are you certain? I've been using the compiler ones for a long time. You're conflating a few separate things, I believe.
Oh, this is news to me. I always thought that feature test macros were a feature which was added in C++20 but wasn't there before that.
Do you have any links for the documentation of the feature test macros that GCC and Clang support? I looked around a bit but all I could find was GCC's documentation of the feature test macros in their C Standard Library.
Do you have any links [...]
I don't; the cppreference.com page has always been my go-to. It has been there since 2015, though, so I suspect what else may have actually changed in C++20 is "oh hey these are all actually official now, whereas they've been pseudo-standard up until this point", a bit like how #pragma once
is 'non-standard' but supported everywhere that matters.
The feature test macros page in 2015: https://en.cppreference.com/mwiki/index.php?title=cpp/feature_test&oldid=77408
I like the idea of cascading if. Has the test is trivially constant, compilers should have no difficulties folding it so the runtime cost at -O2
should be null.
As I pointed out in one of my messages in #775, there are many cases like this, where a switch-case is employed to make the decision of which intrinsic to call, even though this decision could be done at compile-time.
As part of our program to improve debug build performance, as @marzer suggested, and for improving performance overall, this should be changed to using some
std::enable_if
type thing.