Question: why not support SSE2, AVX, and AVX512 all at the same time? (or am I missing something?)

yb303 commented 5 years ago

Even with AVX512 support, users may still want to use 128-bit integer vectors. In the current state, I think older code moving to a newer box would be forced to widen its vectors.

kimwalisch commented 5 years ago

This would require significant code changes, especially for C. For example, we currently have:

__m128i libdivide_s32_do_vector(__m128i, const struct libdivide_s32_t *)
__m128i libdivide_u32_do_vector(__m128i, const struct libdivide_u32_t *)
__m128i libdivide_s64_do_vector(__m128i, const struct libdivide_s64_t *)
__m128i libdivide_u64_do_vector(__m128i, const struct libdivide_u64_t *)

If we wanted to support SSE2, AVX2 and AVX512 at the same time we would need to change the function names:

__m128i libdivide_s32_do_vector_sse2(__m128i, const struct libdivide_s32_t *)
__m128i libdivide_u32_do_vector_sse2(__m128i, const struct libdivide_u32_t *)
__m128i libdivide_s64_do_vector_sse2(__m128i, const struct libdivide_s64_t *)
__m128i libdivide_u64_do_vector_sse2(__m128i, const struct libdivide_u64_t *)

__m256i libdivide_s32_do_vector_avx2(__m256i, const struct libdivide_s32_t *)
__m256i libdivide_u32_do_vector_avx2(__m256i, const struct libdivide_u32_t *)
__m256i libdivide_s64_do_vector_avx2(__m256i, const struct libdivide_s64_t *)
__m256i libdivide_u64_do_vector_avx2(__m256i, const struct libdivide_u64_t *)

__m512i libdivide_s32_do_vector_avx512(__m512i, const struct libdivide_s32_t *)
__m512i libdivide_u32_do_vector_avx512(__m512i, const struct libdivide_u32_t *)
__m512i libdivide_s64_do_vector_avx512(__m512i, const struct libdivide_s64_t *)
__m512i libdivide_u64_do_vector_avx512(__m512i, const struct libdivide_u64_t *)

I consider this solution less elegant. The old SSE2 code was unmaintained for many years and, as far as I know, nobody used it. I ported the SSE2 code to AVX2 and AVX512 just a few days ago. Personally, I would like to wait and get more feedback from users on how they use the new vector code. If many users request this feature, I will consider implementing it.

ridiculousfish commented 3 years ago

I decided to fix this. Now vector functions are tagged with the width:

        libdivide_s32_do_vec128
        libdivide_s64_do_vec128
        libdivide_u32_do_vec128
        libdivide_u64_do_vec128
        libdivide_s32_do_vec256
        libdivide_s64_do_vec256
        libdivide_u32_do_vec256
        libdivide_u64_do_vec256
        libdivide_s32_do_vec512
        libdivide_s64_do_vec512
        libdivide_u32_do_vec512
        libdivide_u64_do_vec512

        libdivide_s32_branchfree_do_vec128
        libdivide_s64_branchfree_do_vec128
        libdivide_u32_branchfree_do_vec128
        libdivide_u64_branchfree_do_vec128
        libdivide_s32_branchfree_do_vec256
        libdivide_s64_branchfree_do_vec256
        libdivide_u32_branchfree_do_vec256
        libdivide_u64_branchfree_do_vec256
        libdivide_s32_branchfree_do_vec512
        libdivide_s64_branchfree_do_vec512
        libdivide_u32_branchfree_do_vec512
        libdivide_u64_branchfree_do_vec512

ridiculousfish / libdivide #52