ThomasRetornaz commented 6 years ago

Hi, I am currently migrating from Boost.SIMD to libsimdpp. I heavily use the transform and reduce algorithms on plain pointers with SIMD-aware operators, and I will try to implement such algorithms using libsimdpp. Would you be interested in me contributing such "high level" algorithms to the main library?

A possible signature for transform could be `template<typename T, typename U, typename UnOp> U* transform(T const* first, T const* last, U* out, UnOp f) { ..... }`, where UnOp should be designed by users to handle both scalar values and "simd vectors".

Transform functions must handle:

a prelude (when there are not enough elements to fill a SIMD register)
the main SIMD part (elements which fit in SIMD registers and use SIMD load/store)
an epilogue (remaining elements which do not fill a SIMD register)

I will add:

a few traits to pick the most suitable number of elements for the SIMD part
an is_aligned function to switch between load and load_u, and between store and store_u (which seems to be missing)
transform and reduce algorithms (for a start)
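An alignment check of the kind mentioned in the list could look like the sketch below. The names `is_aligned`, `access` and `pick_access` are hypothetical and not part of libsimdpp; the idea is only that the result selects between the aligned path (load/store) and the unaligned one (load_u/store_u).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when the address is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Which family of memory operations a loop should use.
enum class access { aligned, unaligned };

// Picks the aligned path only when the pointer satisfies the requested alignment.
inline access pick_access(const void* p, std::size_t alignment) {
    return is_aligned(p, alignment) ? access::aligned : access::unaligned;
}
```

Checking once before the main loop keeps the branch out of the hot path; the loop body can then be instantiated separately for each case.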

Do you have any concerns about where I should put these functions? I will make a pull request if you are interested.

p12tic commented 6 years ago

Hi, thanks for your interest. It would be great to have this functionality in libsimdpp.

> Do you have any concerns about where I should put these functions?

I think it doesn't matter, as it's not hard to move code and libsimdpp currently does not expose the location of individual headers. To begin with, we could put generic algorithms into a simdpp/algorithm folder and see later whether there's a better place.

@xugng FYI. Do you already work on something like this by chance?

p12tic commented 6 years ago

@xugng: Ping :-)

xugng commented 6 years ago

@p12tic, @ThomasRetornaz: No, I am not working on this. Feel free to hack.

ThomasRetornaz commented 6 years ago

Hi, I will make a pull request for an std-like transform algorithm, but I have a few concerns:

I can't generate the documentation and check my additions because http://doc.radix.lt/libsimdpp/ is unreachable.
I have to add a define called SIMDPP_IDEAL_MAX_ALIGN_BYTES (à la Eigen) to dispatch on the best default alignment depending on the scalar type. Does this make sense to you (see below)? Or does another define already exist, or is this information provided somewhere else?

    #if SIMDPP_USE_NULL
    #define SIMDPP_IDEAL_MAX_ALIGN_BYTES 1
    #elif SIMDPP_USE_AVX512F
    #define SIMDPP_IDEAL_MAX_ALIGN_BYTES 64
    #elif SIMDPP_USE_AVX
    #define SIMDPP_IDEAL_MAX_ALIGN_BYTES 32
    #else
    #define SIMDPP_IDEAL_MAX_ALIGN_BYTES 16
    #endif

    /// TypeTraits int8_t
    template<>
    struct TypeTraits<int8_t>
    {
        static const size_t SIMDPP_FAST_SIZE = SIMDPP_FAST_INT8_SIZE;
        using simd_type = int8<SIMDPP_FAST_SIZE>;
        static const size_t alignement = SIMDPP_IDEAL_MAX_ALIGN_BYTES;
    };

Regards TR

p12tic commented 6 years ago

> I can't generate the documentation and check my additions because http://doc.radix.lt/libsimdpp/ is unreachable

I disabled public access to it due to hacking concerns. Could you email me at povilas@radix.lt and I'll send you instructions to access it and credentials needed for that.

> I have to add a define called SIMDPP_IDEAL_MAX_ALIGN_BYTES <...>

The ideal alignment should differ per type - e.g. on AVX integer types only need to be 128-bit aligned whereas float types need to be 256-bit aligned. The alignment could be specified directly in the TypeTraits specializations, e.g. static const size_t alignment = 1 * fast_size.

Also a couple of naming nitpicks: TypeTraits => type_traits, SIMDPP_FAST_SIZE => fast_size.

Does that make sense to you?

Thanks!

ThomasRetornaz commented 6 years ago

> I disabled public access to it due to hacking concerns. Could you email me at povilas@radix.lt and I'll send you instructions to access it and credentials needed for that.

Thanks, I will send an email.

> The ideal alignment should differ per type - e.g. on AVX integer types only need to be 128-bit aligned whereas float types need to be 256-bit aligned. The alignment could be specified directly in the TypeTraits specializations, e.g. static const size_t alignment = 1 * fast_size.

OK, I missed this. I'm new to the AVX/AVX2 instruction sets, sorry. Nevertheless, the alignment can't be equal to 1 * fast_size. If I understand correctly, on AVX it should be 32 bytes for float types and 16 bytes for integer types, while fast_size is 4 for double and 8 for float, which makes sense if fast_size encodes the "best possible size" of a SIMD pack. I couldn't find a macro or an arithmetic operation that links fast_size to the alignment in the AVX case. Do I need to dispatch on the architecture in the type traits to handle this? Maybe I am missing something obvious. Regards TR

ThomasRetornaz commented 6 years ago

> The ideal alignment should differ per type - e.g. on AVX integer types only need to be 128-bit aligned whereas float types need to be 256-bit aligned. The alignment could be specified directly in the TypeTraits specializations, e.g. static const size_t alignment = 1 * fast_size.

Hi, I converged on this:

```cpp
#include <cstddef>      // size_t
#include <cstdint>      // int8_t ... uint64_t
#include <type_traits>  // std::alignment_of

// Assumes the libsimdpp headers are included and that this code sits inside
// namespace simdpp, so that int8<N>, float32<N>, ... and the
// SIMDPP_FAST_*_SIZE macros are visible.

/// Define typetraits (generic fallback: natural alignment of the scalar type)
template<class valuetype>
struct typetraits
{
    static const size_t alignment = std::alignment_of<valuetype>::value;
};

/// typetraits int8_t
template<>
struct typetraits<int8_t>
{
    static const size_t fast_size = SIMDPP_FAST_INT8_SIZE;
    using simd_type = int8<fast_size>;
    static const size_t alignment = fast_size;  // 1 byte per lane
};

/// typetraits uint8_t
template<>
struct typetraits<uint8_t>
{
    static const size_t fast_size = SIMDPP_FAST_INT8_SIZE;
    using simd_type = uint8<fast_size>;
    static const size_t alignment = fast_size;  // 1 byte per lane
};

/// typetraits int16_t
template<>
struct typetraits<int16_t>
{
    static const size_t fast_size = SIMDPP_FAST_INT16_SIZE;
    using simd_type = int16<fast_size>;
    static const size_t alignment = fast_size * 2;  // 2 bytes per lane
};

/// typetraits uint16_t
template<>
struct typetraits<uint16_t>
{
    static const size_t fast_size = SIMDPP_FAST_INT16_SIZE;
    using simd_type = uint16<fast_size>;
    static const size_t alignment = fast_size * 2;  // 2 bytes per lane
};

/// typetraits int32_t
template<>
struct typetraits<int32_t>
{
    static const size_t fast_size = SIMDPP_FAST_INT32_SIZE;
    using simd_type = int32<fast_size>;
    static const size_t alignment = fast_size * 4;  // 4 bytes per lane
};

/// typetraits uint32_t
template<>
struct typetraits<uint32_t>
{
    static const size_t fast_size = SIMDPP_FAST_INT32_SIZE;
    using simd_type = uint32<fast_size>;
    static const size_t alignment = fast_size * 4;  // 4 bytes per lane
};

/// typetraits int64_t
template<>
struct typetraits<int64_t>
{
    static const size_t fast_size = SIMDPP_FAST_INT64_SIZE;
    using simd_type = int64<fast_size>;
    static const size_t alignment = fast_size * 8;  // 8 bytes per lane
};

/// typetraits uint64_t
template<>
struct typetraits<uint64_t>
{
    static const size_t fast_size = SIMDPP_FAST_INT64_SIZE;
    using simd_type = uint64<fast_size>;
    static const size_t alignment = fast_size * 8;  // 8 bytes per lane
};

/// typetraits float32
template<>
struct typetraits<float>
{
    static const size_t fast_size = SIMDPP_FAST_FLOAT32_SIZE;
    using simd_type = float32<fast_size>;
    static const size_t alignment = fast_size * 4;  // 4 bytes per lane
};

/// typetraits float64
template<>
struct typetraits<double>
{
    static const size_t fast_size = SIMDPP_FAST_FLOAT64_SIZE;
    using simd_type = float64<fast_size>;
    static const size_t alignment = fast_size * 8;  // 8 bytes per lane
};
```

It seems to do the job
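Every specialization above follows the same rule: the alignment in bytes is the lane count multiplied by the size of one element. A minimal sketch of that rule written once is shown here. The names `fast_size_of` and `typetraits_sketch` are hypothetical, and the lane counts are hard-coded assumptions for an AVX-only build as described earlier in the thread (128-bit integer vectors, 256-bit floating-point vectors); real code would take them from the SIMDPP_FAST_*_SIZE macros instead.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed lane counts for an AVX (no AVX2) build; hypothetical helper,
// standing in for the SIMDPP_FAST_*_SIZE macros.
template <typename T> struct fast_size_of;
template <> struct fast_size_of<std::int32_t> { static constexpr std::size_t value = 4; };
template <> struct fast_size_of<float>        { static constexpr std::size_t value = 8; };
template <> struct fast_size_of<double>       { static constexpr std::size_t value = 4; };

// One formula instead of one hand-written constant per specialization.
template <typename T>
struct typetraits_sketch {
    static constexpr std::size_t fast_size = fast_size_of<T>::value;
    static constexpr std::size_t alignment = fast_size * sizeof(T);  // bytes
};

// Under the assumed lane counts: 16 bytes for integers, 32 bytes for float/double.
static_assert(typetraits_sketch<std::int32_t>::alignment == 16, "128-bit integer vectors");
static_assert(typetraits_sketch<float>::alignment == 32, "256-bit float vectors");
static_assert(typetraits_sketch<double>::alignment == 32, "256-bit double vectors");
```

With this formulation the per-architecture difference lives entirely in the lane counts, so no separate dispatch on the instruction set is needed for the alignment.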

> I disabled public access to it due to hacking concerns. Could you email me at povilas@radix.lt and I'll send you instructions to access it and credentials needed for that.

If you have time, I will check my documentation and then make a pull request for the std-like transform and reduce.

By the way, do you think other STL-like algorithms could be useful for the library? If I have time, I will work on them. Regards TR

p12tic / libsimdpp

Populate "high level" stl like algorithm #107
