p12tic / libsimdpp

Portable header-only C++ low level SIMD library
Boost Software License 1.0

Feature Request: foreach on vectors #50

Closed danking closed 7 years ago

danking commented 8 years ago

Hi!

I'm not very experienced with C++ and especially not with this library, but I've found that some of my core uses of this library require patterns like:

uint64_t count = _mm_popcnt_u64(extract<0>(x));
#if UINT64_VECTOR_SIZE >= 2
count += _mm_popcnt_u64(extract<1>(x));
#if UINT64_VECTOR_SIZE >= 4
count += _mm_popcnt_u64(extract<2>(x));
count += _mm_popcnt_u64(extract<3>(x));
#if UINT64_VECTOR_SIZE >= 8
count += _mm_popcnt_u64(extract<4>(x));
count += _mm_popcnt_u64(extract<5>(x));
count += _mm_popcnt_u64(extract<6>(x));
count += _mm_popcnt_u64(extract<7>(x));
#if UINT64_VECTOR_SIZE > 8
#error "we do not support vectors longer than 8, please file an issue"
#endif
#endif
#endif
#endif

It would be awesome if there was some syntax like:

uint64_t count = 0;
x.foreach<64>([&](uint64_t e) {
  count += _mm_popcnt_u64(e);
});

I'm happy to hack this up, but I'd need some guidance/scaffolding about how to approach the problem in the framework of libsimdpp.
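One way to get this kind of per-lane loop without preprocessor nesting is a compile-time index sequence that expands into one `extract<I>` call per lane. The following is a minimal sketch under stated assumptions, not libsimdpp's API: `FakeVec`, its `length` member, the free function `extract<I>`, `popcount64`, `for_each_lane` and `popcount_all` are all hypothetical stand-ins introduced here so the example compiles on its own (C++14).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a SIMD vector of 64-bit lanes (NOT a libsimdpp type).
template <std::size_t N>
struct FakeVec {
    std::array<std::uint64_t, N> lanes;
    static constexpr std::size_t length = N;
};

// Hypothetical stand-in for a compile-time-indexed lane extraction.
template <std::size_t I, std::size_t N>
std::uint64_t extract(const FakeVec<N>& v) {
    static_assert(I < N, "lane index out of range");
    return v.lanes[I];
}

// Portable popcount (clears the lowest set bit per iteration);
// used here in place of the x86-only _mm_popcnt_u64.
inline std::uint64_t popcount64(std::uint64_t x) {
    std::uint64_t n = 0;
    while (x) {
        x &= x - 1;
        ++n;
    }
    return n;
}

// Expands to f(extract<0>(v)), f(extract<1>(v)), ... at compile time.
template <class V, class F, std::size_t... I>
void for_each_lane_impl(const V& v, F&& f, std::index_sequence<I...>) {
    // A braced initializer list evaluates its elements left to right.
    int unused[] = {0, (f(extract<I>(v)), 0)...};
    (void)unused;
}

// Calls f once per lane; the lane count comes from V::length.
template <class V, class F>
void for_each_lane(const V& v, F&& f) {
    for_each_lane_impl(v, f, std::make_index_sequence<V::length>{});
}

// The use case from the post: sum of per-lane popcounts.
template <class V>
std::uint64_t popcount_all(const V& v) {
    std::uint64_t count = 0;
    for_each_lane(v, [&count](std::uint64_t e) { count += popcount64(e); });
    return count;
}
```

Because the lane count is a template parameter of the vector type, the same call site works for 2, 4 or 8 lanes and there is no upper bound that needs an `#error` guard.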

waltr commented 8 years ago

It seems you want a horizontal sum. This should be possible in a single instruction, without any loop (unrolled or not). W.

danking commented 8 years ago

Is there an instruction that computes a horizontal sum of popcounts? I didn't see one in the Intel assembly guide. The instruction I'm actually looking for is a 256-bit popcount, but that doesn't seem to exist either.

clunietp commented 7 years ago

Facing a similar problem, I wrote some template code that some might find helpful until a permanent solution is implemented. I'm hesitant to create a pull request since I've only been using the library for a day, so this could likely use some tweaking. Tested on MSVC15/64.

```cpp
// place in a header somewhere...
#include <array>
#include <simdpp/simd.h>

// extract all values from a simdpp vector via store_u
template <typename T, int Size = T::length>
inline auto to_array( const T& what ) -> std::array<typename T::element_type, Size>
{
    std::array<typename T::element_type, Size> result{};
    simdpp::store_u( result.data(), what );
    return result;
}//to_array
```

Usage:

```cpp
simdpp::float32<4> buf{};    // or whatever simdpp type

// do stuff with buf...

// get an STL array with all the elements of buf;
// in this case returns std::array<float,4> because the input was float32<4>
auto my_data = to_array( buf );

for ( auto&& my_value : my_data )   // iterate array
{
    // do stuff with my_value...
}
```
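Once the lanes have been copied into a `std::array` (for example with the `to_array` helper above), the popcount use case from the original post becomes an ordinary fold over the array. The sketch below assumes the lanes are already stored as 64-bit integers; `popcount64` and `popcount_sum` are hypothetical names introduced for illustration, and the portable bit loop stands in for `_mm_popcnt_u64` so that the example does not depend on x86 intrinsics.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Portable popcount (clears the lowest set bit per iteration);
// a stand-in for the x86-only _mm_popcnt_u64.
inline std::uint64_t popcount64(std::uint64_t x) {
    std::uint64_t n = 0;
    while (x) {
        x &= x - 1;
        ++n;
    }
    return n;
}

// Sums the popcounts of all lanes; works for any lane count N.
template <std::size_t N>
std::uint64_t popcount_sum(const std::array<std::uint64_t, N>& lanes) {
    return std::accumulate(
        lanes.begin(), lanes.end(), std::uint64_t{0},
        [](std::uint64_t acc, std::uint64_t lane) {
            return acc + popcount64(lane);
        });
}
```

This trades a store to memory for simpler code: the loop length follows from the array type, so no per-width preprocessor branches are needed.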

p12tic commented 7 years ago

Implemented in 2812474a42e5.

@danking reduce_popcnt() was added in 840c7e3706a9e3, which might be useful for your use case.