Closed: danking closed this issue 7 years ago
It seems you want a horizontal sum. This should be possible in a single instruction, without any loop (unrolled or not).
W.
On 1 Dec 2016, at 21:16, Daniel King notifications@github.com wrote:
Hi!
I'm not very experienced with C++ and especially not with this library, but I've found that some of my core uses of this library require patterns like:

    uint64_t count = _mm_popcnt_u64(extract<0>(x));
    #if UINT64_VECTOR_SIZE >= 2
    count += _mm_popcnt_u64(extract<1>(x));
    #if UINT64_VECTOR_SIZE >= 4
    count += _mm_popcnt_u64(extract<2>(x));
    count += _mm_popcnt_u64(extract<3>(x));
    #if UINT64_VECTOR_SIZE >= 8
    count += _mm_popcnt_u64(extract<4>(x));
    count += _mm_popcnt_u64(extract<5>(x));
    count += _mm_popcnt_u64(extract<6>(x));
    count += _mm_popcnt_u64(extract<7>(x));
    #if UINT64_VECTOR_SIZE > 8
    #error "we do not support vectors longer than 8, please file an issue"
    #endif
    #endif
    #endif
    #endif

It would be awesome if there was some syntax like:

    uint64_t count = 0;
    x.foreach<64>([&](uint64_t e) { count += _mm_popcnt_u64(e); });

I'm happy to hack this up, but I'd need some guidance/scaffolding about how to approach the problem in the framework of libsimdpp.
View it on GitHub: https://github.com/p12tic/libsimdpp/issues/50
Does there exist an instruction that does a horizontal sum of pop counts? I didn't see one in the Intel assembly guide. The instruction I'm actually looking for is a 256-bit pop count, but that doesn't seem to exist either.
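For readers following along, here is a minimal sketch of the `foreach`-style pattern requested in the original post, written without the preprocessor ladder. It is not libsimdpp code: `for_each_lane`, `popcount64` and `popcount_lanes` are hypothetical names, and a plain `std::array<uint64_t, N>` stands in for the vector's lanes. With four lanes it amounts to a 256-bit pop count built from four 64-bit ones.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of libsimdpp): calls f(lane) once per lane.
// N is a compile-time constant, so an optimizing compiler is free to
// unroll the loop completely.
template <typename T, std::size_t N, typename F>
inline void for_each_lane(const std::array<T, N>& lanes, F&& f)
{
    for (std::size_t i = 0; i < N; ++i) {
        f(lanes[i]);
    }
}

// Portable 64-bit pop count via std::bitset (stand-in for _mm_popcnt_u64).
inline std::uint64_t popcount64(std::uint64_t v)
{
    return static_cast<std::uint64_t>(std::bitset<64>(v).count());
}

// Sums the pop counts of all lanes, for any lane count N.
template <std::size_t N>
inline std::uint64_t popcount_lanes(const std::array<std::uint64_t, N>& lanes)
{
    std::uint64_t count = 0;
    for_each_lane(lanes, [&count](std::uint64_t e) { count += popcount64(e); });
    return count;
}
```

To use this with a libsimdpp vector, the lanes would first have to be copied out, for example with `simdpp::store_u` into a `std::array`. That round trip through memory is the assumption this sketch makes, and it may cost more than extracting lanes directly.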
Facing a similar problem, I wrote some template code that some might find helpful until a permanent solution is implemented. I'm hesitant to create a pull request since I've only been using the library for a day, so this could likely use some tweaking. Tested with MSVC15/64.
```cpp
// place in a header somewhere...
#include <array>
#include <simdpp/simd.h>

// extract all values from a simdpp vector via store_u
template <typename T, int Size = T::length>
inline auto to_array( const T& what ) -> std::array<typename T::element_type, Size>
{
    std::array<typename T::element_type, Size> result{};
    simdpp::store_u( result.data(), what ); // unaligned store of every lane
    return result;
}//to_array
```
Usage:
```cpp
simdpp::float32x4 buf{}; // or whatever simdpp type
// do stuff with buf...
// get an STL array with all the elements of buf;
// here that is std::array<float,4> because the input was float32x4
auto my_data = to_array( buf );
for ( auto&& my_value : my_data ) // iterate array
{
    // do stuff with my_value...
}
```
Implemented in 2812474a42e5.
@danking `reduce_popcnt()` has been added in 840c7e3706a9e3, which might be useful for your use case.