greg7mdp / parallel-hashmap

A family of header-only, very fast and memory-friendly hashmap and btree containers.
https://greg7mdp.github.io/parallel-hashmap/
Apache License 2.0

Very minor optimization: _mm_abs_epi8 instead of _mm_sign_epi8 #213

Closed: Myriachan closed this issue 11 months ago.

Myriachan commented 11 months ago

It'd be a very minor optimization, if it does anything measurable at all, but:

static_cast<uint32_t>(_mm_movemask_epi8(_mm_sign_epi8(ctrl, ctrl)))

Could be this instead:

static_cast<uint32_t>(_mm_movemask_epi8(_mm_abs_epi8(ctrl)))
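Whether the swap is safe comes down to a per-byte identity. With both operands equal, PSIGNB negates negative bytes, zeroes a zero byte and keeps positive ones, which is exactly absolute value, and both instructions wrap -128 (0x80) back to itself. The portable sketch below models one lane of each instruction and checks that identity for all 256 byte values, along with the fact that the following movemask keeps only lanes that held 0x80. The helper names are made up for illustration; they are not parallel-hashmap or intrinsic APIs. The reading that 0x80 marks an empty slot is an assumption based on the SwissTable-style control-byte layout.

```cpp
#include <cassert>
#include <cstdint>

// Portable scalar model of one byte lane of each SSSE3 instruction.
// Bytes are held as std::uint8_t so that negating 0x80 (-128) is well-defined
// modular arithmetic, matching the hardware wrap-around.

// One lane of _mm_sign_epi8(a, b) (PSIGNB): negate a if b < 0,
// return 0 if b == 0, return a unchanged if b > 0.
inline std::uint8_t psignb_lane(std::uint8_t a, std::uint8_t b) {
    if (b & 0x80u) return static_cast<std::uint8_t>(0u - a);  // b is negative
    if (b == 0) return 0;
    return a;                                                  // b is positive
}

// One lane of _mm_abs_epi8(a) (PABSB): absolute value of a signed byte,
// stored as unsigned, so 0x80 (-128) stays 0x80 (128).
inline std::uint8_t pabsb_lane(std::uint8_t a) {
    return (a & 0x80u) ? static_cast<std::uint8_t>(0u - a) : a;
}

// The bit _mm_movemask_epi8 collects from each lane: its top (sign) bit.
inline bool top_bit(std::uint8_t x) { return (x & 0x80u) != 0; }

// Checks every possible control byte: sign(x, x) must equal abs(x),
// and the only input whose top bit survives is 0x80 (-128).
void check_sign_vs_abs_all_bytes() {
    for (unsigned v = 0; v < 256; ++v) {
        const auto x = static_cast<std::uint8_t>(v);
        assert(psignb_lane(x, x) == pabsb_lane(x));
        assert(top_bit(pabsb_lane(x)) == (x == 0x80u));
    }
}
```

Calling check_sign_vs_abs_all_bytes() from a test's main is enough; it aborts through assert on any mismatch. A scalar model does not replace running the real intrinsics, but it makes the equivalence argument explicit.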

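pabsb is also an SSSE3 instruction, so the requirements don't change, and the result is identical: for every byte x, sign(x, x) and abs(x) both equal |x|, with -128 (0x80) mapping to itself in both cases. The advantage here is that pabsb is non-destructive. In the legacy SSE encoding, psignb overwrites its first operand, so if ctrl is still needed afterwards the compiler has to copy it to another register first; pabsb reads ctrl and writes the result to a separate register. That could produce slightly better code. You'd have to try it, though...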
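To try it on real hardware, the two expressions can be wrapped in small functions and compared directly. This is a sketch under a few assumptions: an x86-64 target, GCC or Clang (the per-function target("ssse3") attribute is a GCC/Clang extension that lets the file build without -mssse3), a CPU that supports SSSE3, and hypothetical function names that are not part of parallel-hashmap. ctrl is treated as a group of 16 control bytes, and bit i of the returned mask is set when lane i holds -128.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_movemask_epi8, _mm_set1_epi8, _mm_setr_epi8
#include <tmmintrin.h>  // SSSE3: _mm_sign_epi8, _mm_abs_epi8

// Hypothetical helper: the original expression, psignb with ctrl as both operands.
__attribute__((target("ssse3")))
std::uint32_t empty_mask_sign(__m128i ctrl) {
    return static_cast<std::uint32_t>(_mm_movemask_epi8(_mm_sign_epi8(ctrl, ctrl)));
}

// Hypothetical helper: the proposed expression, pabsb reading ctrl only.
__attribute__((target("ssse3")))
std::uint32_t empty_mask_abs(__m128i ctrl) {
    return static_cast<std::uint32_t>(_mm_movemask_epi8(_mm_abs_epi8(ctrl)));
}

// Broadcasts each of the 256 byte values to all 16 lanes and asserts that
// both helpers return the same mask.
void check_masks_agree() {
    for (int v = -128; v <= 127; ++v) {
        const __m128i ctrl = _mm_set1_epi8(static_cast<char>(v));
        assert(empty_mask_sign(ctrl) == empty_mask_abs(ctrl));
    }
}
```

For the code-quality question, compiling with something like -O2 -mssse3 -S and reading the assembly of a caller that still uses ctrl after computing the mask is the direct way to see whether psignb costs an extra register copy. With -mavx, both instructions get three-operand VEX encodings (vpsignb, vpabsb), so that difference should disappear.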

greg7mdp commented 11 months ago

Thanks for the suggestion, I really appreciate it.

Since I am not an expert in SSE2, I'm hesitant to make a change, especially since you mention it would be a very minor (if any) improvement, and I'd rather err on the safe side.

So I'll close the issue, but feel free to let me know if you can measure any significant improvement with this (or any other change).