Open markos opened 8 months ago
To document our conversation from another place:
The int128 implementation was introduced in
https://github.com/simd-everywhere/simde/commit/f132275f85ab1c1cb1e890538ee552c11ca09c38#diff-c34749104c5a84a98b2e0af443fc6df77276e235abd1cff17a74d285c4fddd93R2344
From what I see, it still matches the guidance from Intel for _mm_testz_si128: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_testz_si128&ig_expand=6857
Compute the bitwise AND of 128 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. [Something about CF, but that is not used in this function] Return the ZF value.
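As a rough illustration of the semantics quoted above, here is a minimal C sketch that models the two 128-bit operands as a pair of 64-bit lanes. The type vec128_sketch and the function testz_reference_sketch are hypothetical names made up for this example; this is not SIMDe's code, only a model of the ZF behaviour that Intel describes.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector: two 64-bit lanes. */
typedef struct {
  uint64_t lane[2];
} vec128_sketch;

/* Models the ZF result described above:
 * returns 1 if (a AND b) has no bit set anywhere in the 128 bits, else 0. */
static int testz_reference_sketch(vec128_sketch a, vec128_sketch b) {
  uint64_t lo = a.lane[0] & b.lane[0];
  uint64_t hi = a.lane[1] & b.lane[1];
  /* The 128-bit AND is zero only if both 64-bit halves are zero. */
  return (lo | hi) == 0;
}
```

Any implementation of the function, whether it works on a single 128-bit integer or on narrower lanes, would be expected to agree with this model for every input.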
The other pathway has a reversed structure to enable failing fast: it compares 64 bits at a time, so if the bitwise AND of the first 64 bits is not zero, we return 0 immediately.
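The fail-fast structure described here can be sketched in C roughly as follows. The names vec128_lanes_sketch and testz_early_exit_sketch are hypothetical and chosen only for this example; the real SIMDe code uses its own private vector types, so treat this as an approximation of the idea rather than the actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector viewed as two u64 lanes. */
typedef struct {
  uint64_t lane[2];
} vec128_lanes_sketch;

/* Checks one 64-bit lane at a time and bails out on the first lane
 * whose AND is non-zero, since ZF must then be 0. */
static int testz_early_exit_sketch(vec128_lanes_sketch a, vec128_lanes_sketch b) {
  for (size_t i = 0; i < 2; i++) {
    if ((a.lane[i] & b.lane[i]) != 0) {
      return 0; /* a set bit survived the AND: the result is not zero */
    }
  }
  return 1; /* every lane ANDed to zero, so ZF would be 1 */
}
```

Note that the early return yields 0, while falling through the whole loop yields 1, which is the "reversed" shape compared with a single full-width comparison against zero.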
The int128 implementation of simde_mm_testz_si128() is incorrect; this was caught in the vectorscan CI with the SIMDe backend:
https://buildbot-ci.vectorcamp.gr/#/builders/119/builds/14
After investigation I found that on x86 the int128 implementation was used, and only these tests were failing. The other architectures used the u64 method and produced correct results.
https://github.com/simd-everywhere/simde/blob/5405bbdcc7e9045b0901c847c8868594deb511a6/simde/x86/sse4.1.h#L2344
The following fix seems to work, and it makes sense because it is consistent with the u64 implementation below.