simd-everywhere / simde

Implementations of SIMD instruction sets for systems which don't natively support them.
https://simd-everywhere.github.io/blog/
MIT License
2.37k stars 247 forks source link

x86 and neon max/min are not the same #963

Open moon-chilled opened 2 years ago

moon-chilled commented 2 years ago

Currently, _mm_max_pd is implemented in terms of vmaxq_f64 on neon. But their behaviour is not quite the same. When handed zeroes of opposite signs, the latter will return positive zero, but the former will return its second argument. The situation with min is analogous.

mr-c commented 1 year ago

Thank you @moon-chilled for your report. Can you send a PR to fix this? Or suggest test cases that would make the problem clear?

moon-chilled commented 1 year ago

_mm_max_pd(0,-0) = -0

vmaxq_f64(0,-0) = 0

mr-c commented 1 year ago

Thank @moon-chilled for the test cases; I don't know NEON well enough to fix this. A PR or suggestion from anyone is very welcome

aqrit commented 1 year ago

solutions from sse2neon for __aarch64__:

_mm_max_pd: r_.neon_f64 = vbslq_f64(vcgtq_f64(a_.neon_f64, b_.neon_f64), a_.neon_f64, b_.neon_f64);

_mm_min_pd: r_.neon_f64 = vbslq_f64(vcltq_f64(a_.neon_f64, b_.neon_f64), a_.neon_f64, b_.neon_f64);

_mm_min_sd: return simde_mm_move_sd(a, simde_mm_min_pd(a, b));

_mm_max_sd: return simde_mm_move_sd(a, simde_mm_max_pd(a, b));

solutions from sse2neon for ARMv7-A:

_mm_max_ps is already correct.

_mm_max_ss: simde_mm_move_ss(a, simde_mm_max_ps(a, b));

_mm_min_ps is missing a case: r_.neon_f32 = vbslq_f32(vcltq_f32(a_.neon_f32, b_.neon_f32), a_.neon_f32, b_.neon_f32);

_mm_min_ss: return simde_mm_move_ss(a, simde_mm_min_ps(a, b));

I wonder about other targets WASM, POWER, etc.

mr-c commented 1 year ago

Thank you @aqrit ; do you have time to send a PR with the test case in https://github.com/simd-everywhere/simde/issues/963#issuecomment-1399357212 along with those fixes?

aqrit commented 1 year ago

yeah, I'll put it together tonight.

aqrit commented 1 year ago

_mm_min_ps & _mm_min_ss seem to have different behavior. So I'll have to put this off till tomorrow night.

aqrit commented 1 year ago

_mm_min_ps & _mm_min_ss seem to have different behavior.

GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99497

mr-c commented 1 year ago

_mm_min_ps & _mm_min_ss seem to have different behavior.

GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99497

Yuck, what a mess. Maybe we should adjust and have two sets of functions: one set always follows the Intel intrinsic spec , and the other set always follows the GCC behavior.