Open moon-chilled opened 2 years ago
Thank you @moon-chilled for your report. Can you send a PR to fix this? Or suggest test cases that would make the problem clear?
_mm_max_pd(0,-0) = -0
vmaxq_f64(0,-0) = 0
Thank @moon-chilled for the test cases; I don't know NEON well enough to fix this. A PR or suggestion from anyone is very welcome
solutions from sse2neon for __aarch64__
:
_mm_max_pd
:
r_.neon_f64 = vbslq_f64(vcgtq_f64(a_.neon_f64, b_.neon_f64), a_.neon_f64, b_.neon_f64);
_mm_min_pd
:
r_.neon_f64 = vbslq_f64(vcltq_f64(a_.neon_f64, b_.neon_f64), a_.neon_f64, b_.neon_f64);
_mm_min_sd
:
return simde_mm_move_sd(a, simde_mm_min_pd(a, b));
_mm_max_sd:
return simde_mm_move_sd(a, simde_mm_max_pd(a, b));
solutions from sse2neon for ARMv7-A
:
_mm_max_ps
is already correct.
_mm_max_ss
:
simde_mm_move_ss(a, simde_mm_max_ps(a, b));
_mm_min_ps
is missing a case:
r_.neon_f32 = vbslq_f32(vcltq_f32(a_.neon_f32, b_.neon_f32), a_.neon_f32, b_.neon_f32);
_mm_min_ss
:
return simde_mm_move_ss(a, simde_mm_min_ps(a, b));
I wonder about other targets WASM, POWER, etc.
Thank you @aqrit ; do you have time to send a PR with the test case in https://github.com/simd-everywhere/simde/issues/963#issuecomment-1399357212 along with those fixes?
yeah, I'll put it together tonight.
_mm_min_ps
& _mm_min_ss
seem to have different behavior. So I'll have to put this off till tomorrow night.
_mm_min_ps
&_mm_min_ss
seem to have different behavior.
_mm_min_ps
&_mm_min_ss
seem to have different behavior.
Yuck, what a mess. Maybe we should adjust and have two sets of functions: one set always follows the Intel intrinsic spec , and the other set always follows the GCC behavior.
Currently, _mm_max_pd is implemented in terms of vmaxq_f64 on neon. But their behaviour is not quite the same. When handed zeroes of opposite signs, the latter will return positive zero, but the former will return its second argument. The situation with min is analogous.