simd-everywhere / simde

Implementations of SIMD instruction sets for systems which don't natively support them.
https://simd-everywhere.github.io/blog/
MIT License

_mm_rsqrt_ss not matching simde_mm_rsqrt_ss fail #1222

Open YileKu opened 2 months ago

YileKu commented 2 months ago

A0: 00 00 40 40 00 00 00 00 00 00 00 00 00 00 00 00
B0: 00 00 80 3F 00 00 00 00 00 00 00 00 00 00 00 00

   auto mul_A0 = _mm_mul_ss(A0, A0);
   auto mul_B0 = _mm_mul_ss(B0, B0);
   auto add_ss = _mm_add_ss(mul_A0, mul_B0);

mul_a0: 00 00 10 41 00 00 00 00 00 00 00 00 00 00 00 00
mul_b0: 00 00 80 3F 00 00 00 00 00 00 00 00 00 00 00 00
add_ss: 00 00 20 41 00 00 00 00 00 00 00 00 00 00 00 00

   auto root = _mm_rsqrt_ss( add_ss );

root: 00 E0 A1 2E 00 00 00 00 00 00 00 00 00 00 00 00

On a Cortex-A72 using simde_mm_rsqrt_ss:

A0: 00 00 40 40 00 00 00 00 00 00 00 00 00 00 00 00
B0: 00 00 80 3F 00 00 00 00 00 00 00 00 00 00 00 00
add_ss: 00 00 20 41 00 00 00 00 00 00 00 00 00 00 00 00

root: 00 80 A1 3E 00 00 00 00 00 00 00 00 00 00 00 00

YileKu commented 2 months ago

This code gives different results when run natively on Intel and when built with the simd-everywhere headers on a Cortex-A72:

void ldump_debug(const char *t, void *_d, int len)
{
    // Print the label, then every byte of the buffer as two hex digits.
    fprintf(stdout, "%s: ", t);
    unsigned char *cp = (unsigned char *)_d;
    for (int i = 0; i < len; i++, cp++)
        fprintf(stdout, "%02X ", *cp);
    fprintf(stdout, "\n");
}

// Each initializer is converted to float, so lane 0 holds 8257.0f (0x2041), not a raw bit pattern.
__m128 t = { 0x00002041, 00, 00, 00 };
auto out = _mm_rsqrt_ss(t);
ldump_debug("LOCAL", &out, sizeof(out));

On Cortex-A72: LOCAL: 00 00 34 3C 00 00 00 00 00 00 00 00 00 00 00 00
On Intel:      LOCAL: 00 48 34 3C 00 00 00 00 00.....

mr-c commented 2 months ago

Hello @YileKu, and thank you for your report.

Did you try compiling with -DSIMDE_ACCURACY_PREFERENCE=2, or adding #define SIMDE_ACCURACY_PREFERENCE 2 before including the SIMDe header in your application?

YileKu commented 2 months ago

I will try that, thank you.


YileKu commented 2 months ago

So isn’t the precision implicit in the API? Are there other AVX APIs that need clarification when being mapped to NEON?


mr-c commented 2 months ago

So isn’t the precision implicit in the API? Are there other AVX APIs that need clarification when being mapped to NEON?

That's a good question. I didn't write this code. I think the caveats section of the README (https://github.com/simd-everywhere/simde?tab=readme-ov-file#caveats) should be updated with this information.

YileKu commented 1 month ago

Tried with the #define above and it still didn't work.

nemequ commented 1 month ago

The rsqrt instructions are interesting. They're not actually specified to require bit-accurate implementations, but are instead specified as being mathematically accurate to a given precision. See the Intel API docs (https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_rsqrt_ss&ig_expand=5647):

The maximum relative error for this approximation is less than 1.5*2^-12.

The instructions aren't even bit-compatible across CPU manufacturers: Intel and AMD return different values (https://robert.ocallahan.org/2021/09/rr-trace-portability-diverging-behavior.html).

I'm not saying the implementation is perfect, only that bit-accurate results are not expected. It's possible some implementations have a higher maximum relative error than specified, but they should be pretty comparable, at least with a higher accuracy preference selected.

YileKu commented 1 month ago

Thanks for the explanation.
