wrong results for vhsub_s8

intel / ARM_NEON_2_x86_SSE

The platform independent header allowing to compile any C/C++ code containing ARM NEON intrinsic functions for x86 target systems using SIMD up to AVX2 intrinsic functions

Other

430 stars 149 forks source link

wrong results for vhsub_s8 #70

Closed cwiede closed 3 months ago

cwiede commented 3 months ago

Calling vhsub_s8 with these arguments delivers different results than the ARM processor:

vhsub_s8([-128, -128, -128, -128, -128, -128, -128, -127], [-128, -127, -1, 0, 1, 126, 127, -128])

actual result:
[0, -1, -64, -64, 63, 1, 0, 0]

expected result: [0, -1, -64, -64, -65, -127, -128, 0]

Looks like the overflow behaves differently.

cwiede commented 3 months ago

never mind, this seemed to be an issue with the test case

Zvictoria commented 3 months ago

@cwiede I need to double check that. From the code I see that you found a real problem. Thanks for reporting it anyway

cwiede commented 3 months ago

You are right @Zvictoria - I was getting confused with my environment. The problem is real.

BTW: Thanks for this project. It is a huge help for us :)

Zvictoria commented 3 months ago

@cwiede you are welcome. The problem is very easy to fix, will do that next week

Zvictoria commented 3 months ago

@cwiede please check the last version and close the issue if it fix it

cwiede commented 3 months ago

@Zvictoria thanks for working on this.

I still get different results on my arm platform (actually an nvidia TX2) compared to the simulation.

arguments: [-128, -128, -128, -128, -128, -128, -128, -127], [-128, -127, -1, 0, 1, 126, 127, -128]

result on ARM platform: [ 0, -1, -64, -64, -65, -127, -128, 0] result with with new NEONvsSSE.h: [ 0, -1, -64, -64, 63, 1, 0, 0]

Zvictoria commented 3 months ago

@cwiede this time I will ask you to double check your setup :) Could you please? Rebuild all or something like that. What OS/compiler do you use? For me x86 Windows 11, Microsoft Visual Studio with the new version I get exactly [ 0, -1, -64, -64, -65, -127, -128, 0] as expected. But it is highly unlikely that other compilers/OS produce different results in this case.

cwiede commented 3 months ago

@Zvictoria Interesting.

We are using Visual Studio 2015 to compile the project (this is on Windows 10). It seems that the option /Ox causes the result to be [ 0 -1 -64 -64 63 1 0 0] while omitting this option results in [ 0 -1 -64 -64 -65 -127 -128 0] as expected. I am 99.9% sure that this is not a test environment issue.

I have also checked on linux x86_64 with gcc 9.4.0. Both optimized and non-optimized deliver the correct results here. So it seems to be a compiler issue here.

Note: I have removed the unused variable r in the vhsub_s8 implementation to prevent some compiler warnings, this should probably also fixed upstream. The test function looks like this:

void ut_vhsub_s8(int8_t utresult[], int8_t N[], int8_t M[])
{
    { int8x8_t uttmp = vhsub_s8(vld1_s8(N), vld1_s8(M)); vst1_s8(utresult, uttmp); };
}

This is the exact version of the compiler:

Zvictoria commented 3 months ago

I will try /Ox compiler option myself, however if it is the compiler bug, I can't do anything there, sorry :( Do you use SSE4 define in your code as stated in the NEONvsSSE.h file?

cwiede commented 3 months ago

however if it is the compiler bug, I can't do anything there

yeah that's understood :)

Do you use SSE4 define in your code as stated in the NEONvsSSE.h file?

__SSE4_2__ doesn't seem to be defined in the msvc environment, so it is disabled

Zvictoria commented 3 months ago

then could you define it manually? It will give you some extra performance and might solve your problem

cwiede commented 3 months ago

then could you define it manually? It will give you some extra performance and might solve your problem

I've tried now, it doesn't change the result

cwiede commented 3 months ago

I think that the issue can be closed now - thanks for the fix Victoria 👍

Zvictoria commented 3 months ago

You are welcome! BTW I don't have VS2015, but checked it on VS2017 - Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27048 for x64 - it works as expected with /Ox.