cwiede closed this issue 3 months ago
never mind, this seemed to be an issue with the test case
@cwiede I need to double check that. From the code I see that you found a real problem. Thanks for reporting it anyway
You are right @Zvictoria - I was getting confused with my environment. The problem is real.
BTW: Thanks for this project. It is a huge help for us :)
@cwiede you are welcome. The problem is very easy to fix, will do that next week
@cwiede please check the latest version and close the issue if it fixes the problem
@Zvictoria thanks for working on this.
I still get different results on my arm platform (actually an nvidia TX2) compared to the simulation.
arguments: [-128, -128, -128, -128, -128, -128, -128, -127], [-128, -127, -1, 0, 1, 126, 127, -128]
result on ARM platform: [ 0, -1, -64, -64, -65, -127, -128, 0]
result with the new NEONvsSSE.h: [ 0, -1, -64, -64, 63, 1, 0, 0]
@cwiede this time I will ask you to double check your setup :) Could you please rebuild everything, or something like that? What OS/compiler do you use? For me, on x86 Windows 11 with Microsoft Visual Studio and the new version, I get exactly [ 0, -1, -64, -64, -65, -127, -128, 0] as expected. It is highly unlikely that other compilers/OSes produce different results in this case.
@Zvictoria Interesting.
We are using Visual Studio 2015 to compile the project (this is on Windows 10). It seems that the option /Ox causes the result to be [ 0 -1 -64 -64 63 1 0 0] while omitting this option results in [ 0 -1 -64 -64 -65 -127 -128 0] as expected. I am 99.9% sure that this is not a test environment issue.
I have also checked on linux x86_64 with gcc 9.4.0. Both optimized and non-optimized deliver the correct results here. So it seems to be a compiler issue here.
Note: I have removed the unused variable r in the vhsub_s8 implementation to prevent some compiler warnings; this should probably also be fixed upstream. The test function looks like this:
void ut_vhsub_s8(int8_t utresult[], int8_t N[], int8_t M[])
{
{ int8x8_t uttmp = vhsub_s8(vld1_s8(N), vld1_s8(M)); vst1_s8(utresult, uttmp); };
}
This is the exact version of the compiler:
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24210 for x64 Copyright (C) Microsoft Corporation. All rights reserved.
I will try /Ox compiler option myself, however if it is the compiler bug, I can't do anything there, sorry :( Do you use SSE4 define in your code as stated in the NEONvsSSE.h file?
however if it is the compiler bug, I can't do anything there
yeah that's understood :)
Do you use SSE4 define in your code as stated in the NEONvsSSE.h file?
__SSE4_2__ doesn't seem to be defined in the MSVC environment, so it is disabled
then could you define it manually? It will give you some extra performance and might solve your problem
I've tried now, it doesn't change the result
I think that the issue can be closed now - thanks for the fix Victoria 👍
You are welcome! BTW I don't have VS2015, but checked it on VS2017 - Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27048 for x64 - it works as expected with /Ox.
Calling vhsub_s8 with these arguments delivers different results than the ARM processor:
vhsub_s8([-128, -128, -128, -128, -128, -128, -128, -127], [-128, -127, -1, 0, 1, 126, 127, -128])
actual result: [0, -1, -64, -64, 63, 1, 0, 0]
expected result: [0, -1, -64, -64, -65, -127, -128, 0]
Looks like the overflow behaves differently.
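To make the overflow observation concrete, here is a minimal scalar sketch in C. It is not the code from NEONvsSSE.h; the helper names `hsub_s8_wide`, `hsub_s8_wrapped` and `hsub_self_check` are made up for illustration. The first helper halves the difference using a 16-bit intermediate, which yields the values reported for the ARM platform. The second deliberately wraps the difference to 8 bits before halving, and happens to reproduce the unexpected values above. That is consistent with an 8-bit wraparound being the cause, though it does not prove where in the emulation it occurs.

```c
#include <assert.h>
#include <stdint.h>

/* Halve with rounding toward negative infinity, avoiding a right shift
   of a negative value (implementation-defined in C). */
static int8_t floor_half(int16_t diff)
{
    return (int8_t)((diff - (diff & 1)) / 2);
}

/* Hypothetical reference: the difference is formed in 16 bits, so it
   cannot wrap (range is [-255, 255]). */
static int8_t hsub_s8_wide(int8_t a, int8_t b)
{
    return floor_half((int16_t)((int16_t)a - (int16_t)b));
}

/* Hypothetical faulty variant: the difference wraps modulo 256 and is
   reinterpreted as a signed 8-bit value before halving. */
static int8_t hsub_s8_wrapped(int8_t a, int8_t b)
{
    uint8_t wrapped = (uint8_t)((uint8_t)a - (uint8_t)b);
    int16_t diff = (wrapped >= 128) ? (int16_t)(wrapped - 256) : (int16_t)wrapped;
    return floor_half(diff);
}

/* Checks both helpers against the vectors quoted in this issue. */
static void hsub_self_check(void)
{
    const int8_t n[8] = {-128, -128, -128, -128, -128, -128, -128, -127};
    const int8_t m[8] = {-128, -127, -1, 0, 1, 126, 127, -128};
    const int8_t expected[8] = {0, -1, -64, -64, -65, -127, -128, 0};
    const int8_t observed[8] = {0, -1, -64, -64, 63, 1, 0, 0};

    for (int i = 0; i < 8; ++i) {
        assert(hsub_s8_wide(n[i], m[i]) == expected[i]);
        assert(hsub_s8_wrapped(n[i], m[i]) == observed[i]);
    }
}
```

Calling `hsub_self_check()` from a test's `main` passes silently if both variants match the numbers in this report. The three lanes that differ (inputs 1, 126 and 127 for the second argument) are exactly those where the true difference falls below -128.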