Additional details - Githubissues

travisdowns commented 2 years ago

I could run this on more hardware, but do you have some details about what test to run and what fails specifically?

Have you managed to reproduce it outside of .NET?

gregoral commented 2 years ago

Hi, the code essentially runs a single test in a tight loop. At the moment all tests calculate the same value; the only difference is that there are multiple variants ("simple", unrolled loop, multiple independent registers, ...). Most vector tests fail in under 30 seconds on 4th-gen processors but not on newer generations.

The "core" of the vector loop contains only three SIMD operations:

Vector.Read
Vector.Equals
Vector.Subtract

I conditionally execute non-SIMD code to align the memory address before entering the vector loop, but this seems to have no effect other than on performance.
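
For readers unfamiliar with the pattern: pairing an equality compare with a subtract is a common SIMD idiom for counting matches, since the compare yields -1 (all bits set) in matching lanes and subtracting that mask increments a per-lane counter. The thread does not say what value the tests compute, so treating it as a match count is an assumption. The sketch below emulates the lanes with plain arrays so that it stays portable; `kLanes`, `lanes_equal`, `lanes_subtract` and `count_equal` are hypothetical names, not taken from the repository.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count; real hardware vectors would be 128 or 256 bits wide.
constexpr std::size_t kLanes = 8;
using Lanes = std::array<std::int32_t, kLanes>;

// Emulates a vector compare: -1 (all bits set) where lanes are equal, 0 elsewhere.
Lanes lanes_equal(const Lanes& a, const Lanes& b) {
    Lanes mask{};
    for (std::size_t i = 0; i < kLanes; ++i) mask[i] = (a[i] == b[i]) ? -1 : 0;
    return mask;
}

// Emulates a lane-wise vector subtract.
Lanes lanes_subtract(const Lanes& a, const Lanes& b) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) out[i] = a[i] - b[i];
    return out;
}

// Assumed workload: count the elements equal to `needle`.
std::int64_t count_equal(const std::int32_t* data, std::size_t n, std::int32_t needle) {
    Lanes target;
    target.fill(needle);
    Lanes acc{};  // per-lane counters, start at zero
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes) {
        Lanes v;
        std::copy(data + i, data + i + kLanes, v.begin());  // "read"
        acc = lanes_subtract(acc, lanes_equal(v, target));  // acc - (-1) == acc + 1
    }
    std::int64_t total = 0;
    for (std::int32_t lane : acc) total += lane;              // horizontal sum
    for (; i < n; ++i) total += (data[i] == needle) ? 1 : 0;  // scalar remainder
    return total;
}
```

In this sketch the scalar code handles the leftover elements at the end; the alignment step described above would run before the vector loop instead.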

gregoral commented 2 years ago

Update: It appears the issue is related to the Windows 7 OS. I did get 1 error on Win 10 as well, but I'm unable to reproduce it.

There is a clear difference when I boot Windows 7 and Win 10 on the same machine.

I get errors under Win 7 and none under Win 10. I'll test under Linux as well.

I will try to write a test case in C or C++.

travisdowns commented 2 years ago

Maybe also compare the .NET platform versions? It seems plausible to me that it is related to, e.g., the .NET JIT: perhaps something goes wrong during the transition between different levels of compilation, or between interpreter and compiler, and a register value gets clobbered (assuming .NET works like this; I'm more familiar with Java).

gregoral commented 2 years ago

Initially I didn't pay attention to the .NET version, but the latest tests were all run on .NET 6.0.1, which is about a month old.

I'll test with the latest version of .NET as well, but .NET seems an unlikely cause. The reason is that the same code, run with the same parameters in a loop, returns an incorrect result in approximately 1 out of 10,000 calls.

I plan to convert the code to C++ first and test it on Win 7, Win 10 and Linux.

travisdowns commented 2 years ago

Hi Gregor,

I wanted to test your code last night and got as far as turning on my old Haswell box and connecting a monitor but then my kid woke up and that was the end of that.

Maybe I'll be able to try today.

For what it's worth, you mention that you saw it 1 time on Windows 10 and 1 time on Skylake, but that you don't think it really occurs there because you haven't seen it again. In my opinion, "1 time" is enough if you're sure it happened and the symptom was similar. Perhaps it just happens much less frequently there.

Travis



gregoral commented 2 years ago

You are correct, and I'm not claiming the issue cannot occur on Windows 10. I'm 100% certain that the error occurred on Win 10 with an i5-6500, but I'm only 90% certain that my code was not at fault back then. I did refine the code somewhat after that, and I have no repeatable way to demonstrate it at the moment.

This is why I want to convert to a much simpler test in C or C++.

The current crop of tests was originally written to evaluate performance of different SIMD variations of the same code. The code was not written to hunt for a bug. It just happens to be a good indicator for SIMD problems.

The plan for next week is to write better test code in C++. As it turns out, all that is needed is to check that the function returns the same result 10,000 times (or more, of course). If any call returns a different result, an error has occurred during execution.
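
A minimal sketch of such a repeat-and-compare harness, assuming the function under test is deterministic for fixed inputs. `count_matches` is a hypothetical scalar stand-in for the real SIMD routine, and `count_mismatches` is an illustrative name; neither comes from the repository.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the function under test (the real one uses SIMD).
std::uint64_t count_matches(const std::vector<std::uint8_t>& data, std::uint8_t needle) {
    std::uint64_t count = 0;
    for (std::uint8_t b : data) {
        count += (b == needle) ? 1 : 0;
    }
    return count;
}

// Calls `fn` `iterations` times in total and returns how many calls
// disagreed with the result of the first call.
std::size_t count_mismatches(const std::function<std::uint64_t()>& fn,
                             std::size_t iterations) {
    if (iterations == 0) return 0;
    const std::uint64_t expected = fn();  // first call is the reference
    std::size_t mismatches = 0;
    for (std::size_t i = 1; i < iterations; ++i) {
        if (fn() != expected) ++mismatches;
    }
    return mismatches;
}
```

A caller could wrap the routine in a lambda, for example `count_mismatches([&] { return count_matches(data, 2); }, 10000)`, and treat any non-zero return value as a failure. Using the first call as the reference assumes that call itself was correct; a value precomputed by independent scalar code would be a stricter baseline.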

gregoral / SIMDeefective

Additional details #1