jdelauney / SIMD-VectorMath-UnitTest

For testing asm SIMD (SSE/SSE 2/SSE 3/SSE 4.x / AVX /AVX 2) vector math library (2f, 4f, matrix, quaternion...) with Lazarus and FreePascal Compiler
Mozilla Public License 2.0

4f AngleBetween fails in Pascal for obtuse angles. #22

Closed dicepd closed 6 years ago

dicepd commented 6 years ago

This is in contrast to AngleCosine, which seems to work fine.

jdelauney commented 6 years ago

I have 7 broken functional tests: Normalize, NotEquals, GTE, GT, AngleBetween, Combine2 and Combine3

dicepd commented 6 years ago

Yeah, just getting to some of them; done <>.

Though the Combines are working in unix64; Normalize, GTE, GT and AngleBetween are what I have broken.

dicepd commented 6 years ago

Got the GTE and GT bug fixed.

jdelauney commented 6 years ago

The not-equal is hell.

In the test, Sub2 also returns False when using "vt2.Create(120,60,180,240);", while Sub1 passes and gives the correct result:

"OpNotEquals:Sub2 does not match ((X: 120,00000 ,Y: 60,00000 ,Z: 180,00000 ,W: 240,00000)<>(X: 120,00000 ,Y: 60,00000 ,Z: 180,00000 ,W: 240,00000))" expected: True but was: False

I've also tried

  cmp    eax,$ffff
  sete   al
  movzx  eax,al  

Same issue :(

It seems something is wrong with eax or the zero flag.
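
For reference while debugging the comparison, here is a scalar model of a mask-based "not equals". It is only a sketch: it assumes the SIMD routine builds a per-lane equality mask (as `cmpps` followed by `movmskps` would), and `TVec4`, `EqualityMask` and `VecNotEquals` are hypothetical names, not the library's API. One detail that is certain about the instructions themselves: `movmskps` yields a 4-bit mask (all lanes equal is `$F`), while `pmovmskb` on an XMM register yields 16 bits (`$FFFF`), so the constant in the `cmp` has to match whichever instruction filled `eax`.

```pascal
program NotEqualsMaskSketch;
{$mode objfpc}

type
  TVec4 = array[0..3] of Single;  // hypothetical stand-in for the 4f vector

// Bit i is set when lane i of A equals lane i of B (a 4-bit mask, $0..$F).
function EqualityMask(const A, B: TVec4): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to 3 do
    if A[i] = B[i] then
      Result := Result or (1 shl i);
end;

// "Not equal" holds as soon as one lane differs, i.e. the mask is not all ones.
function VecNotEquals(const A, B: TVec4): Boolean;
begin
  Result := EqualityMask(A, B) <> $F;
end;

const
  V1: TVec4 = (120, 60, 180, 240);
  V2: TVec4 = (120, 60, 180, 241);
begin
  WriteLn(VecNotEquals(V1, V1));  // identical vectors
  WriteLn(VecNotEquals(V1, V2));  // only W differs
end.
```

Running the same inputs through the asm routine and this model should make it easier to tell whether the mask or the flag handling is at fault.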

jdelauney commented 6 years ago

GT also fails:

"OpGT:Sub2 is not GT " expected: True but was: False

dicepd commented 6 years ago

I just tested my fixes for <>, GTE and GT in win64 and they seem to work fine.

I just checked them in; try that.

jdelauney commented 6 years ago

I've made the changes; same errors.

dicepd commented 6 years ago

Just checked in, finally. I had a merge issue that I took no notice of when I pushed earlier.

jdelauney commented 6 years ago

Ok all is right now 👍

jdelauney commented 6 years ago

Still 4 functional tests failing.

dicepd commented 6 years ago

I am looking at Normalize. I think I may have a way to make this hmg-safe and efficient.

jdelauney commented 6 years ago

There seems to be an error in the functional test case for AngleBetween:

  fs1 := XHmgVector.AngleBetween(-XYHmgVector,NullHmgPoint);
  AssertEquals('AngleBetween:Sub14 -X->XY failed ', (3*pi/4), fs1);
  fs1 := -XHmgVector.AngleBetween(XHmgVector,NullHmgPoint);
  AssertEquals('AngleBetween:Sub15 X->XY failed ', (pi), fs1);
  fs1 := -YHmgVector.AngleBetween(YHmgVector,NullHmgPoint);
  AssertEquals('AngleBetween:Sub16 X->XY failed ', (pi), fs1);
  fs1 := -ZHmgVector.AngleBetween(ZHmgVector,NullHmgPoint);
  AssertEquals('AngleBetween:Sub17 X->XY failed ', (pi), fs1);

jdelauney commented 6 years ago

For Sub14 the right test is:

fs1 := XHmgVector.AngleBetween(-XYHmgVector,NullHmgPoint);

dicepd commented 6 years ago

So it is winding-oriented? To me the bug looked like this: if the cosine is negative, then pi + value would give the answer. I have not tested much, but I would have to look at 120 deg and similar to see what the negative cos came out as.
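
To make the obtuse case concrete, here is a small scalar reference, written as a sketch rather than the library's implementation. `TVec4`, `Vec4` and `AngleBetweenRef` are hypothetical names, and the sketch assumes AngleBetween means the unsigned angle at the center point. With a clamped `ArcCos`, a negative cosine already lands in (pi/2, pi], so no correction is needed. If an implementation instead derives the angle from an arctangent of sqrt(1-c^2)/c (an assumption, not confirmed in this thread), a negative c yields the angle minus pi, which would match the "pi + value" observation.

```pascal
program AngleBetweenSketch;
{$mode objfpc}
uses Math;

type
  TVec4 = record X, Y, Z, W: Single; end;  // hypothetical stand-in for the 4f vector

function Vec4(AX, AY, AZ, AW: Single): TVec4;
begin
  Result.X := AX; Result.Y := AY; Result.Z := AZ; Result.W := AW;
end;

// Unsigned angle at ACenter between ACenter->A and ACenter->B, in [0, pi].
function AngleBetweenRef(const A, B, ACenter: TVec4): Single;
var
  ux, uy, uz, vx, vy, vz, lenU, lenV, c: Single;
begin
  ux := A.X - ACenter.X;  uy := A.Y - ACenter.Y;  uz := A.Z - ACenter.Z;
  vx := B.X - ACenter.X;  vy := B.Y - ACenter.Y;  vz := B.Z - ACenter.Z;
  lenU := Sqrt(ux*ux + uy*uy + uz*uz);
  lenV := Sqrt(vx*vx + vy*vy + vz*vz);
  if (lenU = 0) or (lenV = 0) then
    Exit(0);                 // angle undefined; returning 0 is an arbitrary choice
  c := (ux*vx + uy*vy + uz*vz) / (lenU * lenV);
  if c > 1 then c := 1;      // clamp rounding error before ArcCos
  if c < -1 then c := -1;
  Result := ArcCos(c);       // negative cosine maps directly to (pi/2, pi]
end;

begin
  // X against -XY: expected to be close to 3*pi/4
  WriteLn(AngleBetweenRef(Vec4(1, 0, 0, 0), Vec4(-1, -1, 0, 0), Vec4(0, 0, 0, 1)):0:5);
  // X against -X: expected to be close to pi
  WriteLn(AngleBetweenRef(Vec4(1, 0, 0, 0), Vec4(-1, 0, 0, 0), Vec4(0, 0, 0, 1)):0:5);
end.
```

Separately, in Free Pascal a method call binds tighter than unary minus, so a line such as `-XHmgVector.AngleBetween(XHmgVector,NullHmgPoint)` negates the returned angle rather than the vector; testing the opposite direction would need `(-XHmgVector).AngleBetween(...)` or a negated argument.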

dicepd commented 6 years ago

OK, I can get Normalize working, but I don't like the fact that I have to put a jump in. I do like getting rid of 2 long masks and being hmg-safe.

  movaps xmm0, [RDI]
  movaps xmm3, xmm0
  mulps  xmm3, xmm3
  mov eax, $3f800000      // integer equiv of 1.0
  movd xmm1, eax
  shufps xmm1, xmm1, $00
  movhlps xmm1, xmm3        //  |Z^2|*|1|1|
  haddps xmm3, xmm3
  addss  xmm1, xmm3         //  |x^2 + y^2 + z^2|*|1|1|
  sqrtps xmm1, xmm1         //  |Sqrt(x^2 + y^2 + z^2)|*|1|1|
  movd  eax, xmm1            // test for 0 divisor
  add   eax, eax                 // get rid of sign bit if it exists
  jz @origin
  shufps xmm1, xmm1, 11000000b   // |sqrt(Norm)|sqrt(Norm)|sqrt(Norm)|1|
  divps  xmm0, xmm1        // divide self by above and it does not matter what W is 0/1=0  1/1=1 34/1=34   
@origin:
  movhlps xmm1, xmm0                            

What do you think? This is only SSE3 atm.

dicepd commented 6 years ago

OK, ignore the below; the previous run must have been a debug build (I do try not to do that). I just reverted, and now the code shows a reduction of about 2-3%, from 5.4 to 5.23, for the safety check with hmg-safe.

OK, please verify this yourself: before you make the above change for win64, do a timing test. I had an SF of 1.46 from the previous run before this change and now have an SF of 5.231 with the above Normalize code. Run times for native are the same in both tests. Though, looking at my win64 box, native is MUCH quicker on that Intel processor. Whether that is down to the compiler or the processor I cannot say at this point.

So if we unroll the loop (C/C++ term) for an array, we can get 5 items per load cycle on a 64-bit CPU with 3-register usage. Plus, I would not bother with the 0 divisor check for an array.

dicepd commented 6 years ago

OK, final code: marginally faster than the previous and safe.


  // 3 reg usage so could unroll loops 5 at a time in 64bit
  movaps  xmm0, [RDI]
  movaps  xmm2, xmm0
  mulps   xmm2, xmm2
  movaps  xmm1, [RIP+cOneVector4f]
  movhlps xmm1, xmm2        //  |Z^2|*|1|1|
{$ifdef USE_ASM_SSE_3}
  haddps  xmm2, xmm2
{$else}
  addss   xmm1, xmm2         //  |z^2 + x^2|*|1|1|
  shufps  xmm2, xmm2, 01010101b
{$endif}
  addss   xmm1, xmm2         //  |x^2 + y^2 + z^2|*|1|1|
  sqrtps  xmm1, xmm1         //  |Sqrt(x^2 + y^2 + z^2)|*|1|1|
  movd    eax,  xmm1
  add     eax,  eax            // get rid of sign bit if it exists
  jz @origin
  shufps  xmm1, xmm1, 11000000b  // |sqrt(Norm)|sqrt(Norm)|sqrt(Norm)|1|
  divps   xmm0, xmm1
@origin:
  movhlps xmm1, xmm0          // sf 5.48          
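
A scalar reference can help when checking the asm above against the functional tests. The following is a minimal sketch of what the routine appears to compute, based on its comments: divide X, Y and Z by the 3-component length, leave W untouched, and skip the division for a zero-length vector. `TVec4` and `NormalizeHmgRef` are hypothetical names, not part of the library.

```pascal
program NormalizeHmgSketch;
{$mode objfpc}

type
  TVec4 = record X, Y, Z, W: Single; end;  // hypothetical stand-in for the 4f vector

function NormalizeHmgRef(const V: TVec4): TVec4;
var
  len: Single;
begin
  Result := V;                               // W is carried through unchanged
  len := Sqrt(V.X*V.X + V.Y*V.Y + V.Z*V.Z);  // W is left out of the norm
  if len = 0 then
    Exit;                                    // same role as the "jz @origin" branch
  Result.X := V.X / len;
  Result.Y := V.Y / len;
  Result.Z := V.Z / len;
end;

var
  V, N: TVec4;
begin
  V.X := 3; V.Y := 0; V.Z := 4; V.W := 1;
  N := NormalizeHmgRef(V);
  WriteLn(N.X:0:3, ' ', N.Y:0:3, ' ', N.Z:0:3, ' ', N.W:0:3);  // 0.600 0.000 0.800 1.000
end.
```

The asm gets the same "W unchanged" effect by dividing W by 1, which is why the shuffled divisor keeps a 1 in the last lane.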

jdelauney commented 6 years ago

OK Peter, I've made the change. Perp and Normalize are now OK, but I still have errors with TGLZVector4f in the functional tests for Combine2 and Combine3:

"Combine2:Sub1 X failed " expected: <26> but was: <0>
"Combine3:Sub1 X failed " expected: <46> but was: <0>

and also in the normal tests.

dicepd commented 6 years ago

I'll fire up the win64 box after lunch and have a look.

dicepd commented 6 years ago

Fixed: too many params to use nostackframe in Win64.

jdelauney commented 6 years ago

OK, thanks Peter, I didn't check the .s file. Now all are green 👍