Closed: dicepd closed this issue 6 years ago
I have 7 functional tests broken: Normalize, NotEquals, GTE, GT, AngleBetween, Combine2 and Combine3.
Yeah, just getting to some of them; <> is done.
The Combines are working on unix64, though. Normalize, GTE, GT and AngleBetween are the ones I have broken.
Got the GTE and GT bug fixed.
NotEquals is hell.
In the test, Sub2 returns False even with "vt2.Create(120,60,180,240);". Sub1 passes and gives the correct result:
"OpNotEquals:Sub2 does not match ((X: 120,00000 ,Y: 60,00000 ,Z: 180,00000 ,W: 240,00000)<>(X: 120,00000 ,Y: 60,00000 ,Z: 180,00000 ,W: 240,00000))" expected: True but was: False
I've also tried:
cmp eax,$ffff
sete al
movzx eax,al
Same issue :(
It seems something is wrong with eax or the zero flag.
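For reference, the mask reduction being debugged here can be modeled in scalar code. This is only a sketch under assumptions, not the library's actual routine: it assumes the per-lane compare result is collapsed the way `movmskps` does it, one bit per float lane, so "all four lanes equal" is the mask $F. A mask of $FFFF would only appear with a byte-wise `pmovmskb`. The function names are hypothetical.

```python
def equal_lane_mask(a, b):
    """Model cmpeqps + movmskps: bit i is set when lane i of a equals lane i of b."""
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            mask |= 1 << i
    return mask


def not_equals(a, b):
    """Two vectors differ when at least one lane differs, i.e. the mask is not 0b1111."""
    return equal_lane_mask(a, b) != 0xF
```

With this model, the two identical vectors from the Sub2 message give a mask of $F, so `not_equals` returns False for them; it only returns True once at least one lane differs.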
GT also fails:
"OpGT:Sub2 is not GT " expected: True but was: False
I just tested my fixes for <>, GTE and GT on win64 and they seem to work fine.
I just checked them in; try that.
I've made the changes; same errors.
Just checked in, finally. I had a merge issue I took no notice of when I pushed earlier.
Ok all is right now 👍
Still 4 functional tests failing.
I am looking at Normalize. I think I may have a way to make this hmg safe and efficient.
There seem to be some errors in the functional test case for AngleBetween:
fs1 := XHmgVector.AngleBetween(-XYHmgVector,NullHmgPoint);
AssertEquals('AngleBetween:Sub14 -X->XY failed ', (3*pi/4), fs1);
fs1 := -XHmgVector.AngleBetween(XHmgVector,NullHmgPoint);
AssertEquals('AngleBetween:Sub15 X->XY failed ', (pi), fs1);
fs1 := -YHmgVector.AngleBetween(YHmgVector,NullHmgPoint);
AssertEquals('AngleBetween:Sub16 X->XY failed ', (pi), fs1);
fs1 := -ZHmgVector.AngleBetween(ZHmgVector,NullHmgPoint);
AssertEquals('AngleBetween:Sub17 X->XY failed ', (pi), fs1);
For Sub14 the right test is:
fs1 := XHmgVector.AngleBetween(-XYHmgVector,NullHmgPoint);
So it is winding oriented? To me the bug looked like this: if negative, then pi + value would give the answer. I have not tested much, though, and would have to look at 120 deg and similar to see what negative cosine comes out.
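The expected values in these sub-tests can be checked against a plain scalar model. The sketch below assumes that AngleBetween(other, center) returns the unsigned angle between (self - center) and (other - center), computed as the arccosine of the normalized dot product; that reading is an assumption, and `angle_between` is a hypothetical helper, not the library routine. Only x, y, z are modeled.

```python
import math


def angle_between(a, b, center=(0.0, 0.0, 0.0)):
    """Unsigned angle in radians between (a - center) and (b - center)."""
    u = [p - c for p, c in zip(a, center)]
    v = [p - c for p, c in zip(b, center)]
    dot = sum(p * q for p, q in zip(u, v))
    norms = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    if norms == 0.0:
        return 0.0  # assumption: a zero-length vector has no defined angle
    # Clamp so rounding error cannot push the cosine outside acos's domain.
    return math.acos(max(-1.0, min(1.0, dot / norms)))


X = (1.0, 0.0, 0.0)
NEG_X = (-1.0, 0.0, 0.0)
NEG_XY = (-1.0, -1.0, 0.0)
```

Under that assumption, `angle_between(X, NEG_XY)` comes out at 3*pi/4 and `angle_between(NEG_X, X)` at pi, matching the expected values. One thing to keep in mind when reading Sub15 to Sub17: in Pascal a leading unary minus applies to the result of the method call, so `-XHmgVector.AngleBetween(XHmgVector, ...)` negates the returned angle instead of the vector.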
Ok, I can get Normalize working, but I don't like the fact that I have to put a jump in. I do like getting rid of the 2 long masks and being hmg safe.
movaps xmm0, [RDI]
movaps xmm3, xmm0
mulps xmm3, xmm3
mov eax, $3f800000 // integer equiv of 1.0
movd xmm1, eax
shufps xmm1, xmm1, $00
movhlps xmm1, xmm3 // |Z^2|*|1|1|
haddps xmm3, xmm3
addss xmm1, xmm3 // |x^2 + y^2 + z^2|*|1|1|
sqrtps xmm1, xmm1 // |Sqrt(x^2 + y^2 + z^2)|*|1|1|
movd eax, xmm1 // test for 0 divisor
add eax, eax // get rid of sign bit if it exists
jz @origin
shufps xmm1, xmm1, 11000000b // |sqrt(Norm)|sqrt(Norm)|sqrt(Norm)|1|
divps xmm0, xmm1 // divide self by above and it does not matter what W is 0/1=0 1/1=1 34/1=34
@origin:
movhlps xmm1, xmm0
What do you think? This is only SSE3 atm.
Ok, ignore the below; the previous run must have been a debug build (I do try not to do that). I just reverted, and now the code shows a reduction of about 2-3%, from 5.4 to 5.23, for the safety check with hmg safe.
Ok, please verify this yourself: before you make the above change for win64, do a timing test. I had an SF of 1.46 from the previous run before this change and now have an SF of 5.231 with the above Normalize code. Run times for native are the same in both tests. Though, looking at my win64 box, native is MUCH quicker on that Intel processor. Whether that is down to the compiler or the processor I cannot say at this point.
So if we unroll the loop (C/C++ term) for arrays, we can get 5 items per load cycle on a 64-bit CPU with 3-register usage. Plus, I would not bother with the 0 divisor check for an array.
Ok, final code: marginally faster than the previous version, and safe.
// 3 reg usage so could unroll loops 5 at a time in 64bit
movaps xmm0, [RDI]
movaps xmm2, xmm0
mulps xmm2, xmm2
movaps xmm1, [RIP+cOneVector4f]
movhlps xmm1, xmm2 // |Z^2|*|1|1|
{$ifdef USE_ASM_SSE_3}
haddps xmm2, xmm2
{$else}
addss xmm1, xmm2 // |z^2 + x^2|*|1|1|
shufps xmm2, xmm2, 01010101b
{$endif}
addss xmm1, xmm2 // |x^2 + y^2 + z^2|*|1|1|
sqrtps xmm1, xmm1 // |Sqrt(x^2 + y^2 + z^2)|*|1|1|
movd eax, xmm1
add eax, eax // get rid of sign bit if it exists
jz @origin
shufps xmm1, xmm1, 11000000b // |sqrt(Norm)|sqrt(Norm)|sqrt(Norm)|1|
divps xmm0, xmm1
@origin:
movhlps xmm1, xmm0 // sf 5.48
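As a cross-check for the functional tests, the behaviour the assembly is aiming for can be written as a scalar model. This is a sketch under assumptions, not a translation of the routine: it assumes "hmg safe" means x, y, z are divided by the 3D length while W is divided by 1 and so left untouched, and that a zero length skips the division entirely. The helper names are hypothetical. The zero test mirrors the `movd` / `add eax, eax` / `jz` sequence on the single-precision bit pattern.

```python
import math
import struct


def is_zero_ignoring_sign(value):
    """Model `movd eax, xmm1` / `add eax, eax` / `jz`: doubling the 32-bit
    pattern shifts the sign bit out, so +0.0 and -0.0 both test as zero."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return ((bits + bits) & 0xFFFFFFFF) == 0


def normalize_hmg(v):
    """Normalize x, y, z by the 3D length and leave w as it is."""
    x, y, z, w = v
    length = math.sqrt(x * x + y * y + z * z)
    if is_zero_ignoring_sign(length):
        return (x, y, z, w)  # the @origin path: nothing to divide by
    # The divisor vector is |len|len|len|1|, so w is divided by 1.
    return (x / length, y / length, z / length, w / 1.0)
```

For example, `normalize_hmg((3.0, 0.0, 4.0, 1.0))` gives `(0.6, 0.0, 0.8, 1.0)`, and a vector at the origin such as `(0.0, 0.0, 0.0, 1.0)` is returned unchanged. The model runs in double precision apart from the zero test, so results from the SSE code will only agree to single-precision accuracy.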
Ok Peter, I've made the change. Perp and Normalize are now ok, but I still have errors with TGLZVector4f in the functional tests for Combine2 and Combine3:
"Combine2:Sub1 X failed " expected: <26> but was: <0> "Combine3:Sub1 X failed " expected: <46> but was: <0>
and also in the normal test.
I'll fire up the win64 box after lunch and have a look.
Fixed: too many params to use nostackframe in Win64.
Ok, thanks Peter, I didn't check the .s file. Now all are green 👍
Compared to AngleCosine which seems to work fine.