Closed jdelauney closed 6 years ago
umainform.pas(291,23) Error: identifier idents no member "ST"
MoveTo(P.ST.x,P.ST.y);
Ok found it, nothing is visible but it is doing 16.4 fps according to the titlebar
Ok, finally got it working by using Mainform.Canvas directly. It runs in both 64 bit native and 64 bit SSE.
Not much speedup, though.
Ok, a bit more testing: when I set things up to max out the CPU and turn the cadencer down, I get more fps with SSE, but only about 20%. I suspect most of the time is spent drawing rather than in AnimateScene.
Ouch! On Win64 with SSE I only see a straight line from top left to bottom left (same result with the direct Canvas).
What did you change?
FBitmapBuffer.canvas.Brush.color:=clBlack;
FBitmapBuffer.canvas.FillRect(clientrect);
mainform.Canvas.Clear; // <------- changed
for i:=0 to maxboidz do
begin
  b := FBoidz[i];
  p := b.Round;
  //P.X:=Round(FBoidz[i].ST.X);
  //P.Y:=Round(FBoidz[i].ST.Y);
  // compute the direction of movement, used to pick the colour
  c:=round(ArcTan2(b.UV.X,b.UV.Y)*180/cPi)+180;
  //c:=345;
  // CurColor := FBitmapBuffer.ColorManager.Palette.Colors[c].Value;
  CurColor := FColorMap[c];
  // with FBitmapBuffer.Canvas do
  with MainForm.Canvas do // <------- changed
  begin
    Pen.Style := psSolid;
    Pen.Color := CurColor;
    // draw a line whose length is the speed
    MoveTo(P.ST.x,P.ST.y);
    LineTo(P.ST.x+P.UV.x,P.ST.y+P.UV.y);
  end;
end;
// MainForm.Refresh;
(* With FBitmapBuffer.Canvas do
That is the only code change needed to see the swarm. Apart from the engine settings, the only other thing I changed was adding -Sv and -O3 to the compiler options.
What the hell is going on with Windows this time?
The first image shows what I see without the "-dUSE_ASM_SSE_3" option, the second image shows what I see with it.
And with your change the window is still empty, nothing is shown.
It's silly!
Let me get my windows box going, I'll get back to you if I find something.
And this is what I see with my own bitmap management, without SSE3 and with it.
The second is the same output, also with your changes.
It's really strange! I'll add logging to see the results of the operations.
Ok, now with "Range checking" (-Cr) enabled in the debug options, this is what I see:
But look at the FPS: around 1.5! So something is wrong, or something is going on under Win64 with the current SSE code.
I've just found this: https://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz
Perhaps the beginning of an answer.
With my poor English I don't understand all of it :(
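For reference, the FTZ and DAZ modes from that article are two bits in the MXCSR register. Below is a minimal sketch of switching them on from Free Pascal, assuming an x86-64 target where the System unit provides GetMXCSR and SetMXCSR. The constant names are made up for this example, and nothing here is taken from the library under discussion.

```pascal
program MxcsrSketch;
{$mode objfpc}

const
  // Bit positions as documented by Intel; the names are local to this sketch.
  MXCSR_DAZ = $0040; // denormals-are-zero: denormal inputs are read as 0
  MXCSR_FTZ = $8000; // flush-to-zero: denormal results are written as 0

{$ifdef CPUX86_64}
var
  old: DWord;
{$endif}
begin
{$ifdef CPUX86_64}
  old := GetMXCSR;                          // current SSE control/status word
  SetMXCSR(old or MXCSR_FTZ or MXCSR_DAZ);  // enable both modes
  WriteLn('MXCSR before: $', HexStr(old, 8), '  after: $', HexStr(GetMXCSR, 8));
  SetMXCSR(old);                            // restore the original state
{$else}
  WriteLn('This sketch only applies to x86-64 targets.');
{$endif}
end.
```

FTZ and DAZ only affect how denormals are handled; they do not mask the invalid-operation or divide-by-zero exceptions, so on their own they would not be expected to hide a SIGFPE.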
My windows box just sees a stripe from top left to bottom right too. It is as if something is clamping it to x = y scaled to window size.
Check your win64 vector4f round and trunc they are so wrong! It now works for me in win64 after fixing those up.
So I've played a bit with the MXCSR register, and the problem seems to come from the TGLZVector2f./ operator.
A SIGFPE is raised on the line: divps xmm0, xmm1
After replacing it with native code, there also seems to be a problem with TGLZVector4f. But what? It needs a deeper search.
Ok, you're right Peter, the round function causes the problem:
function TGLZVector4f.Round: TGLZVector4i;assembler;nostackframe;register;
asm
// Rounding mode defaults to round-to-nearest
movaps xmm0, [RCX]
cvtps2dq xmm0, xmm0
movdqa [RDX], xmm0
end;
I don't know how to fix it... but with native code I still get the same error I described before with TGLZVector2f./
Ok, finally this
function TGLZVector4f.Round: TGLZVector4i;assembler;//nostackframe;register;
asm
// Rounding mode defaults to round-to-nearest
movaps xmm0, [RCX]
cvtps2dq xmm0, xmm0
movdqa [Result], xmm0
end;
is working. But to get BoidZ to work I need to change this:
class operator TGLZVector2f./(constref A, B: TGLZVector2f): TGLZVector2f; assembler; nostackframe; register;
asm
movq xmm0, [A]
movq xmm1, [B]
divps xmm0, xmm1 // SIGFPE raise here
movq [Result], {%H-}xmm0
end;
to this:
class operator TGLZVector2f./(constref A, B: TGLZVector2f): TGLZVector2f;
begin
result.x := a.x /b.x;
result.y := a.y /b.y;
end;
Ok, I found it. The error is in the SSE function above: the upper half (Hi) of the divisor register is never set and is equal to 0, so it is normal that a SIGFPE is raised. So the trick is:
class operator TGLZVector2f./(constref A, B: TGLZVector2f): TGLZVector2f; assembler; //nostackframe; register;
asm
movq xmm0, [A]
movq xmm1, [B]
movlhps xmm1,xmm1 //--- Fill upper register
divps xmm0, xmm1
movq RAX, xmm0
end;
Now it is working. The gain is not high: without asm the average FPS is 18.55, with SSE enabled it is 19.97.
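To illustrate why the upper lanes matter: movq zeroes the upper 64 bits of both registers, so divps also computes 0/0 in lanes 2 and 3, which is an invalid operation and raises SIGFPE when that exception is unmasked. The sketch below models the four lanes in plain Pascal rather than asm. TLanes, LoadLow, DupLowToHigh and DivideLanes are hypothetical names for this example only, and all FP exceptions are masked so that the 0/0 shows up as NaN instead of raising.

```pascal
program LaneDivideSketch;
{$mode objfpc}
uses Math;

type
  TLanes = array[0..3] of Single; // models the four floats of one XMM register

// Models movq: fills lanes 0..1 and zeroes lanes 2..3.
function LoadLow(x, y: Single): TLanes;
begin
  Result[0] := x; Result[1] := y;
  Result[2] := 0; Result[3] := 0;
end;

// Models movlhps r, r: copies lanes 0..1 into lanes 2..3.
function DupLowToHigh(const v: TLanes): TLanes;
begin
  Result := v;
  Result[2] := v[0];
  Result[3] := v[1];
end;

// Models divps: one division per lane.
function DivideLanes(const a, b: TLanes): TLanes;
var
  i: Integer;
begin
  for i := 0 to 3 do
    Result[i] := a[i] / b[i];
end;

var
  a, b, q: TLanes;
begin
  // Mask all FP exceptions so that 0/0 yields NaN instead of raising.
  SetExceptionMask([exInvalidOp, exDenormalized, exZeroDivide,
                    exOverflow, exUnderflow, exPrecision]);
  a := LoadLow(6, 8);
  b := LoadLow(2, 4);

  q := DivideLanes(a, b);               // lanes 2..3 compute 0/0
  WriteLn('unfixed, upper lane is NaN: ', IsNan(q[2]));

  q := DivideLanes(a, DupLowToHigh(b)); // lanes 2..3 compute 0/B.x and 0/B.y
  WriteLn('fixed lanes: ', q[0]:0:2, ' ', q[1]:0:2, ' ', q[2]:0:2, ' ', q[3]:0:2);
end.
```

Note that filling the upper lanes only removes the 0/0 case. If B itself contains a zero component, the division still faults, exactly as the native Pascal version would.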
Ok, I profiled this with valgrind and, to be honest, I am surprised we got any speedup. Less than 15% of the time is spent in calls to SSE code, and CheckAngleofView is the killer here. It is a really bad design, as it ends up doing Math.ArcTan2 through another call, so there is a whole stack frame around a call to a native Pascal FPU function, and this is called a lot. Directly afterwards we have a call to the SSE lengthSqr. In fact this one call to LengthSquare is the bulk of the calls to the SSE library.
So all in all probably not a good choice for a speedup demo.
So a switch to FastArcTangent2, and 18 fps becomes 46 fps.
After stripping out all the profiling and setting a fixed window size (fps changes with size) I get ~33 fps native and 42 fps SSE.
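The implementation of FastArcTangent2 is not shown in this thread. As a rough idea of what such a routine can look like, here is a sketch of a common polynomial approximation, atan(r) ~ (pi/4)*r + 0.273*r*(1-r) for r in [0,1], extended to all quadrants. ApproxArcTan2 is a hypothetical name and this is not the library's code; it trades accuracy (an error of a few thousandths of a radian) for avoiding the FPU arctangent call.

```pascal
program FastAtan2Sketch;
{$mode objfpc}
uses Math;

// Hypothetical stand-in for a fast atan2; same argument order as Math.ArcTan2.
function ApproxArcTan2(y, x: Single): Single;
var
  ax, ay, r, a: Single;
begin
  ax := Abs(x);
  ay := Abs(y);
  if (ax = 0) and (ay = 0) then
    Exit(0);
  // Reduce to a ratio r in [0, 1].
  if ax >= ay then r := ay / ax else r := ax / ay;
  // Polynomial approximation of atan(r) on [0, 1].
  a := (Pi / 4) * r + 0.273 * r * (1 - r);
  // Undo the reduction, then restore the quadrant from the signs.
  if ay > ax then a := Pi / 2 - a;
  if x < 0 then a := Pi - a;
  if y < 0 then a := -a;
  Result := a;
end;

var
  deg: Integer;
  t, err, maxErr: Double;
begin
  // Compare against Math.ArcTan2 on a one-degree grid.
  maxErr := 0;
  for deg := -179 to 180 do
  begin
    t := DegToRad(deg);
    err := Abs(ApproxArcTan2(Sin(t), Cos(t)) - ArcTan2(Sin(t), Cos(t)));
    if err > maxErr then maxErr := err;
  end;
  WriteLn('max abs error (rad): ', maxErr:0:5);
end.
```

For picking a colour from a 360-entry map, an error well below one degree is more than enough, which is why this kind of approximation suits the CheckAngleofView and colour code paths.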
Ok after some minor changes with
I've made a little demo for testing our Vectors
Results
c:=round(ArcTan2(b.UV.X,b.UV.Y)*180/cPi)+180;
I think the problem is due to some value going out of range, most likely caused by bad data alignment.
I must see what is wrong, but I don't really know how to debug and trace these "badass" behaviours.
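One possible source of an out-of-range value in that line, independent of alignment: ArcTan2 returns an angle in [-pi, pi], so the expression yields 0..360 inclusive, which is 361 distinct values. The sketch below assumes a colour map with 360 entries (the real size of FColorMap is not given in this thread) and folds 360 back onto 0. DirectionToIndex and MapSize are hypothetical names.

```pascal
program ColourIndexSketch;
{$mode objfpc}
uses Math;

const
  MapSize = 360; // assumption: one colour per degree, indices 0..359

// Maps a direction vector to a colour index that is always in 0..MapSize-1.
function DirectionToIndex(u, v: Single): Integer;
var
  deg: Integer;
begin
  deg := Round(RadToDeg(ArcTan2(u, v))) + 180; // 0..360 inclusive
  Result := deg mod MapSize;                   // 360 wraps around to 0
end;

begin
  WriteLn('index for (0, 1):  ', DirectionToIndex(0, 1));
  WriteLn('index for (0, -1): ', DirectionToIndex(0, -1));
end.
```

If the SSE code hands back garbage such as NaN in UV, Round itself will fault before any indexing happens, so logging b.UV just before this line is a cheap way to tell the two cases apart.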