jdelauney / SIMD-VectorMath-UnitTest

For testing asm SIMD (SSE/SSE 2/SSE 3/SSE 4.x / AVX /AVX 2) vector math library (2f, 4f, matrix, quaternion...) with Lazarus and FreePascal Compiler
Mozilla Public License 2.0

Changes for 3.1.1 fpc #30

Open dicepd opened 6 years ago

dicepd commented 6 years ago

Jerome, Take a look at the old thread in forum,

https://forum.lazarus.freepascal.org/index.php/topic,32741.180.html

and this code change

https://bugs.freepascal.org/view.php?id=32781

I will update my trunk version and see what impact this will have. I know that we are targeting 3.0.4 but some people use trunk. I suppose it will mean a few more ifdefs scattered around the assembler inc files.

jdelauney commented 6 years ago

Good news, it will have an impact on performance. I also see we can add Align 16 at the end of the record. This will save us (all) the {$CODEALIGN}.

dicepd commented 6 years ago

Missing {$i ../../glzscene_options.inc}. I am working round it on unix, but I can't test win64, as I expect that one to compile directly from GitHub (my sanity check on a check-in).

jdelauney commented 6 years ago

Oops, sorry, I forgot to remove this line. I'll take more care next time.

jdelauney commented 6 years ago

All is done and green, sorry again.

So, for vectorclass, from what I have read, to get proper alignment we need to do:

{$push}
{$CODEALIGN RECORDMIN=16}
{$PACKRECORDS C}
type
  TM128 = record
    case Byte of
      0: (M128_F32: array[0..3] of Single);
      1: (M128_F64: array[0..1] of Double);
  end;
{$pop}

  TVector4f = packed record
    case Byte of
      0: (M128: TM128);
      1: (X, Y, Z, W: Single);
  end;

and we need to add

vectorcall; assembler; nostackframe;

instead of

assembler; nostackframe; register;

What is the best way? Do

{$IFDEF USE_VECTORCLASS} function myFunc(Constref a,b : TGLZVector): TGLZVector; vectorcall; assembler; nostackframe; {$ELSE}....

or clone the include files? How do you see it?

dicepd commented 6 years ago

Are you following the thread? I would hold off on this for now, at least until CuriousKit stabilises the code base. It is very early days. Though messing about with this is eating my time. 3.1.1 will be a long time coming. They have only just released 3.0.4 and fpc releases do not happen very often.

If we wish to play with this I would suggest a branch for now just so we can play around with some code and see the effects.

It has one big bug in that we cannot see what/how the parameters are being used/called in a function.

This makes life very difficult when trying to debug, create new routines etc.

The other bug it has is losing alignment. It is not just a unix thing either; CK has had instances of it in win64. This could have us chasing non-existent bugs or, even worse, losing efficiency where we should not need to.

dicepd commented 6 years ago

ok further to the forum posts

if we do this

{$MODESWITCH ADVANCEDRECORDS}
{$ASMMODE Intel}
{$align 16}
type
  { TM128 }
  TVector4fType = array[0..3] of Single;

  {$push}
  {$CODEALIGN RECORDMIN=16}
  {$PACKRECORDS C}
  TM128 = record
    case Byte of
      0: (M128_F32: array[0..3] of Single);
      1: (M128_F64: array[0..1] of Double);
  end;
  {$pop}

  TVector4f =  record
    public
    class operator +(A, B: TVector4f): TVector4f; vectorcall;

  case Byte of
    0: (V: TM128);
    1: (X, Y, Z, W: Single);
    2: (Red, Green, Blue, Alpha: Single);
    3: (Left, Top, Right, Bottom: Single);

  end;

I get the sort of results we need.

dicepd commented 6 years ago

Ok tried modding our source to reflect the above but basically the tests fall over on any call like

  movaps    xmm1, XMMWORD PTR [RIP+cOneVector4f]

The consts are not aligned.
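For context, movaps faults when its memory operand is not on a 16-byte boundary, so a quick way to see whether a constant is safe to use is to test the low four bits of its address. The program below is a minimal sketch, not code from the library: the program name and the plain TVector4f record are hypothetical stand-ins, and it assumes that typed constants honour {$CODEALIGN CONSTMIN=16} on the compiler version in use, which is exactly the behaviour in question here.

```pascal
program ConstAlignCheck;
{$mode objfpc}

type
  // Hypothetical stand-in for the library vector type.
  TVector4f = record
    X, Y, Z, W: Single;
  end;

// Assumption: typed constants declared below honour this minimum alignment.
{$CODEALIGN CONSTMIN=16}
const
  // Typed constant of the kind used as a movaps memory operand.
  cOneVector4f: TVector4f = (X: 1.0; Y: 1.0; Z: 1.0; W: 1.0);
{$CODEALIGN CONSTMIN=4}

begin
  // An address is 16-byte aligned when its low four bits are zero.
  if (PtrUInt(@cOneVector4f) and 15) = 0 then
    WriteLn('cOneVector4f is 16-byte aligned')
  else
    WriteLn('cOneVector4f is NOT 16-byte aligned');
end.
```

Running the same check on 3.0.4 and on trunk would show whether the alignment is lost by the compiler or by our own directives.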

jdelauney commented 6 years ago

> Are you following the thread?

Not since Valentine's Day. I have just finished reading it.

> I would hold off on this for now, at least until CuriousKit stabilises the code base. It is very early days. Though messing about with this is eating my time. 3.1.1 will be a long time coming. They have only just released 3.0.4 and fpc releases do not happen very often.

I agree with you.

> ok further to the forum posts. if we do this....

OK, I am asking myself several questions.

> Ok tried modding our source to reflect the above but basically the tests fall over on any call like

That answers my questions. For now we must stay as we are and wait until CK and Florian sort out the alignment and the calling convention, if I understand correctly.

dicepd commented 6 years ago

Right I think I cured most of the alignment by removing the {$PACKRECORDS C}

Now I am coming up against Self being RCX instead of RDI in unix. This breaks the SYSV calling convention completely and will have to be fixed. So maybe not that far away.

As for your questions

> With the code above, are performances improved ?

Probably.

> Is the code more stable ?

Chasing trunk is not something I think we should be doing in the main; as a sideline, maybe. Helping CK, yes: I will devote some time to this, as longer term it will be better.

> Should we apply this scheme now ?

No, I think we should get our stuff working and worry about the next fpc version as it arises.

jdelauney commented 6 years ago

> I think we should be doing in the main, as a sideline maybe, helping CK

I don't quite understand "in the main, as a sideline". Do you mean independent functional tests, inside our lib?

> No I think we should get our stuff working and worry about next fpc version as it arises.

Yes, I agree. It will be better to get a fully functional lib.

Now we need to wait until the RCX/RDI calling issue and the alignment are fixed, if I'm right.

dicepd commented 6 years ago

I have a completely separate installation of trunk along with a copy of our stuff. I am not mixing these at all with current code. I will continue to monitor the changes until such a stage that at least trunk does not break what we have already, and then take another look. There looks to be more changes planned by CK, the most notable which he has in his tests is the ability to possibly inline some SSE constructs.

That would be by far the biggest saving IF he can make that work. Otherwise this is just a background task of making sure the next version will not break us too horribly when the time comes that we need to take notice.

jdelauney commented 6 years ago

Yes, that is the better way.

I've installed trunk on Windows and also made a separate clone of our code. I'm already impressed by FPC 3.1. I tested some of my demos as they are: the compiler really is better, and in most cases it improved performance. The frame rate went up by around 8 to 12 fps. I cannot wait to see the possibilities with inline SSE and all the improvements CK will make.

I also hope the changes do not break our code. If we just need to delete some 'movaps' and change the structure as in his examples, it should be fine.

My only worry is the differences between *nix and Windows.

dicepd commented 6 years ago

Re: Vectorhelper Reflect. What does this function do? Is N meant to be a plane, or a normal to a plane? The math does not immediately tell me what its function is; it is not something I recognise off the top of my head.

jdelauney commented 6 years ago

> RE Vectorhelper Reflect What does this function do?

I specifies the incident vector.
N specifies the normal vector.

Description from https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/reflect.xhtml

For a given incident vector I and surface normal N, reflect returns the reflection direction calculated as I - 2.0 * dot(N, I) * N.

N should be normalized in order to achieve the desired result.
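Written out as an equation, with a small worked case (the example vectors are chosen here purely for illustration and are not from the library):

```latex
R = I - 2\,(N \cdot I)\,N

% Example: I = (1, -1, 0), N = (0, 1, 0)
N \cdot I = -1
\quad\Rightarrow\quad
R = (1, -1, 0) - 2\,(-1)\,(0, 1, 0) = (1, 1, 0)
```

The component of I along N changes sign and the component in the surface stays the same, which is why N has to be unit length for the result to be a true mirror direction.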

This function seems to work. I am using it in the raymarching Phong function; this is part of the code:

// Camera Position
 cam.CreateAffine(0.0, 0.0, -5);  
 // We will need to shoot a ray from our camera's position through each pixel.  To do this,
 // we will exploit the uv variable we calculated earlier, which describes the pixel we are
 // currently rendering, and make that our direction vector.
 t.CreateAffine(uv.x, uv.y, 1.0);
 FCameraDirection := t.normalize; 

// Light Position
 MyShader.LightPosition.CreateAffine( 2.0, 4.5, -2.0);

// Standard Blinn lighting model.
  // This model computes the diffuse and specular components of the final surface color.
  function ComputeBlinnPhongLighting(MaterialColor, pointOnSurface, surfaceNormal, lightPosition, cameraPosition:TGLZVector4f):TGLZColorVector;
  var
    {$CODEALIGN VARMIN=16}
    fromPointToLight, fromPointToCamera,
    diffuseColor,
    reflectedLightVector,
    finalColor,
    specularColor,
    t : TGLZVector4f;
    {$CODEALIGN VARMIN=4}
    f,diffuseStrength, specularStrength : Single;

  begin
    t := lightPosition - pointOnSurface;
    fromPointToLight := t.Normalize;
    f := surfaceNormal.DotProduct(fromPointToLight);
    diffuseStrength := clamp(f, 0.0, 1.0 );

    diffuseColor := MaterialColor * diffuseStrength;
    t:= -fromPointToLight;
    t:= t.Reflect(surfaceNormal);
    reflectedLightVector := t.normalize;
    t :=  cameraPosition - pointOnSurface;
    fromPointToCamera := t.Normalize;
    specularStrength := pow( clamp( reflectedLightVector.DotProduct(fromPointToCamera), 0.0, 1.0 ), 10.0 );
    // Ensure that there is no specular lighting when there is no diffuse lighting.
    specularStrength := min( diffuseStrength, specularStrength );
    specularColor.Create(specularStrength);

    finalColor := diffuseColor + specularColor;
    finalColor.Alpha :=1.0;
    result := finalColor;
  end; 

Result at 640x480 (screenshot: 2018-02-19_145711)

dicepd commented 6 years ago

OK, now that I know N is the normal vector, I can write the test.
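One way such a test could look is a plain scalar reference that the SIMD result is compared against. This is only a sketch under assumptions: TVec3, Vec3, Dot and the Reflect function below are hypothetical helpers written for this example, not the library's TGLZVector4f API, and the checks are properties that hold for any unit-length N rather than values taken from the existing test suite.

```pascal
program ReflectCheck;
{$mode objfpc}

type
  // Hypothetical plain record, used only for this scalar reference.
  TVec3 = record
    X, Y, Z: Single;
  end;

function Vec3(AX, AY, AZ: Single): TVec3;
begin
  Result.X := AX;
  Result.Y := AY;
  Result.Z := AZ;
end;

function Dot(const A, B: TVec3): Single;
begin
  Result := A.X * B.X + A.Y * B.Y + A.Z * B.Z;
end;

// Scalar reference: R = I - 2 * dot(N, I) * N, with N assumed normalized.
function Reflect(const I, N: TVec3): TVec3;
var
  K: Single;
begin
  K := 2.0 * Dot(N, I);
  Result := Vec3(I.X - K * N.X, I.Y - K * N.Y, I.Z - K * N.Z);
end;

var
  Incident, Normal, R: TVec3;
begin
  Incident := Vec3(0.6, -0.8, 0.0);
  Normal := Vec3(0.0, 1.0, 0.0);
  R := Reflect(Incident, Normal);

  // Property 1: the component along the normal changes sign.
  if Abs(Dot(R, Normal) + Dot(Incident, Normal)) < 1e-6 then
    WriteLn('normal component flipped')
  else
    WriteLn('FAIL: normal component');

  // Property 2: reflection preserves the squared length of the vector.
  if Abs(Dot(R, R) - Dot(Incident, Incident)) < 1e-6 then
    WriteLn('length preserved')
  else
    WriteLn('FAIL: length');
end.
```

In the real test unit the same two properties could be asserted on the result of the assembler routine, alongside a direct comparison with the scalar reference.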