dicepd closed this issue 6 years ago.
> For 4x the addition of CreateScaleVec, CreatePoint[Vec] can make code more readable. I am not a fan of raising errors from this type of library, but if we use something like the following in pascal code only.

I agree with you about adding those functions, but for
class operator TGLZVector4f.+ (constref A, B : TGLZVector4f) : TGLZVector4f;
{$ifdef USE_STRICT_HOMOGENEOUS}
// code here to test for two points
// raise error if we have two points.
{$endif}
I'm not a fan of adding more preprocessor directives. I was thinking of integrating the routines for Vector3f (just cloning Vector4f) and handling W there. In asm, if we work with just 3 components we will lose performance, and with "AsVector3f" in Vector4f it will be easy to swap between homogeneous and affine.
Hmm, while testing Negate in 4f it suddenly struck me that W should not be negated for Hmg.
I don't know what else that would affect. I wish record inheritance were available in this version; then it could be overridden for Hmg.
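A minimal sketch of a homogeneous-aware negate, assuming a hypothetical plain record `TVec4` (fields X, Y, Z, W) standing in for `TGLZVector4f`: only the spatial part flips sign. Negating all four components of a point gives (-x, -y, -z, -1), which after the divide by W is the original point again, so a full negate silently does nothing to points.

```pascal
program NegateHmg;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for a homogeneous 4-component vector.
  TVec4 = record
    X, Y, Z, W: Single;
  end;

function Vec4(AX, AY, AZ, AW: Single): TVec4;
begin
  Result.X := AX; Result.Y := AY; Result.Z := AZ; Result.W := AW;
end;

// Negate the spatial part only; W keeps its meaning
// (1 = point, 0 = direction).
function NegateHmg(const V: TVec4): TVec4;
begin
  Result := Vec4(-V.X, -V.Y, -V.Z, V.W);
end;

var
  P: TVec4;
begin
  P := NegateHmg(Vec4(1, 2, 3, 1));   // a point stays a point
  WriteLn(P.X:0:1, ' ', P.Y:0:1, ' ', P.Z:0:1, ' ', P.W:0:1);
  // prints -1.0 -2.0 -3.0 1.0
end.
```

For a direction (W = 0) both versions agree, since -0.0 and 0.0 compare equal; the difference only shows up for points.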
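OK, I see three ways to implement the affine vector:
**Solution 1:** load and store only three components in asm. To load:
movq xmm0, [A]        // xmm0 = X, Y, 0, 0
movss xmm1, [A+8]     // xmm1 = Z, 0, 0, 0
movlhps xmm0, xmm1    // xmm0 = X, Y, Z, 0
and to return the result:
movhlps xmm1, xmm0    // low half of xmm1 = Z, W
movq [Result], xmm0   // store X, Y
movss [Result+8], xmm1 // store Z
But these extra instructions decrease performance a lot.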
**Solution 2A:** convert to the homogeneous type, reuse the 4f code, then convert back:
class operator TGLZVector3f.+(constref A, B: TGLZVector3f): TGLZVector3f;
var
  t1, t2: THmgAffineVector; // or TGLZVector4f directly, but TGLZVector3f is declared before TGLZVector4f
begin
  t1.CreatePointVec(A);
  t2.CreatePointVec(B); // or CreateScaleVec
  t1 := t1 + t2;
  Result := t1.AsVector3f;
end;
**Solution 2B:**
Clone the TGLZVector4f functions and manage W directly, as you suggest, but without any more preprocessor directives.
**Solution 3:** in TGLZVector4f, add a flag field such as FAsAffine: Boolean in the private section, and declare property AsAffine: Boolean in the public section.
Then each function will need to test this flag, but I don't know how to access this field in asm:
asm
  mov aReg, FAsAffine
  test aReg, $f
  jnz @DoAsAffine
  ....
  jmp @toend
@DoAsAffine:
  ....
@toend:
end;
Do you see any other solutions?
I think Solution 2B is the best.
What do you think?
Ok, I have been thinking somewhat on this issue. Mainly around memory alignment, usage and glarrays. I think it boils down to how we use glFloat4f v glFloat3f in arrays and the impact of
AFAIK most modern graphics cards expand 3f to 4f internally for exactly these memory alignment issues, and use homogeneous internally for rendering.
Another thought/ question I want to explore is the overhead, or hopefully lack of it, of properties for advanced record and the use of default property. This is a germ of an idea which may come to nothing.
More thinking required before rushing in here I think
Back on the 4f vs 3f issue: 4f is quicker for us. We could compact to 3f, but at what cost? Is that CPU cost greater than the time it would take to transfer 33% more bytes to the graphics card by DMA, which takes no CPU time at all since it is handled by DMA outside the CPU cache?
A point to consider when thinking about CPU mem to GPU mem.
The more I think about this issue the more I think that 3f is really an optimisation that has had its day even with not so modern hardware.
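The "33% more bytes" figure is simply 16 versus 12 bytes per vertex. A quick check, using hypothetical packed records and an arbitrary one-million-vertex mesh (it says nothing about DMA timing, only about byte counts):

```pascal
program VecSizes;
{$mode objfpc}{$H+}

type
  // Hypothetical layouts: tightly packed affine vs homogeneous vector.
  TVec3 = packed record X, Y, Z: Single; end;
  TVec4 = packed record X, Y, Z, W: Single; end;

const
  VertexCount = 1000000;

var
  Bytes3, Bytes4: Int64;
begin
  Bytes3 := Int64(SizeOf(TVec3)) * VertexCount;   // 12 bytes per vertex
  Bytes4 := Int64(SizeOf(TVec4)) * VertexCount;   // 16 bytes per vertex
  WriteLn('3f: ', Bytes3, ' bytes');
  WriteLn('4f: ', Bytes4, ' bytes');
  WriteLn('extra: ', (Bytes4 - Bytes3) * 100 div Bytes3, '%');   // 33%
end.
```

The 16-byte layout also keeps every element on a 16-byte boundary (given an aligned base), whereas with a 12-byte stride only every fourth element is, which is what makes aligned SSE loads awkward for 3f arrays.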
In "modern OpenGL" data are passed through a pointer, so to get aligned data we first need our own getmem/freemem functions. I found this, but it is not really nice:
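const
  sse_align = 16;                  // SSE needs 16-byte alignment
  sse_align_mask = sse_align - 1;

// Allocate size bytes aligned on sse_align; the real GetMem pointer is
// stored just below the returned address so sse_freemem can find it.
procedure sse_getmem(var p: Pointer; size: PtrUInt);
var
  temppointer: Pointer;
begin
  GetMem(temppointer, size + SizeOf(Pointer) + sse_align_mask);
  p := Pointer(PtrUInt(temppointer) + SizeOf(Pointer));
  p := Align(p, sse_align);
  PPointer(PtrUInt(p) - SizeOf(Pointer))^ := temppointer;
end;

procedure sse_freemem(var p: Pointer);
var
  temppointer: Pointer;
begin
  temppointer := PPointer(PtrUInt(p) - SizeOf(Pointer))^;
  FreeMem(temppointer);
  p := nil;
end;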
Yes, I also prefer to use Vector4f. I've begun updating GLScene a little and I'm removing affine vectors completely, but there are still many, many changes to make before I can test.
What exactly is the memory structure of a glarray? (1) Just a pointer to the start of the array, with the size passed in the call, or (2) does it have a 'leader' member for handle storage?
If (1), then old-fashioned New(pointer...) should suffice, as we declared the alignment with the type. If (2), then yes, we need to allocate a larger lump of memory and set the pointer to the appropriate boundary based on SizeOf(handle).
The example given does not seem to cope well with realloc for enlarging or shrinking an existing block of memory, whereas with New the Pascal memory manager at least has a better chance of not having to do a Move if the heap allows.
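On the realloc point: a block aligned by hand cannot simply go through ReAllocMem, because the heap manager only knows the raw pointer. A minimal sketch, assuming 16-byte alignment and a hypothetical header (raw pointer plus size) stored just below the aligned address, so that a resize can allocate, copy and free:

```pascal
program AlignedRealloc;
{$mode objfpc}{$H+}

const
  SSEAlign = 16;                       // must be a power of two

type
  // Hypothetical header kept just below each aligned block.
  PAlignHeader = ^TAlignHeader;
  TAlignHeader = record
    RawPtr: Pointer;                   // what GetMem actually returned
    Size: PtrUInt;                     // usable bytes in the aligned block
  end;

const
  HdrSize = SizeOf(TAlignHeader);

function HeaderOf(P: Pointer): PAlignHeader;
begin
  Result := PAlignHeader(PtrUInt(P) - HdrSize);
end;

function AlignedGetMem(Size: PtrUInt): Pointer;
var
  Raw: Pointer;
begin
  GetMem(Raw, Size + HdrSize + SSEAlign - 1);
  // Skip past the header, then round up to the next SSEAlign boundary.
  Result := Pointer((PtrUInt(Raw) + HdrSize + SSEAlign - 1)
                    and not PtrUInt(SSEAlign - 1));
  HeaderOf(Result)^.RawPtr := Raw;
  HeaderOf(Result)^.Size := Size;
end;

procedure AlignedFreeMem(var P: Pointer);
begin
  if P = nil then
    Exit;
  FreeMem(HeaderOf(P)^.RawPtr);
  P := nil;
end;

// Grow or shrink: allocate a new aligned block, copy what fits, free the old.
// The heap manager only knows the raw pointer, so this always copies.
procedure AlignedReAllocMem(var P: Pointer; NewSize: PtrUInt);
var
  NewP: Pointer;
  Count: PtrUInt;
begin
  NewP := AlignedGetMem(NewSize);
  if P <> nil then
  begin
    Count := HeaderOf(P)^.Size;
    if Count > NewSize then
      Count := NewSize;
    Move(P^, NewP^, Count);
    AlignedFreeMem(P);
  end;
  P := NewP;
end;

var
  P: Pointer;
begin
  P := AlignedGetMem(64);
  PSingle(P)^ := 42.0;
  AlignedReAllocMem(P, 4096);
  WriteLn('aligned: ', PtrUInt(P) mod SSEAlign = 0);
  WriteLn('kept value: ', PSingle(P)^:0:1);
  AlignedFreeMem(P);
end.
```

This gives up the in-place growth that ReAllocMem can sometimes manage, which is exactly the trade-off raised above.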
For either case, this is probably a good place to use a generic class to wrap all the details of a glarray (growth size, member size, etc.), handle all of this in one place, and then subclass for the 3f, 4f, 4i ... variants. The advantage of generics is that calculations such as size+sizeof(pointer)+sse_align_mask happen at compile time, not run time,
where sse_align_mask becomes not (sizeof(pointer) - 1) for an xxFFFF000xx-style mask or, without the not, sizeof(pointer) - 1 for an xx00000FFFxx-style mask, so it copes with whatever size or mask type we need without a const.
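To make the two mask shapes concrete for any power-of-two alignment (16 for SSE, or SizeOf(Pointer)), a small sketch; the function names are hypothetical:

```pascal
program AlignMasks;
{$mode objfpc}{$H+}

// For a power-of-two alignment A:
//   A - 1        is the low-bit mask   (...0000F for A = 16)
//   not (A - 1)  is the high-bit mask  (...FFFF0 for A = 16)
function LowMask(A: PtrUInt): PtrUInt; inline;
begin
  Result := A - 1;
end;

function HighMask(A: PtrUInt): PtrUInt; inline;
begin
  Result := not (A - 1);
end;

// Round X up to the next multiple of A.
function AlignUp(X, A: PtrUInt): PtrUInt; inline;
begin
  Result := (X + LowMask(A)) and HighMask(A);
end;

// True when X already sits on an A boundary.
function IsAligned(X, A: PtrUInt): Boolean; inline;
begin
  Result := (X and LowMask(A)) = 0;
end;

begin
  WriteLn(HexStr(QWord(LowMask(16)), 8));                  // 0000000F
  WriteLn(HexStr(QWord(HighMask(16)) and $FFFFFFFF, 8));   // FFFFFFF0
  WriteLn(AlignUp(100, 16), ' ', AlignUp(112, 16));        // 112 112
  WriteLn(IsAligned(48, SizeOf(Pointer)));
end.
```

Both masks derive from the alignment value alone, so no separate mask constant has to be kept in sync with it.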
Also, with a generic we could make it behave exactly like a Pascal array and add for..in semantics to it. Just look at the native generic array to see what I mean, and how little would be needed to adapt it to a glarray.
Another thought regarding generics:
TGLArray ..... almost the same as a generic array, but with handle storage.
TGLFloatArray ..... add some way of indicating that the base type is float; the constructor is a pass-through;
method-only class
procedure Scale(AValue: Single) ..... generic SSE scale of all values, whatever the subclass is
{$ALL_ALIGN_16}
TGLFloat2Array specialize TGLFloatArray
Scale(AValue: TGLZVector2f) .....
.......
{$ALL_ALIGN_16}
TGLFloat4Array specialize TGLFloatArray
Scale(AValue: TGLZVector4f) ..... scale all
A possible hierarchy, with no need for memory alignment in the base class. Note that I am still thinking in C++ STL here; I don't know off the top of my head how doable this is in FPC. We may have to resort to pass-through helpers in the realised classes.
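On whether this is doable in FPC: operators applied to T inside an FPC generic body are resolved when the generic is specialized, so one Scale body can serve both a Single array and a vector array. A minimal sketch with hypothetical names (TFloatArray, TVec4), in {$mode objfpc} syntax:

```pascal
program GenericScale;
{$mode objfpc}{$H+}
{$modeswitch advancedrecords}

type
  // Hypothetical 4-float vector with the one operator the generic needs.
  TVec4 = record
    X, Y, Z, W: Single;
    class operator *(const A: TVec4; S: Single): TVec4;
  end;

  // One generic body; each specialization gets its own Scale.
  generic TFloatArray<T> = class
  private
    FItems: array of T;
    function GetItem(I: Integer): T;
    procedure SetItem(I: Integer; const AValue: T);
  public
    constructor Create(ACount: Integer);
    procedure Scale(AValue: Single);
    property Items[I: Integer]: T read GetItem write SetItem; default;
  end;

class operator TVec4.*(const A: TVec4; S: Single): TVec4;
begin
  Result.X := A.X * S;
  Result.Y := A.Y * S;
  Result.Z := A.Z * S;
  Result.W := A.W;   // leave W alone (homogeneous convention)
end;

constructor TFloatArray.Create(ACount: Integer);
begin
  inherited Create;
  SetLength(FItems, ACount);
end;

function TFloatArray.GetItem(I: Integer): T;
begin
  Result := FItems[I];
end;

procedure TFloatArray.SetItem(I: Integer; const AValue: T);
begin
  FItems[I] := AValue;
end;

procedure TFloatArray.Scale(AValue: Single);
var
  I: Integer;
begin
  // "*" is resolved per specialization: Single * Single, or TVec4 * Single.
  for I := 0 to High(FItems) do
    FItems[I] := FItems[I] * AValue;
end;

type
  TFloat1Array = specialize TFloatArray<Single>;
  TFloat4Array = specialize TFloatArray<TVec4>;

var
  A: TFloat1Array;
  B: TFloat4Array;
  V: TVec4;
begin
  A := TFloat1Array.Create(2);
  A[0] := 1.5;
  A[1] := -2;
  A.Scale(2);
  WriteLn(A[0]:0:1, ' ', A[1]:0:1);   // 3.0 -4.0

  B := TFloat4Array.Create(1);
  V.X := 1; V.Y := 2; V.Z := 3; V.W := 1;
  B[0] := V;
  B.Scale(10);
  WriteLn(B[0].X:0:1, ' ', B[0].Z:0:1, ' ', B[0].W:0:1);

  A.Free;
  B.Free;
end.
```

Storage here is a plain dynamic array, so this ignores the 16-byte alignment question; in a real glarray an aligned pointer block would replace FItems.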
> What exactly is the mem structure of a glarray ?

In modern OpenGL, glArrays are deprecated (like many other things in core-profile OpenGL 3.x). Now, to pass arrays of vectors to the shader, we use VAO and VBO buffers.
This is an example of what I did some time ago (I've added some extra comments). It shows how to load data into the modern OpenGL pipeline:
procedure TGLZCustomBufferSceneObject.DoPrepareBuffer;
var
  BufferOffset: Integer;
  // Loc: Integer;
begin
  if FNeedBuild then
  begin
    DoBuildMesh; // Build mesh data (add vertices, indices, normals, texcoords...)
    if FVBOHandle.Handle = 0 then // Prepare VBO buffer
    begin
      FVBOHandle.AllocateHandle;
    end
    else
    begin
      FVBOHandle.DestroyHandle;
      FVBOHandle.AllocateHandle;
    end;
    FVBOHandle.Bind; // Lock VBO buffer
    // Allocate the VBO and set the mesh's data buffer:
    // one GPU buffer for all data (vertices + normals + texCoords + ...)
    FVBOHandle.BufferData(nil, FMeshData.GetTotalDataSize,
      cGLVBOMeshModetoGLEnum[FMeshData.MeshMode]);
    // Load our buffer into the GPU buffer (position, size of buffer, the buffer)
    FVBOHandle.BufferSubData(0, FMeshData.GetVerticesDataSize, FMeshData.Vertices.List);
    // Describe the layout of our data "array": this is the key to managing our buffer
    // see: https://www.khronos.org/opengl/wiki/GLAPI/glVertexAttribPointer
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nil);
    glEnableVertexAttribArray(0); // Enable GPU attribute index 0
    if MeshData.UseNormals then
    begin
      // Normals are stored right after the vertices in the same VBO
      BufferOffset := FMeshData.GetVerticesDataSize;
      glBufferSubData(GL_ARRAY_BUFFER,
        BufferOffset,
        FMeshData.GetNormalsDataSize,
        FMeshData.Normals.List);
      // FVBOHandle.BufferSubData(BufferOffset, FMeshData.GetNormalsDataSize, FMeshData.Normals.List);
      // Attribute 1 reads from BufferOffset inside the bound VBO.
      // Do not call glBufferData here: it would reallocate the buffer and
      // discard the vertex data uploaded above.
      glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, Pointer(PtrUInt(BufferOffset)));
      glEnableVertexAttribArray(1);
      // Instead of hard-coding index 1, the location could be queried:
      // Loc := FDefaultShader.GetAttribLocation('In_Normals');
    end;
    // repeat for texCoords, BiNormals, Tangents, Colors...
    if FEBOHandle.Handle = 0 then // Prepare EBO buffer
    begin
      FEBOHandle.AllocateHandle;
    end;
    FEBOHandle.Bind; // Lock EBO buffer
    FEBOHandle.BufferData(MeshData.Indices.List, MeshData.GetIndicesDataSize,
      cGLVBOMeshModetoGLEnum[FMeshData.MeshMode]); // Set EBO buffer data
    FEBOHandle.UnBind; // Unlock EBO buffer
    FVBOHandle.UnBind; // Unlock VBO buffer
    // @TODO Free mesh data? The data is now on the GPU
    FNeedBuild := False;
  end;
end;
So instead of manipulating arrays, we must use a pointer, or a pointer to an array.
For managing lists, some GLScene objects do exactly what we want, and what you describe above with TGLFloat4Array specialize TGLFloatArray.
In GLScene the base of all lists is a PByteArray (see TGLVectorList and its parents).
As you can see, glVertexAttribPointer is very important, and we can stride buffers :)
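Since stride and offset are where layout mistakes usually creep in, here is a small sketch that computes the `stride` and `pointer` (byte offset) arguments of glVertexAttribPointer for a hypothetical interleaved position+normal vertex, in both 3f and 4f form, and for the block layout used in DoPrepareBuffer above. It only does the arithmetic and makes no GL calls.

```pascal
program VertexStride;
{$mode objfpc}{$H+}

type
  TVec3 = packed record X, Y, Z: Single; end;
  TVec4 = packed record X, Y, Z, W: Single; end;

  // Hypothetical interleaved layouts: position followed by normal.
  TVertex3 = packed record
    Position: TVec3;
    Normal: TVec3;
  end;
  TVertex4 = packed record
    Position: TVec4;
    Normal: TVec4;
  end;

var
  V3: TVertex3;
  V4: TVertex4;
  NormalOffset3, NormalOffset4: PtrUInt;
  VertexCount, BlockOffset: PtrUInt;
begin
  // Interleaved: stride = size of one vertex record,
  // offset = byte position of the attribute inside that record.
  NormalOffset3 := PtrUInt(@V3.Normal) - PtrUInt(@V3);
  NormalOffset4 := PtrUInt(@V4.Normal) - PtrUInt(@V4);
  WriteLn('3f stride ', SizeOf(TVertex3), ', normal offset ', NormalOffset3);
  WriteLn('4f stride ', SizeOf(TVertex4), ', normal offset ', NormalOffset4);

  // Block layout: stride 0 (tightly packed),
  // normals start right after all the positions.
  VertexCount := 100;
  BlockOffset := VertexCount * SizeOf(TVec3);
  WriteLn('block layout normal offset ', BlockOffset);
end.
```

With the 3f interleaved layout the normal attribute would then be described as glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 24, Pointer(12)); with the block layout the stride stays 0 and only the offset changes.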
Actually, I've already begun rewriting the base of a 3D engine. First, before playing with OpenGL, I'd like to have a software rendering engine with the same pipeline as OpenGL, with "pure Pascal shaders" to rasterize triangles to the screen. I'm well on the way. It will be a real crash test for the vector lib.
I will upload the code later; I need a little more time to give you something acceptable :) At this stage it should be completely compatible with Unix :)
Another thing: now that all our tests are green :) I'd like to first implement just simple functions to manipulate arrays of vectors (offset, scale, rotate, normalize...) and test them with SSE/AVX, without extra code elements (the VectorArrayXXXX procedures commented out in the GLZVectorMath unit).
I suspect that will turn the tests red :)
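For those array routines, a plain-Pascal reference version makes a useful oracle for the SSE/AVX versions: same input, outputs compared with a tolerance. A minimal sketch for normalize, assuming a hypothetical `TVec4` record and a hypothetical `NormalizeArrayRef(P, Count)` signature (not the commented-out VectorArrayXXXX procedures, whose signatures aren't shown here):

```pascal
program ArrayReference;
{$mode objfpc}{$H+}

type
  TVec4 = record X, Y, Z, W: Single; end;
  PVec4 = ^TVec4;

// Plain-Pascal reference: normalize Count vectors in place (X, Y, Z only,
// W left untouched). An SSE/AVX version can be tested against this.
procedure NormalizeArrayRef(P: PVec4; Count: Integer);
var
  I: Integer;
  Len: Single;
begin
  for I := 0 to Count - 1 do
  begin
    Len := Sqrt(Sqr(P^.X) + Sqr(P^.Y) + Sqr(P^.Z));
    if Len > 0 then   // leave zero vectors alone
    begin
      P^.X := P^.X / Len;
      P^.Y := P^.Y / Len;
      P^.Z := P^.Z / Len;
    end;
    Inc(P);           // advances by SizeOf(TVec4)
  end;
end;

// Compare with a tolerance; SIMD code (e.g. using rsqrtps)
// rarely matches the scalar result bit for bit.
function SameVec(const A, B: TVec4; Eps: Single): Boolean;
begin
  Result := (Abs(A.X - B.X) <= Eps) and (Abs(A.Y - B.Y) <= Eps) and
            (Abs(A.Z - B.Z) <= Eps) and (Abs(A.W - B.W) <= Eps);
end;

var
  Data: array[0..1] of TVec4;
  Expected: TVec4;
begin
  Data[0].X := 3; Data[0].Y := 0; Data[0].Z := 4; Data[0].W := 1;
  Data[1].X := 0; Data[1].Y := 0; Data[1].Z := 0; Data[1].W := 0;
  NormalizeArrayRef(@Data[0], Length(Data));
  Expected.X := 0.6; Expected.Y := 0; Expected.Z := 0.8; Expected.W := 1;
  WriteLn(SameVec(Data[0], Expected, 1e-6));
end.
```

Including edge cases such as the zero vector in the shared input is where SSE versions tend to diverge (division by zero producing NaN).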
Another thing: I never use specialize and generic lists, first because I don't like how "the code becomes obscure" (lol), and second because they reduce performance a lot compared with static or dynamic arrays.

> and the 2nd it reduce performance a lot rather than static or dynamic array

What I was suggesting was a generic wrapper around a pointer. Most generics hide the pointer and expose their own methods, but you can use generics as inheritable pointers: for fast code you pass the pointer, while the user base can use nice friendly methods. I notice that in GLScene you have 'fast' array methods using pointers to arrays. With a properly designed generic you would use the pointer by default and have array access as an add-on. Plus, as all our routines are declared constref, they are already callable using a pointer.
As for strides, they are easy with generics: an array of GenericArray with linked iterators. Plus you can have iterator methods like pr := it.cur.previousRow, which remove all the headache of grids in a contiguous array of one type. glArray.setSize(100); glArray.colCount := 10; gives a 10x10 grid; glArray.colCount := 5; gives a 5x20 grid. itn := cur.nextRow; // itn now holds a pointer to the item/stride in the next row.
For triangle generation, these methods reduce the bug count dramatically.
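The row/column idea can be sketched even without generics: a hypothetical `TGridView` record over one contiguous block, where changing ColCount reinterprets the same memory (100 elements as 10x10 or 5x20) and moving to the next row is just a pointer step of ColCount * ElementSize bytes.

```pascal
program GridView;
{$mode objfpc}{$H+}

type
  // Hypothetical grid view over one contiguous block: the data never moves,
  // only ColCount changes how it is interpreted.
  TGridView = record
    Data: PByte;
    Count: Integer;        // total number of elements
    ElementSize: Integer;  // bytes per element
    ColCount: Integer;
  end;

function RowCount(const G: TGridView): Integer;
begin
  Result := G.Count div G.ColCount;
end;

// Address of element (Row, Col).
function ItemPtr(const G: TGridView; Row, Col: Integer): Pointer;
begin
  Result := G.Data + (Row * G.ColCount + Col) * G.ElementSize;
end;

// Same column, next / previous row: the row stride is ColCount * ElementSize.
function NextRow(const G: TGridView; P: Pointer): Pointer;
begin
  Result := PByte(P) + G.ColCount * G.ElementSize;
end;

function PrevRow(const G: TGridView; P: Pointer): Pointer;
begin
  Result := PByte(P) - G.ColCount * G.ElementSize;
end;

var
  Buf: array[0..99] of Single;
  G: TGridView;
  I: Integer;
  P: PSingle;
begin
  for I := 0 to High(Buf) do
    Buf[I] := I;
  G.Data := @Buf[0];
  G.Count := Length(Buf);
  G.ElementSize := SizeOf(Single);

  G.ColCount := 10;                            // 10 x 10
  P := ItemPtr(G, 2, 3);
  WriteLn(RowCount(G), ' rows, item(2,3) = ', P^:0:0);     // 10 rows, 23
  P := NextRow(G, P);
  WriteLn('next row, same column = ', P^:0:0);             // 33
  P := PrevRow(G, P);
  WriteLn('back again = ', P^:0:0);                        // 23

  G.ColCount := 5;                             // same memory, now 5 x 20
  WriteLn(RowCount(G), ' rows, item(2,3) = ',
          PSingle(ItemPtr(G, 2, 3))^:0:0);                 // 20 rows, 13
end.
```

A generic version would mainly add typed element access on top of these same pointer steps.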
> I smell that will run test to Red :)

Well, I have held off matrix testing, as nothing seems to work. As it uses a lot of the underlying vector code, I wanted to make sure I could trust the vector code before tackling matrices.
> What I was suggesting was a generic wrapper around a pointer, most generic hide the pointer and expose their own methods, but you can use generics as inheritable pointers.

OK, I'm a little lost with the term "generics".
So if I understand correctly, the prototype could be:
type
  TGenericArray = class
  private
    FData: Pointer; // PByte
    FColCount: Integer;
    FRowCount: Integer;
    FCursor: Integer; // where we are in the array
    FElementSize: Integer; // size of one element in the array
    procedure SetColCount(AValue: Integer);
    procedure SetRowCount(AValue: Integer);
  public
    constructor Create(const aSize: LongWord; aElementSize: Integer); overload;
    procedure SetSize(const aSize: LongWord);
    function PrevRow: Pointer;
    function NextRow: Pointer;
    function GetRow(NumOfRow: Integer): Pointer;
    property ColCount: Integer read FColCount write SetColCount;
    property RowCount: Integer read FRowCount write SetRowCount;
    property ElementSize: Integer read FElementSize; // write SetElementSize;
  end;

  TGLZVector4fArray = class(TGenericArray)
  private
    function GetItem(anIndex: Integer): TGLZVector4f;
    procedure SetItem(anIndex: Integer; const AValue: TGLZVector4f);
    function GetRowItem(aRow, aCol: Integer): TGLZVector4f;
    procedure SetRowItem(aRow, aCol: Integer; const AValue: TGLZVector4f);
  protected
    function GetItemPtr: PGLZVector4f; virtual;
  public
    constructor Create;
    procedure Scale(x: Single); overload;
    procedure Scale(x: TGLZVector4f); overload;
    property RowItem[aRow, aCol: Integer]: TGLZVector4f read GetRowItem write SetRowItem;
    property Item[anIndex: Integer]: TGLZVector4f read GetItem write SetItem;
  end;
Example of use:
var
  glArray: TGenericArray;
  ARow: PByte;
  AValue: Single;
  I, J: Integer;
begin
  glArray := TGenericArray.Create(100, SizeOf(Single)); // 100 Singles
  glArray.ColCount := 10;                              // seen as 10 x 10
  for I := 0 to glArray.RowCount - 1 do
  begin
    ARow := glArray.GetRow(I); // or glArray.NextRow
    for J := 0 to glArray.ColCount - 1 do
    begin
      AValue := PSingle(ARow)^;
      AValue := AValue * AValue;
      PSingle(ARow)^ := AValue;
      Inc(ARow, glArray.ElementSize); // step to the next element
    end;
  end;
  glArray.Free;
end;
Is this the kind of thing you had in mind? If so, it will be easy to implement my "Fast Bitmap management" in the same style.
I am a bit short of time at the moment until Saturday; I will see then if I can produce something along the lines of what I am talking about.
Finally, after many distractions today:
It's far from complete, but it gives an idea of a pointer vector array. See the Test procedure in GLZArrays and have a play with that.
Just add these files to the test project for now; they are nothing like the final file layout. I just tried to keep it to two files: one to handle the core 'array', which has no dependencies on the project, and the other, which introduces VectorMath.
OK, I've made something functional, a little simpler and more generic, from your code. Generics save us a lot of code.
But before I post the code: what do you think about refining the structure of the library a little, like this?
Or something like that?
I had the idea that we could do libraries of functional blocks such as
Basically, get these done and into libs, so that as we move on we are playing with new code with less building, and 'leave' working, tested code behind. [I know we will be adding bits, but I have my libs docked with my project, so all files are accessible and Lazarus just knows when it has to rebuild a lib if I edit it.]
I know that GLScene seemed to have issues with sectioning code in this way, from a legacy and slow buildup of intertwined code to perceived problems in distribution.
With FPC and Lazarus the code directory thing handles the dependency ordering of projects it hosts, sort of a manifest of dependencies it will handle downloads for. So multiple libs are not a distribution or new user issue like they used to be.
Plus, separating into libs generally forces you to make clearer separations in the functionality hierarchy from the beginning, so you do end up with a cleaner object model that is easier to maintain in the longer term. [On one contract I took six months to sort out a client's codebase, taking his build time from two days of pain to push-button automatic.]
Just my thoughts. ;)
> I had the idea that we could do libraries of functional blocks such as
> - Utils, [profiler cpuinfo],
> - BaseMath [sse vectors, matrix fastmath]

Is that from the root or from a parent folder?

> Basically get these done and in libs so as we move on we are playing with new code with less building, and 'leave' working tested code behind. [I know we will be adding bits but I have my libs docked with my project so all files are accessible and lazarus just knows when it has to rebuild a lib if I edit it]

I understand; it's not a problem for me with my project.

> I know that GLScene seemed to have issues with sectioning code in this way, from a legacy and slow buildup of intertwined code to perceived problems in distribution.

This is why I've split the GLScene source back into several subfolders, so that I can maintain each part more easily, like a sub-lib.

> With FPC and Lazarus the code directory thing handles the dependency ordering of projects it hosts, sort of a manifest of dependencies it will handle downloads for. So multiple libs are not a distribution or new user issue like they used to be.

Yes, and in my project I'm going further: I've split each "theme" to be independent, with only one dependency, "the common part" (which includes the vector lib, though I'm thinking of splitting that into a sub-lib too). In the end all my "sub-libs" share the same folder scheme, and all of them can be merged into one big lib for the final release.

> Plus separating into libs generally forces you to make clearer separations in the functionality hierarchy from the beginning so you do end up with a cleaner object model that is easier to maintain in the longer term.

Yes, that's how I'm working on my project.

> Just my thoughts. ;)

Me too :)
So what's best for you?
Split away into folders, and we can decide what goes in which lib later.
We can make those calls when we want to make a release.
Jerome,
I would like some way of making a sanity test for library code which uses homogeneous conventions. In the past I have spent many unhappy hours tracking down other people's bugs in code which was trying to use homogeneous conventions. Bugs such as adding two points, or scaling with a vector that has not had W set to 1, are the two most common ones I encountered. The results always 'look' OK until you try to use them further down the line.
For 4x the addition of CreateScaleVec, CreatePoint[Vec] can make code more readable.
I am not a fan of raising errors from this type of library, but if we use something like the following in pascal code only.
Not only would it help in tracking down this type of bug but an end user could also use it to do a sanity check on code. I am sure you have suffered the same and get the idea.
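A minimal sketch of such a sanity mode, assuming a hypothetical `TVec4` advanced record standing in for TGLZVector4f, hypothetical MakePoint/MakeDir/MakeScale helpers in the spirit of CreatePoint/CreateScaleVec, and the USE_STRICT_HOMOGENEOUS define from the snippet at the top of this thread. With the define off, the checks compile away entirely, so release builds raise nothing.

```pascal
program HmgSanity;
{$mode objfpc}{$H+}
{$modeswitch advancedrecords}
{$define USE_STRICT_HOMOGENEOUS}   // comment out to compile the checks away

uses
  SysUtils;

type
  EHmgError = class(Exception);

  // Hypothetical stand-in for TGLZVector4f: W = 1 point, W = 0 direction.
  TVec4 = record
    X, Y, Z, W: Single;
    class operator +(const A, B: TVec4): TVec4;
    class operator *(const A, B: TVec4): TVec4;  // component-wise scale
  end;

function Vec4(AX, AY, AZ, AW: Single): TVec4;
begin
  Result.X := AX; Result.Y := AY; Result.Z := AZ; Result.W := AW;
end;

// Hypothetical helpers in the spirit of CreatePoint / CreateScaleVec.
function MakePoint(AX, AY, AZ: Single): TVec4;
begin
  Result := Vec4(AX, AY, AZ, 1);
end;

function MakeDir(AX, AY, AZ: Single): TVec4;
begin
  Result := Vec4(AX, AY, AZ, 0);
end;

function MakeScale(AX, AY, AZ: Single): TVec4;
begin
  Result := Vec4(AX, AY, AZ, 1);   // W = 1 so the scaled W is preserved
end;

class operator TVec4.+(const A, B: TVec4): TVec4;
begin
  {$ifdef USE_STRICT_HOMOGENEOUS}
  // point + point gives W = 2, which is almost always a bug
  if (A.W = 1) and (B.W = 1) then
    raise EHmgError.Create('Hmg: adding two points');
  {$endif}
  Result := Vec4(A.X + B.X, A.Y + B.Y, A.Z + B.Z, A.W + B.W);
end;

class operator TVec4.*(const A, B: TVec4): TVec4;
begin
  {$ifdef USE_STRICT_HOMOGENEOUS}
  // a scale vector without W = 1 would change (or zero) the W of A
  if B.W <> 1 then
    raise EHmgError.Create('Hmg: scale vector must have W = 1');
  {$endif}
  Result := Vec4(A.X * B.X, A.Y * B.Y, A.Z * B.Z, A.W * B.W);
end;

var
  P: TVec4;
begin
  P := MakePoint(1, 2, 3) + MakeDir(0, 0, 1);     // fine: still a point
  WriteLn('point + dir: W = ', P.W:0:1);
  P := MakePoint(1, 2, 3) * MakeScale(2, 2, 2);    // fine: W stays 1
  WriteLn('scaled point: W = ', P.W:0:1);
  try
    P := MakePoint(1, 2, 3) + MakePoint(4, 5, 6);
  except
    on E: EHmgError do WriteLn('caught: ', E.Message);
  end;
  try
    P := MakePoint(1, 2, 3) * Vec4(2, 2, 2, 0);
  except
    on E: EHmgError do WriteLn('caught: ', E.Message);
  end;
end.
```

The checks compare W exactly, which is fine for values set by the helpers, but W values produced by arithmetic (for example after a matrix transform) would need a tolerance.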