ptitSeb / gl4es

GL4ES is an OpenGL 2.1/1.5 to GL ES 2.0/1.1 translation library, with support for Pandora, ODroid, OrangePI, CHIP, Raspberry PI, Android, Emscripten and AmigaOS4.
http://ptitseb.github.io/gl4es/
MIT License

amigaos4: glDraw..... / GL_UNSIGNED_BYTE & normalization issues #61

Closed · kas1e closed 6 years ago

kas1e commented 6 years ago

Some time ago we found a "hardcore" issue on AmigaOS4, which seems to be either an ogles2 or a warp3d issue. But as our devs can't easily find the root of it, we probably need some simpler test cases, so it can be analyzed better. Hope ptitSeb can help here too, as always :)

So, the issue is that in some apps (at the moment quake3 and the Irrlicht engine) something unknown happens which leads to total distortion of the visuals. In quake3 it happens when we just enable extensions (so not the plain glBegin/glEnd route, but glDrawElements). In Irrlicht it happens by default (so they probably also use something like quake3 does with extensions enabled).

This is how it looks in quake3:

when menu should come: http://kas1e.mikendezign.com/aos4/gl4es/games/quake3/first_run.jpg

in game itself: http://kas1e.mikendezign.com/aos4/gl4es/games/quake3/ingame3.jpg

This is how it looks in Irrlicht engine 1.8.4 with its simple "Hello World":

http://kas1e.mikendezign.com/aos4/gl4es/irrlicht/irrlicht_nohack.jpg

Of course, in all other apps glDrawElements / glDrawArrays and such work fine; it just happens in those 2 cases at the moment.

Our only guess so far is that maybe it's glVertexAttrib with 4× GL_UNSIGNED_BYTE and normalize=TRUE that breaks things. It's used for the colors (and it's converted to 4× GL_FLOAT when using the glBegin()/glEnd() code path).

So we checked that theory by making this patch:


To test this theory, you can modify gl4es: in src/gl/gl.c, in function glDrawElementsCommon, line 1094, change

    if (p->enabled) gles_glColorPointer(p->size, p->type, p->stride, p->pointer);

with

    void* tmp = NULL;
    if (p->enabled) {
        if(p->type==GL_UNSIGNED_BYTE) {
            // Colors come in as normalized unsigned bytes: compute how many
            // vertices the indices reference, then convert the color array
            // to 4x GL_FLOAT before handing it to the driver.
            if(!len) len = len_indices(sindices, iindices, count);
            tmp = copy_gl_pointer_color(p, 4, 0, len);
            gles_glColorPointer(4, GL_FLOAT, 0, tmp);
        } else
            // Any other type is passed through unchanged.
            gles_glColorPointer(p->size, p->type, p->stride, p->pointer);
    }

and at line 1152 (after the insertion), before the if(buffered) { line, add

    if(tmp) free(tmp);

And that makes it work.
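For reference, the heavy lifting in that patch is the ubyte-to-float conversion done by copy_gl_pointer_color(). Stripped of gl4es specifics, it boils down to something like this (a standalone sketch with assumed names, not the actual gl4es code):

    #include <stdlib.h>

    /* Hypothetical helper: expand packed RGBA unsigned bytes into
       normalized floats in [0,1]. The real copy_gl_pointer_color()
       also handles other source types and sizes. */
    static float* ubyte_colors_to_float(const unsigned char* src,
                                        unsigned long stride,
                                        unsigned long count) {
        float* dst = (float*)malloc(count * 4 * sizeof(float));
        if (!dst) return NULL;
        if (stride == 0) stride = 4;             /* tightly packed RGBA */
        for (unsigned long i = 0; i < count; i++) {
            const unsigned char* c = src + i * stride;
            for (int j = 0; j < 4; j++)
                dst[i * 4 + j] = c[j] / 255.0f;  /* normalize to [0,1] */
        }
        return dst;                              /* caller must free() */
    }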

Then I sent all the info to Daniel (our ogles2 author), but nothing came of it. He says that from his side everything seems to work correctly. He analyzed everything he could think of, and all looks fine.

Though, lately, he added some info:

"as far as I remember it was for sure no issue with the uchar-normalization, AFAIR I ruled that out. It also doesnt make too much sense after all: the normalization can only affect non-float data (in case of Q3 only colors), so the worst you would get if there was sth. wrong with the normalization would be wrong colors, but not wrong geometry / wrong tex-coords.

Distortions like that could be caused by wrong client-data (either completely invalid RAM or alignment issues or wrong stride etc.) or a wrong VBO setup (either caused by wrong client data / config or an ogles2-internal-bug) or a Nova-bug with certain VBO or shader setups."

At this point, as I understand it, Daniel has done all he can having access to ogles2 only, and it seems it may be a warp3d issue in the end... But as Hans works on warp3d not that fast, it seems we need some simpler test case for analysis which produces the same effect.

Dunno, maybe just strip down the whole quake source, so it only shows the intro, then the menu, and exits? Then reduce it step by step... Or is it better to write some test case from scratch? But then we still don't know what exactly causes the issue...

kas1e commented 6 years ago

@ptitSeb Btw, in our suggestion we have "glVertexAttrib with GL_UNSIGNED_BYTE ...", but doesn't gl4es convert everything to 32 bits anyway before sending it to the driver?

ptitSeb commented 6 years ago

Well, yes, except for this one (and technically, it's only accepted when size=4, so the RGBA color fits in a 32-bit data space, but yeah, it's 8-bit data basically). GL_UNSIGNED_BYTE is not "endian-sensitive": it is the same in big endian and little endian, so it should not be an issue, unless GL_UNSIGNED_BYTE is indeed not supported by the Warp driver because only 32-bit data is implemented. In that case, either gl4es or OGLES2 should take care of the data. I can probably force a workaround in gl4es to be sure no GL_UNSIGNED_BYTE is used (of course, that will convert some data, so it would be better, speed-wise, if Warp accepted 8-bit data).

Now that I think of it, there is another case of non-32-bit data: the element indices used in glDrawElements can be GL_UNSIGNED_SHORT (so 16-bit), but this seems supported, or nothing would work.

kas1e commented 6 years ago

Hans answered some time ago:

The Warp3D driver currently can't handle anything other than 32-bit datatypes.

The problem is that Southern Islands and newer GPUs are little-endian only, while our CPUs are big-endian. So the driver has to convert the endianness as the data is copied to the GPU. Right now it assumes that everything is 32-bit, and it probably returns an error if you try to use 8/16-bit vertex data.

Writing a system to perform the correct endianness swapping for the data is on the to-do list. It'll be a bitch to get right, because it's got to handle interleaved data, structures with different datatypes, etc. It's one of those things I wish I could get someone else to write...
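To illustrate what Hans describes: every 32-bit word has to be byte-swapped on its way to the little-endian GPU, while single bytes need no swap at all, which is exactly why mixed-size interleaved data is the hard case. A minimal sketch (not Warp3D code):

    #include <stdint.h>
    #include <stddef.h>

    /* Byte-swap an array of 32-bit words (big endian <-> little endian).
       uint8 values would be left untouched. */
    static void swap32_buffer(uint32_t* data, size_t count) {
        for (size_t i = 0; i < count; i++) {
            uint32_t v = data[i];
            data[i] = (v >> 24) | ((v >> 8) & 0x0000FF00u)
                    | ((v << 8) & 0x00FF0000u) | (v << 24);
        }
    }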

Is it of any help for us? :) Maybe we can try some simple test cases to see if 8-bit vertex data indeed behaves that way?

ptitSeb commented 6 years ago

Well, I understand Hans's point of view about 16-bit data. But 8-bit data is the same in big endian and little endian, so adding support for that should be straightforward.

So, here is the question to answer to understand what the next step should be.

kas1e commented 6 years ago

I will ask them about it... Though, we can't be sure for now that our issue is because of that? I mean, our previous workaround was more or less "it is either normalization, or normalization with ubyte, or something of that sort". Can we somehow reduce the possible scenarios?

ptitSeb commented 6 years ago

Well, the workaround I gave you earlier should cover 90% of the cases. But it will not cover the case of software using shaders and VAs with GL_UNSIGNED_BYTE directly. Still, for Quake3 and most software you already tried, it should work to ensure no 8-bit data is used. Do you have issues with that workaround?
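The uncovered path looks roughly like this on the client side (a generic illustration; the program and attribute names are made up):

    #include <GLES2/gl2.h>

    /* A shader-based app feeding normalized unsigned bytes straight into
       a generic vertex attribute bypasses the glColorPointer workaround. */
    void setup_color_attrib(GLuint program, const GLubyte* colorBytes) {
        GLint loc = glGetAttribLocation(program, "aColor");
        glEnableVertexAttribArray((GLuint)loc);
        glVertexAttribPointer((GLuint)loc, 4, GL_UNSIGNED_BYTE,
                              GL_TRUE /* normalized */, 0, colorBytes);
    }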

I'll work on a better workaround soon (I first need to add some infrastructure to better handle data conversion in VAs, for all platforms; then the AMIGAOS case will be easy to add).

kas1e commented 6 years ago

Plz wait a bit before working on it; we seem to have found something, and need to clear it all up a bit.

ptitSeb commented 6 years ago

sure, don't worry, I have plenty of other stuff to try out...

kas1e commented 6 years ago

Good news! Seems we dealt with it on our side! There are some limitations in warp3d, and this is what Hans says:

  1. The big endianness issue concerns VBOs containing data of mixed sizes. If you try mixing byte, 16-bit and/or 32-bit data in one VBO, then it will fail. However, stick a 16-bit index array in its own VBO (or a VBO with only 16-bit data), and it will work. I just checked the code, and that is handled correctly.

So the "32-bit data only" statement isn't strictly true any more, and hasn't been for a while.

NOTE: DBOs are still 32-bit only.

  2. The second limitation is that the driver used to treat all Vertex Attributes (VAs) as 32-bit floats. The latest beta sets the VA's attributes correctly now, so it'll treat ints as ints, uints as uints, etc.

This is where I got myself confused, because I thought the hardware would treat floats and ints differently. However, 32-bit ints and floats are handled the same way: they get passed on to the shader unchanged (i.e., int VAs must go to an int shader input). So, 32-bit int VAs have probably been working all along.

The latest beta also correctly sets the VA descriptor for 8- and 16-bit attributes, including whether it's normalized. So, it should work provided you restrict each VBO to having one data size only (8-bit data in one VBO, 16-bit data in another, etc.). I haven't tested that, though, because I'd completely forgotten that my endianness handler was a bit more sophisticated than "32-bits only." Let me know what happens if you try it...

So, after that, Daniel just added some code to ogles2 which internally converts every GL_UNSIGNED_BYTE VA for client-memory usage. For safety he converts to float internally, so as not to rely on this new 32-bit integer support, because he prefers to do his own normalization, and because that way it should work with previous Nova versions too.

So, in the end I checked it all, and quake3 with extensions, as well as the Irrlicht engine examples, work now, yeah!

I also benchmarked quake3 with extensions enabled (so glDrawElements in use), to see the speed differences, and the results are:

| Resolution | q3_minigl_sdl1 | q3_minigl_sdl2 | q3_gl4es_sdl1 |
|------------|----------------|----------------|---------------|
| 640x480    | 90.8 fps       | 86 fps         | 74 fps        |
| 800x600    | 87.5 fps       | 83.1 fps       | 72.2 fps      |
| 1024x768   | 82.2 fps       | 76.9 fps       | 68.5 fps      |
| 1600x1200  | 67.5 fps       | 68.9 fps       | 60.2 fps      |

As you can see, q3 even with glDrawElements is still a little slower... But at least at 1600x1200 it's almost on par. Are there any possible ways to accelerate the glDrawElements() mode?

ptitSeb commented 6 years ago

Well, good news indeed.

Until (if?) GL_UNSIGNED_BYTE is handled by the hardware, using the CPU to convert the data will introduce a slowdown. As you see, the higher the resolution (so more pressure on the GPU, less on the CPU), the smaller the speed difference, so the CPU time spent on data conversion counts here. Also, minigl was, I guess, made with Quake3 in mind and is probably heavily optimized for this engine. I don't think it's completely fair to expect gl4es to be faster here (even once GL_UNSIGNED_BYTE is handled by hardware). What gl4es brings is more functions, and faster speed when using advanced OpenGL features (like TexGen or shaders); I don't think you'll see any speed advantage from gl4es when using idTech3-based games, or simple games with low geometry and no complex OpenGL renderer. Remember Neverball is much faster with gl4es (and that one uses TexGen). You may also see a benefit from gl4es with SeriousEngine, or maybe TORCS (or SpeedDreams too). Foobillard++ should also work better on gl4es (if it even works with minigl).

kas1e commented 6 years ago

Checked the differences between our previous workaround in gl4es and Daniel's one done in ogles2: Daniel's one is faster in q3 by about 3 fps... Probably he just uses fewer memcopy routines (or they are smaller or something?). That probably also means that when it is done in hardware (in warp3d), it will give us another few fps...

Btw, does the initial VBO support you added to gl4es some time ago work? I mean, can we be sure it works at all? I just tried to enable it to test q3 the glDrawElements() way, and while the shell output shows the words "LIBGL: VBO used (in a few cases)", I see no difference in q3 in terms of speed at all. I mean not a single percent, which makes me think it may not work?

ptitSeb commented 6 years ago

Well, that piece of code for GL_UNSIGNED_BYTE was just something quick to test, not optimized or anything... But yeah, once handled in hardware, that should give a few fps.

About VBOs: I'm not sure. It was a quick hack that I will probably remove at some point (and try to implement proper VBO handling). Even if it works, don't expect any speed boost, as the VBO is only created for one glDrawElements, so nothing useful here.

kas1e commented 6 years ago

Probably we can close this issue for now... Thanks for the help!

ptitSeb commented 6 years ago

Hey @kas1e, I was just wondering: did you release Neverball with gl4es, and if yes, what is the feedback?

kas1e commented 6 years ago

I'm still waiting for the latest warp3d and ogles2 to be released to the public, as the fixes we got lately are all in private beta-test state, and users don't have them... So once the latest ogles2 and warp3d are released, I can also release all the gl4es-based apps :)

ptitSeb commented 6 years ago

Ah. Seems long. Do you know why the fixes haven't been released yet? Working on more fixes, or does it just take time?

kas1e commented 6 years ago

It's just that the company who owns all this releases, every so often, an "enhancer pack" (like a service pack for WinXP), where they put all the stuff their devs work on (drivers, libs, devices, apps, tools, etc). So it usually takes some time. As far as I'm aware it should be released "very soon". But very soon can mean 2 weeks, same as a few months :)

ptitSeb commented 6 years ago

Ok, I see. Thanks for the info. Let's wait...

kas1e commented 5 years ago

Hi ,

For now we have that workaround in ogles2.library by Daniel, but Hans wants to add the necessary conversion code to Warp3D itself (so it will be more correct, and maybe it will make things a bit faster).

This is what Daniel wrote about that workaround when he made it:


Now Hans is trying to implement the conversion (endian swap) code in Warp3D itself, and in the latest version he has added support for VBOs with mixed data sizes (e.g., 8-, 16- and 32-bit vertex attributes in one VBO).

Though he still has issues with q3 (when we try to use ogles2.library without the workaround): the menu etc. is all fine, but in the game itself we still have a mess. It's a different kind of mess than before this was implemented (should I make a video, to show what I mean, for better understanding?), but the game still doesn't render correctly.

After some debugging, the last mail I got from Hans some days ago was:


Last time I checked, Q3 itself doesn't use VBOs, but packing data into VBOs must be happening elsewhere.

But I've finally figured out what's going on: something upstream from Warp3DNova is writing data into VBOs in a different layout to the one declared.

On a hunch, I temporarily made it endian-swap uint8 data as if it were 32-bit, and all the triangles were drawn correctly (but with the wrong colours). Q3 has VBOs with two layouts, e.g.:

    W3DN_SI.library (6): VBO 0x5C399318 has data of mixed sizes (e.g., 16 and 32-bit values). Setting up conversion table.
    W3DN_SI.library (8): Building endianness conversion table for 4 interleaved arrays
    W3DN_SI.library (10): Endianness conv: offset: 0, count: 3, type: float32
    W3DN_SI.library (10): Endianness conv: offset: 12, count: 4, type: uint8
    W3DN_SI.library (10): Endianness conv: offset: 16, count: 2, type: float32
    W3DN_SI.library (10): Endianness conv: offset: 24, count: 2, type: float32
    W3DN_SI.library (10): Endianness conv series: offset: 0, convCount: 4, stride: 32, blockCount: 703, size: 22496
    W3DN_SI.library (6): VBO 0x5C398518 has data of mixed sizes (e.g., 16 and 32-bit values). Setting up conversion table.
    W3DN_SI.library (8): Building endianness conversion table for 3 interleaved arrays
    W3DN_SI.library (10): Endianness conv: offset: 0, count: 3, type: float32
    W3DN_SI.library (10): Endianness conv: offset: 12, count: 4, type: uint8
    W3DN_SI.library (10): Endianness conv: offset: 16, count: 2, type: float32
    W3DN_SI.library (10): Endianness conv series: offset: 0, convCount: 3, stride: 24, blockCount: 998, size: 23952
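Decoded, those two conversion tables correspond to interleaved vertex layouts like these (inferred from the offsets and types in the trace; not actual declarations from gl4es or Quake3):

    #include <stdint.h>

    typedef struct {       /* stride 32, 4 interleaved arrays */
        float   pos[3];    /* offset  0: float32 x3, position    */
        uint8_t rgba[4];   /* offset 12: uint8   x4, color       */
        float   uv0[2];    /* offset 16: float32 x2, texture UV  */
        float   uv1[2];    /* offset 24: float32 x2, second UV   */
    } VertexStride32;

    typedef struct {       /* stride 24, 3 interleaved arrays */
        float   pos[3];    /* offset  0 */
        uint8_t rgba[4];   /* offset 12 */
        float   uv0[2];    /* offset 16 */
    } VertexStride24;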

Clearly something is writing data in one of the two formats above to a VBO that has been declared as having the other format. Based on testing, it's most likely that it's writing stride == 32 data to a stride == 24 VBO. With the strides out of sync, it's converted incorrectly. Either way, it's a bug upstream (i.e., ogles2.library or GL4ES).

In other words: what's happening now is that Warp3DNova is told that a VBO has a particular layout, and then data with a different layout is copied into it. As a result, the endianness conversion is wrong.

I of course wrote back to Hans that since everything works fine on other platforms, we can't blame gl4es; likewise, since Daniel's workaround works fine in ogles2.library, it's for sure on our side (either warp3d still has some problems, or ogles2.library is doing something wrong (or not doing something)).

But then I got no answer to it, and likewise no answer from Daniel. But Daniel got married a week ago, and Hans will tomorrow, so I can't expect them to answer fast :)

Anyway, what do you think about it? Maybe you have some ideas...

Thanks !

ptitSeb commented 5 years ago

Mmm, I'll add some logging of the VBO stuff, with stride and some details, and do a run in quake3, to compare with what Hans has seen.

ptitSeb commented 5 years ago

@kas1e: do you use LIBGL_USEVBO=1? If not, then gl4es makes no use of any VBO. If yes, then I have to check that code (but this function is not really useful and should not be used IMO).

kas1e commented 5 years ago

I use only what gl4es uses by default, i.e. I set no special settings...

ptitSeb commented 5 years ago

Yeah, so those VBOs are created by the OGLES2 driver... Nothing I can do at this point :( (also, I wonder if those VBOs are important, performance-wise).

kas1e commented 5 years ago

Or they are some internal VBOs of warp3d... I wrote to Hans about it; still waiting for an answer from Daniel.

For the sake of tests I also tried to use LIBGL_USEVBO=1, but that makes no difference. Same problem, and not a single difference in FPS. But if I remember right, the code enabled by LIBGL_USEVBO was some fast hack for tests, and only tries to use VBOs in some cases... Though, while I can understand why it makes no difference in speed when no gl extensions are enabled (as the glBegin/glEnd route is used), I don't understand why, when we enable gl extensions (and so glDrawElements), it speeds up nothing, not even by 1 fps... Maybe it's not real VBO code, but something emulated, so it makes no difference?

ptitSeb commented 5 years ago

Yeah, the VBO code in gl4es is not triggered often, so it's pretty useless.

What do you call "enable gl extensions"? I mean, how do you do that: hacking the code, or using r_primitives in the config file?

kas1e commented 5 years ago

Just by toggling "enable gl extensions" to "on" in the q3 menu and restarting. But it's the same as r_primitives in the config file, yep.

ptitSeb commented 5 years ago

In the console log, check what it is using for drawing: you can see individual glIndexArray or glInterleavedArray (I don't remember the exact wording) or a single glDrawElements (or is it glDrawArrays).

kas1e commented 5 years ago

You mean enable gl extensions in q3, and check which functions it uses when extensions are enabled?

If so, then when NO gl extensions are enabled (so it should use the pure glBegin/glEnd route) we have:

    rendering primitives: multiple glDrawElements
    compiled vertex arrays: disabled

Then, when I enable gl extensions in q3 (so it should use glDrawElements), we have:

    rendering primitives: single glDrawElements
    compiled vertex arrays: enabled

In theory, when it uses gl extensions (so, per the console, "single glDrawElements"), it should probably use your VBO code when LIBGL_USEVBO=1?

ptitSeb commented 5 years ago

Well, between multiple glDrawElements and a single glDrawElements, don't expect much difference! Yeah, the single glDrawElements should trigger the VBO code of gl4es (probably, I have to check), but it will not make any real difference here.

Really, that compiled vertex arrays extension is not usable by gl4es (but I guess miniGL does use it). What it does is tell the OpenGL driver that the vertex data (and only the vertex data) is set and will not change between glLockArrays(...) and glUnlockArrays(), so an OpenGL driver that doesn't have hardware transform can transform the vertices once... But as gl4es uses hardware T&L (in shaders), it's just useless. And quake3 makes changes to other arrays (colors, texture UVs) in between, so I cannot really build anything stable...

kas1e commented 5 years ago

At least when I benchmark fps with gl extensions enabled vs disabled, I get a good difference: disabled gives about 45 fps, enabled about 70.

ptitSeb commented 5 years ago

Oh really? That much? Strange, I wouldn't have expected going from "multiple glDrawElements" to "single glDrawElements" to be that different. I was more expecting that kind of difference between using glBegin/glEnd compared to glDrawElements...

ptitSeb commented 5 years ago

Anyway, about the VBOs, I don't think they are coming from gl4es.

kas1e commented 5 years ago

Something is wrong in our comparison :) I mean, the q3 console log shows "multiple glDrawElements" when I disable gl_extensions, which in turn means the pure glBegin/glEnd route should be used.

And then, when I enable gl extensions, so it uses glDrawElements, the q3 output says "single glDrawElements".

But why it says "multiple glDrawElements" when gl extensions are disabled (and the glBegin/glEnd route should be in use), I do not know.

ptitSeb commented 5 years ago

If I remember correctly, there are actually 3 drawing paths in quake3-engine games:

  1. one with glBegin(..) / glEnd()
  2. one with multiple glDrawElements(...) where it tries to make strips out of the triangles
  3. one with a single glDrawElements(...) where it just draws the triangles "as-is"

You can control that with r_primitives (0, 1, 2) in the cfg. gl_extensions allows the use of any GL extension; one of them is glLockArrays, which makes r_primitives=2 the default (all this from memory, I haven't rechecked the code).

kas1e commented 5 years ago

Ok... doing some more tests via the config file:

First I set all extensions to 0, and only play with r_primitives:

seta r_primitives "0": 45.3 fps, console says "multiple glDrawElements".

seta r_primitives "1": 45.3 fps, console says "multiple glDrawElements".

seta r_primitives "2": 56.1 fps, console says "single glDrawElements".

So, first, as can be seen, even for the glBegin/glEnd route it says "multiple glDrawElements". Second, we can see that just the swap from "multiple" to "single" gives us 10+ fps (that is, when we swap from the glBegin/glEnd route to glDrawElements).

Then I tried to allow gl_extensions, but enabled only the vertex_buffer_object one:

seta r_primitives "0": 56.1 fps, console says "single glDrawElements".

seta r_primitives "1": 45.3 fps, console says "multiple glDrawElements".

seta r_primitives "2": 56.1 fps, console says "single glDrawElements".

What this all means is that, it seems, compiled vertex arrays do nothing useful! Strange!

Then, for the sake of tests, I enabled compressed textures with r_primitives 2, and it gives me the same 56.1. Then I enabled the multitexture extension, and that gives a good boost: 68.3 fps! Then I tried to add texture_env_add, and it does nothing.

So... what this means is that compiled vertex arrays do nothing! Really strange. Daniel was sure they would speed things up for us. Another strange thing is that multitexture gives about 10+ fps as well!

Daniel said all along that once compiled vertex arrays work, they would give us a huge speedup. But it seems all the speedup we have now is just from the multitexture extension and from swapping from the glBegin/glEnd route to glDrawElements.

Weird ..

kas1e commented 5 years ago

And I did some more interesting tests to prove that the minigl version works faster only because the "compiled vertex array" extension works: I use r_primitives 2 (so a single glDrawElements) and only enable the multitexture extension; both versions, the minigl one and the gl4es one, give me about 70 fps (gl4es 68, minigl 70). Then I enable the compiled vertex arrays extension, and while the gl4es one gives me the same 70 fps, minigl gives 82.

Which means that this vertex arrays extension does nothing in gl4es, and that is what misled me, Daniel and Hans before: we were sure that once we fixed the distortion mess, things would work a lot faster, but if the extension does nothing, then it's no surprise that it gives no boost... The surprise is that we are even almost on par with minigl :)

Is it possible to make that extension work in gl4es and do something useful?

ptitSeb commented 5 years ago

The glLockArrays extension? No, unfortunately. I already tried a few things, but it's really only good when you have to transform the vertices fully in software; I have not been able to do anything useful with it.

ptitSeb commented 5 years ago

But again, there are many other engines that will give you better performance with gl4es than with miniGL (and also more functions / effects).

kas1e commented 5 years ago

Yeah, q3 is too oldschool... Anyway, sorry for being dumb, but did I understand right that the extension called "compiled vertex array" in q3 is the glLockArrays thing?

Do you mind if I discuss it with Hans and Daniel? If they have any ideas on how to implement it, I can bring them to you, and maybe we can make something helpful. One head is good, but 3 are better :) if of course you want to spend any time on it...

ptitSeb commented 5 years ago

Yep. Look at the official spec here: https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_compiled_vertex_array.txt

You can discuss it of course, and I'll be glad if you find an idea on how to use this extension for something useful!

ptitSeb commented 5 years ago

The problem with this extension is that you can enable vertex arrays after the locking... making this extension pretty unusable.

Look here for example: https://github.com/ptitSeb/ioq3/blob/master/code/renderergl1/tr_shade.c#L1279

You see the call to glLockArrays(...), and then the engine enables GL_TEXTURE_COORD_ARRAY and GL_COLOR_ARRAY. That means those 2 arrays are not locked, but still used for drawing (and the values of those 2 arrays will change between the Lock and Unlock)...
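Schematically, the pattern in tr_shade.c looks like this (a condensed illustration, not the verbatim engine code; in the real engine the EXT entry points are function pointers fetched at startup):

    #include <GL/gl.h>

    /* EXT_compiled_vertex_array entry points (assumed already resolved). */
    extern void glLockArraysEXT(GLint first, GLsizei count);
    extern void glUnlockArraysEXT(void);

    void draw_surface(const GLfloat* xyz, const GLubyte* colors,
                      const GLfloat* texcoords, const GLushort* indices,
                      GLsizei numVertexes, GLsizei numIndexes) {
        glEnableClientState(GL_VERTEX_ARRAY);
        glVertexPointer(3, GL_FLOAT, 0, xyz);
        glLockArraysEXT(0, numVertexes);   /* only positions are locked */

        /* Enabled AFTER the lock, so NOT locked: their contents can
           (and do) change between Lock and Unlock. */
        glEnableClientState(GL_COLOR_ARRAY);
        glEnableClientState(GL_TEXTURE_COORD_ARRAY);
        glColorPointer(4, GL_UNSIGNED_BYTE, 0, colors);
        glTexCoordPointer(2, GL_FLOAT, 0, texcoords);

        glDrawElements(GL_TRIANGLES, numIndexes, GL_UNSIGNED_SHORT, indices);
        glUnlockArraysEXT();
    }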

kas1e commented 5 years ago

You mean "can't" enable" ? But how it works on non-shader implementations of different opengls ? Or you mean it can't in gl4es, because of how gl4es structured and works ?

ptitSeb commented 5 years ago

No, I mean CAN enable. The client software can enable arrays that are not locked, and that makes the Lock/Unlock mechanism useless (for gl4es at least), because a drawing command is based partly on locked arrays (like vertex coordinates) and partly on unlocked arrays (like vertex colors or UVs).

kas1e commented 5 years ago

Hi! At the moment I've got an answer from Hans only, and I don't know how helpful it can be, but this is what he suggests:

A few possible ideas:

  1. Put the vertex position data into non-interleaved arrays in a VBO. Then you can update the colour & texture coordinates more efficiently (better for caches, and no need to upload the positions again).
  2. Put the vertex positions in a separate VBO. Again, you'll be able to update the colour & texture coordinates more efficiently.

ptitSeb commented 5 years ago

Not sure I understand, but here is what I get:

  1. Create a VBO with vertex positions, colors, texcoords, normals, but non-interleaved. And only update the changed colors / texcoords.
  2. Create a VBO with only the vertex positions; colors, texcoords and normals stay out of the VBO.

Well, (1) I can probably try to implement, but that seems to be quite some work, and I'm unsure of the performance gain. Plus, I don't know in advance which vertex attributes (color, how many texcoords, normals) will be needed for drawing. For (2), I'm unsure how standard this is: some vertex attributes in one VBO and some in another. While I can probably try to implement that, again, I'm unsure of the performance gain (as you still need to transfer colors and other VAs), and unsure how various GLESv2 drivers will accept this kind of thing.

As these optimisations are only for old engines, and those engines probably run pretty well on most hardware already, I don't think it's worth the risk of slowing other stuff down, nor the added complexity in the code.

kas1e commented 5 years ago

For (1): Yes. Something like glBufferSubData() can be used to upload just the changed attributes.

For (2): The remaining attributes would go into a separate VBO in this variation. Basically, you'd have one VBO for static data and one VBO for the dynamic ones (with appropriate usage hints: GL_STATIC_DRAW and GL_STREAM_DRAW). This option would probably work best on OpenGL implementations where the driver can put GL_STREAM_DRAW buffers in GART space (which is on the to-do list for Nova).

This is an entirely valid and normal way to use VBOs, see this link:

https://www.khronos.org/opengl/wiki/Vertex_Specification_Best_Practices#Formatting_VBO_Data
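A minimal GLES2 sketch of option (2) as described (buffer and parameter names are illustrative):

    #include <GLES2/gl2.h>

    /* One VBO for static positions, one for per-frame colors/UVs, each
       with the matching usage hint so the driver can place it well. */
    void setup_split_vbos(GLuint vbo[2],
                          const GLfloat* positions, GLsizeiptr posBytes,
                          GLsizeiptr dynBytes) {
        glGenBuffers(2, vbo);

        glBindBuffer(GL_ARRAY_BUFFER, vbo[0]);            /* static */
        glBufferData(GL_ARRAY_BUFFER, posBytes, positions, GL_STATIC_DRAW);

        glBindBuffer(GL_ARRAY_BUFFER, vbo[1]);            /* dynamic */
        glBufferData(GL_ARRAY_BUFFER, dynBytes, NULL, GL_STREAM_DRAW);
    }

    /* Per frame: re-upload only the attributes that changed. */
    void update_dynamic(GLuint dynVbo, const void* colorsAndUVs,
                        GLsizeiptr dynBytes) {
        glBindBuffer(GL_ARRAY_BUFFER, dynVbo);
        glBufferSubData(GL_ARRAY_BUFFER, 0, dynBytes, colorsAndUVs);
    }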

As for slowing things down: Hans doubts it'll slow stuff down, because drivers for GLES2-level hardware use VBOs internally for the data anyway. With option #2, you're actually giving the driver the hints to optimize each VBO for the data being sent.

All in all, for the first tests we can make it an experimental feature, enabled via an environment variable for testing purposes...

ptitSeb commented 5 years ago

Mmm, ok. I'll look at the separate VBO then, to change the current (quite ineffective) VBO stuff I implemented some time ago, and see if this could be implemented. It should be easier to do (a VBO for all vertex attribs active at the time of glLock; other vertex attribs will remain outside).

ptitSeb commented 5 years ago

@kas1e: I have just pushed a change to gl4es: it will now try to use real VBOs to optimize glLockArrays(...). You may get some slight speedup in Quake3 engine games...

ptitSeb commented 5 years ago

And I have pushed another change, with a slight change of strategy that could help efficiency. All this needs testing now (with Quake3, and any other game that uses glLockArrays(...)/glLockArraysEXT(...)).

kas1e commented 5 years ago

Tested quake3, and sadly it's the other way around: it's 10 fps slower.

I.e., the version of gl4es from a week ago gives me 92 fps at 1024x768, but today's latest version gives me 81 fps at the same 1024x768, i.e. 11 fps less.

Maybe some debug output was left somewhere or something?