ptitSeb / gl4es

GL4ES is an OpenGL 2.1/1.5 to GL ES 2.0/1.1 translation library, with support for Pandora, ODroid, OrangePI, CHIP, Raspberry PI, Android, Emscripten and AmigaOS4.
http://ptitseb.github.io/gl4es/
MIT License

amigaos4: glDraw..... / GL_UNSIGNED_BYTE & normalization issues #61

Closed: kas1e closed this issue 6 years ago.

kas1e commented 6 years ago

Some time ago we found a "hardcore" issue on AmigaOS4, which seems to be either an ogles2 or a Warp3D issue. But as our devs can't easily find the root of it, we probably need some simpler test cases so that it can be analyzed better. Hope ptitSeb can help there too, as always :)

So, the issue is that in some apps (at the moment, Quake3 and the Irrlicht engine), something unknown happens that leads to total distortion of the visuals. In Quake3 it happens as soon as we enable extensions (so not just the glBegin/glEnd route is used, but glDrawElements). In Irrlicht it happens by default (so they probably use something similar to what Quake3 does when extensions are enabled).

This is how it looks in Quake3:

When the menu should appear: http://kas1e.mikendezign.com/aos4/gl4es/games/quake3/first_run.jpg

In game: http://kas1e.mikendezign.com/aos4/gl4es/games/quake3/ingame3.jpg

This is how it looks in the Irrlicht engine 1.8.4 with its simple "hello world":

http://kas1e.mikendezign.com/aos4/gl4es/irrlicht/irrlicht_nohack.jpg

Of course, in all other apps glDrawElements / glDrawArrays and the like work fine; at the moment it happens only in those 2 cases.

Our only guess so far is that the glVertexAttrib setup with 4×GL_UNSIGNED_BYTE and normalize = TRUE breaks things. It's used for the colors (and they are converted to 4×GL_FLOAT on the glBegin()/glEnd() code path).

So we checked that theory with this patch:


To test this theory, you can modify gl4es: in src/gl/gl.c, in the function glDrawElementsCommon, at line 1094, change:

    if (p->enabled) gles_glColorPointer(p->size, p->type, p->stride, p->pointer);

with:

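    void* tmp = NULL;   // temporary float copy of the color array, freed after drawing
    if (p->enabled) {
        if(p->type==GL_UNSIGNED_BYTE) {
            // compute how many vertices the indices reference, if not done yet
            if(!len) len = len_indices(sindices, iindices, count);
            // expand the unsigned-byte colors to 4 tightly packed floats (stride 0)
            tmp = copy_gl_pointer_color(p, 4, 0, len);
            gles_glColorPointer(4, GL_FLOAT, 0, tmp);
        } else
            gles_glColorPointer(p->size, p->type, p->stride, p->pointer);
    }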

and at line 1152 (after the insertion), just before if(buffered) {, add:

    if(tmp) free(tmp);

And that makes it work.
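For reference, the conversion the workaround relies on can be sketched in isolation. This is a minimal sketch, not gl4es code: the helper name expand_ubyte_colors is hypothetical, and it assumes RGBA colors with 4 unsigned-byte components converted with the standard GL rule for normalized unsigned bytes (c / 255), which is presumably what the float copy made by copy_gl_pointer_color has to match.

```c
#include <stddef.h>

/* Hypothetical helper (not part of gl4es): expand `count` RGBA colors
 * stored as normalized GL_UNSIGNED_BYTE into tightly packed floats,
 * 4 per color. With normalization on, GL maps a byte c to c / 255.0,
 * so 0 becomes 0.0 and 255 becomes 1.0. `stride` is the byte distance
 * between consecutive source colors; 0 means tightly packed. */
static void expand_ubyte_colors(const unsigned char *src, size_t stride,
                                size_t count, float *dst)
{
    if (stride == 0)
        stride = 4; /* tightly packed: 4 bytes per RGBA color */
    for (size_t i = 0; i < count; ++i) {
        const unsigned char *c = src + i * stride;
        for (size_t k = 0; k < 4; ++k)
            dst[i * 4 + k] = (float)c[k] / 255.0f;
    }
}
```

Since the result is passed with stride 0, the destination must hold 4 * count floats.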

Then I sent all the info to Daniel (our ogles2 author), but nothing came of it. He says that on his side everything seems to work correctly. He analyzed everything he could think of, and it all looks fine.

However, he recently added some info:

"as far as I remember there was definitely no issue with the uchar normalization; AFAIR I ruled that out. It also doesn't make much sense after all: the normalization can only affect non-float data (in the case of Q3, only colors), so the worst you would get if something was wrong with the normalization would be wrong colors, but not wrong geometry / wrong tex-coords.

Distortions like that could be caused by wrong client data (completely invalid RAM, alignment issues, wrong stride, etc.), by a wrong VBO setup (caused either by wrong client data / config or by an ogles2-internal bug), or by a Nova bug with certain VBO or shader setups."
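Two of the suspects Daniel lists, wrong stride and alignment, are easy to illustrate: a driver fetches element i of an attribute at base + i × stride, where a stride of 0 means tightly packed. The sketch below uses hypothetical helper names and is not ogles2 or gl4es code; it shows how reading an interleaved buffer with the wrong stride silently picks up the wrong values, which shows up as wrong geometry or tex-coords rather than wrong colors.

```c
#include <stddef.h>
#include <stdint.h>

/* In GL, a stride of 0 means the attribute is tightly packed, so the
 * real distance between elements is size * sizeof(component). */
static size_t effective_stride(size_t stride, size_t size, size_t type_bytes)
{
    return stride ? stride : size * type_bytes;
}

/* Address of element i of a float attribute array, computed the way a
 * driver fetches it: base + i * effective stride. */
static const float *float_attrib_at(const void *base, size_t stride,
                                    size_t size, size_t i)
{
    return (const float *)((const char *)base +
                           i * effective_stride(stride, size, sizeof(float)));
}

/* Float data at an address that is not 4-byte aligned is another
 * suspect; this checks a pointer against a given alignment. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

For two interleaved vertices stored as xyz position plus uv tex-coord (20-byte stride), reading positions with stride 20 gives vertex 1 its correct x, while declaring the same buffer with stride 0 makes vertex 1 read the first tex-coord of vertex 0 as its x.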

At this point, as I understand it, Daniel has done all he can with access to ogles2 only, and it may turn out to be a Warp3D issue in the end. But as Hans doesn't work on Warp3D that fast, it seems we need a simpler test case for analysis that produces the same effect.

I don't know, maybe just strip down the whole Quake source so it only shows the intro, then the menu, and exits? Then reduce it step by step. Or is it better to write a test case from scratch? But then we still don't know what exactly causes the issue...

kas1e commented 5 years ago

For the sake of testing, I tried playing with the new LIBGL_USEVBO. When I set it to 1 (enabled), I get 81 fps. When I set it to 0 (disabled), I get my 92 fps back.

So the environment variable handling is fine; it seems something is going wrong inside the code itself?

ptitSeb commented 5 years ago

Lower speed? Can you check with Daniel whether this is expected? The rendering loop is basically the same; the only difference is that the vertex data is in an actual VBO (instead of regular PC memory). The VBO is refilled with data the same way regular memory is, so maybe some optimizations that exist for regular memory are not present for VBOs?

kas1e commented 5 years ago

Yeah, it's definitely about 10 fps slower (retested again). I wrote to Daniel, but he's probably asleep now, so I need to wait until tomorrow.

kas1e commented 5 years ago

I could also try Quake3 on Linux, with and without gl4es, as well as with the new gl4es that uses VBOs, to see how it behaves (so we can better understand what happens).

kas1e commented 5 years ago

Ok, tested on Linux. Results are:

mesa: 502 fps
gl4es before the VBO addition: 421 fps
gl4es with the VBO addition: 297 fps

As can be seen, with VBO it becomes about 120 fps slower, even on Linux.

kas1e commented 5 years ago

And I rechecked just via LIBGL_USEVBO: when it's 1, fps drops to 300; when it's 0, fps goes back to 420.

ptitSeb commented 5 years ago

Ok, thanks for testing. I'll do some tests on my side to see if I can understand where that speed loss comes from.

ptitSeb commented 5 years ago

So, I tried to optimize the process. It's a little better, but still slower than without any VBO (and I did some GLES2 captures to be sure the draw calls were consistent). So now it needs LIBGL_USEVBO=2 to be activated. LIBGL_USEVBO=1 will be used to try to use real VBOs when OpenGL VBOs are used, but that's not implemented yet.
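A minimal sketch of how the three LIBGL_USEVBO values described above could be interpreted. This is not gl4es's actual parsing code: the enum and function names are hypothetical, and treating an unset or invalid value as 0 (off) is an assumption.

```c
#include <stdlib.h>

/* Hypothetical modes matching the description above. */
typedef enum {
    USEVBO_OFF        = 0, /* no real VBO, client memory as before */
    USEVBO_PROGRAM    = 1, /* real VBO when the program uses GL VBOs (planned) */
    USEVBO_LOCKARRAYS = 2  /* real VBO for glLockArrays data */
} usevbo_mode;

/* Parse the variable's value; unset or invalid is assumed to mean off. */
static usevbo_mode parse_usevbo(const char *value)
{
    if (!value)
        return USEVBO_OFF;
    char *end;
    long v = strtol(value, &end, 10);
    if (end == value || *end != '\0' || v < 0 || v > 2)
        return USEVBO_OFF;
    return (usevbo_mode)v;
}
```

The value would typically come from getenv("LIBGL_USEVBO"), e.g. after launching a game with LIBGL_USEVBO=2 set in its environment.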

kas1e commented 5 years ago

Ok, another try, here it is:

---x86/linux (fps)---

mesa:          466, 466, 469
gl4es VBO 0:   411, 409, 406
gl4es VBO 1:   404, 403, 405
gl4es VBO 2:   278, 276, 272

---ppc/amigaos4 (fps)---

minigl:        101, 102, 101
gl4es VBO 0:    91,  92,  92
gl4es VBO 1:    91,  92,  92
gl4es VBO 2:    82,  83,  83

So, as can be seen, VBO 0 still gives the best results, while VBO 1 is now almost the same (identical on AmigaOS4, very slightly slower on Linux). VBO 2 is a lot slower on Linux, and seemingly on AmigaOS4 too (at least from what I managed to see before it stopped and started throwing OpenGL errors).
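To put these measurements in relative terms, the slowdown of VBO 2 against VBO 0 can be computed from the averages of the three runs in the table above. The array and function names are just for illustration:

```c
/* fps readings copied from the table above (three runs each) */
static const double linux_vbo0[3] = {411, 409, 406};
static const double linux_vbo2[3] = {278, 276, 272};
static const double amiga_vbo0[3] = {91, 92, 92};
static const double amiga_vbo2[3] = {82, 83, 83};

/* Average of three runs. */
static double mean3(const double r[3])
{
    return (r[0] + r[1] + r[2]) / 3.0;
}

/* Relative slowdown of `slow` compared with `base`, in percent. */
static double drop_percent(double base, double slow)
{
    return (1.0 - slow / base) * 100.0;
}
```

With these numbers, drop_percent(mean3(linux_vbo0), mean3(linux_vbo2)) is about 32.6% and the AmigaOS4 equivalent is about 9.8%.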

kas1e commented 5 years ago

Should I test it with Irrlicht now too? Or is there no point at the moment, as it's only about glLockArrays for now?

kas1e commented 5 years ago

With your last fix for AmigaOS4, everything also works with VBO 2, but it's slower in the same way as on Linux (I updated the previous figures).

ptitSeb commented 5 years ago

Ok. At least it works now. But I have no explanation for why it's slower... Ask Daniel, but yeah, no way to use that glLockArrays(...) for optimisation...

You can try Irrlicht, but I don't remember if it uses glLockArrays (and that will not help the Terrain sample, which uses VBOs).

kas1e commented 5 years ago

Ask Daniel, but yeah, no way to use that glLockArrays(...) for optimisation...

Strange that it's also slower on Linux (even more than on AmigaOS4). On AmigaOS4 we lose only about 10% of speed, but on Linux we lose about 30-40% when using VBOs. I could understand it if Linux got faster than without VBOs and the slowdown happened only on AmigaOS4 (as on AmigaOS4 the drivers are to blame for bugs and speed loss all the time), but when it's the same on Linux... I don't know, maybe we just can't use VBOs with glLockArrays at all, as it spends time on something it doesn't have to do when no VBO is in use.

I will ask Daniel; maybe he will have some ideas about it.

ptitSeb commented 5 years ago

Well, the problem with Quake3 is that the rendering is "composite": the vertex coordinates will be in a VBO (probably a good thing, especially for Amiga), but colors and texture coordinates will not be in that same VBO, and that's by design in the game engine. That means the graphics driver has to mix fetching data from the VBO and from outside it (or from another VBO in the AmigaOS4 case, I guess), which is probably not as optimized as fetching all the data from the same place... Some other games use glLockArrays; if all of the data is "locked", that may be beneficial.
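The difference between this "composite" setup and a single-buffer one can be sketched with an interleaved vertex layout. This is illustrative only: the struct and constant names are hypothetical, and Quake3 itself does not use this layout. Every attribute of a vertex sits next to the others, so one VBO holds all of them and each attribute is described by a shared stride plus its own offset.

```c
#include <stddef.h>

/* Hypothetical interleaved vertex: position, tex-coord and color of one
 * vertex are stored next to each other in a single buffer. */
typedef struct {
    float         pos[3];   /* vertex coordinates */
    float         tex[2];   /* texture coordinates */
    unsigned char color[4]; /* RGBA, normalized GL_UNSIGNED_BYTE */
} interleaved_vertex;

/* All three attributes share one stride; each has its own byte offset
 * into the buffer (what a glVertexAttribPointer call would receive). */
static const size_t VERTEX_STRIDE = sizeof(interleaved_vertex);
static const size_t POS_OFFSET    = offsetof(interleaved_vertex, pos);
static const size_t TEX_OFFSET    = offsetof(interleaved_vertex, tex);
static const size_t COLOR_OFFSET  = offsetof(interleaved_vertex, color);
```

Without padding (the usual case on common ABIs) the stride is 24 bytes and the offsets are 0, 12 and 20. In the composite case, by contrast, each attribute has its own pointer and stride, and only some of them live in a VBO.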

Anyway, that feature, even if disappointing, is also a step toward handling VBOs with actual VBOs (which should help the Terrain rendering and some other games). Also, I plan to create VBOs (if possible) for glList(...), which should help games like foobillard++.

kas1e commented 5 years ago

Yeah, that's definitely good progress.

It may just mean that using real VBOs with glLockArrays() doesn't help, for whatever reason (maybe Mesa doesn't use it here, for example). But it may help with some other functions...

But before, I was under the impression that once real VBOs are in use, everything would speed up a lot; it seems it all depends on the situation again. In some cases it will help, in others it won't.

I also see that you've started adding support for real VBOs in general; how do you plan to do it? I mean, will there be a list of functions it can be used with, or will it apply to all functions (including glLockArrays)?

ptitSeb commented 5 years ago

For some VBOs, I will create a real VBO and maintain its data (that's the part I just wrote). Then, when the program sets up its data, if the VBO is still valid, I'll use the real one instead of the emulated one. I cannot use real VBOs all the time, because some data needs transformation, and in that case the VBO is not used. Also, OpenGL 2.x (and 3.x) allows more VBO types than GLES2 supports, so I need to filter what I can use for real and what needs to be emulated. Once finished, this should help the Terrain sample, along with some games (look for glBindBuffer(...) or glBufferData(...) to see if a game uses VBOs).
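The filtering step can be sketched as a predicate deciding whether an array may go into a real VBO or must stay on the emulated path. Assumptions: the function name and the needs_transform flag are hypothetical, not gl4es's; the accepted types and the 1 to 4 component range are those of GLES2's glVertexAttribPointer, and the enum values are defined locally (matching the Khronos headers) so the sketch compiles without GL headers.

```c
#include <stdbool.h>

/* Enum values as in the Khronos gl.h / gl2.h headers (defined here so
 * the sketch is self-contained; do not mix with the real headers). */
#define GL_BYTE           0x1400
#define GL_UNSIGNED_BYTE  0x1401
#define GL_SHORT          0x1402
#define GL_UNSIGNED_SHORT 0x1403
#define GL_INT            0x1404
#define GL_UNSIGNED_INT   0x1405
#define GL_FLOAT          0x1406
#define GL_DOUBLE         0x140A
#define GL_FIXED          0x140C

/* Hypothetical filter: true if the array can be fed to GLES2 as-is. */
static bool can_use_real_vbo(unsigned type, int size, bool needs_transform)
{
    if (needs_transform)
        return false; /* data is rewritten before drawing: emulate */
    if (size < 1 || size > 4)
        return false; /* GLES2 accepts 1..4 components only */
    switch (type) {
    case GL_BYTE: case GL_UNSIGNED_BYTE:
    case GL_SHORT: case GL_UNSIGNED_SHORT:
    case GL_FIXED: case GL_FLOAT:
        return true;
    default:
        return false; /* e.g. GL_INT, GL_UNSIGNED_INT, GL_DOUBLE */
    }
}
```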

kas1e commented 5 years ago

I also got a note from Daniel about this. He says that yes, having all vertex data interleaved in one VBO is usually preferable. However, a multi-VBO setup should still (usually) be faster than not using VBOs at all. And yes, display lists are perfect VBO candidates, although it's probably not trivial for gl4es to support them 100% correctly. E.g. a glColor before rendering a list may influence some (!) or all of the vertices if there was no glColor at the very beginning of the list...
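Daniel's glColor caveat can be made concrete: if a display list emits vertices before its first glColor, those vertices take whatever color is current when the list is called, so their color cannot be baked into a static VBO at compile time. A minimal sketch of that check over a hypothetical recorded command stream (only colors and vertices are modelled; lighting, materials and nested lists are ignored):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recorded commands; a real list holds much more state. */
typedef enum { CMD_COLOR, CMD_VERTEX } list_cmd;

/* True if some vertex is emitted before the list sets its own color:
 * such vertices inherit the glColor that is current when the list is
 * called, so their color is not fixed when the list is compiled. */
static bool list_depends_on_current_color(const list_cmd *cmds, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (cmds[i] == CMD_COLOR)
            return false; /* color set before any vertex */
        if (cmds[i] == CMD_VERTEX)
            return true;  /* vertex uses the inherited color */
    }
    return false;         /* list contains no vertices */
}
```

A list for which this returns true could still use a VBO for positions, but its colors would have to be patched or supplied separately at call time.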

kas1e commented 5 years ago

Btw, is it worth starting to test with the latest commit "Add preliminary work for for real VBO", or is it too early?

ptitSeb commented 5 years ago

Btw, is it worth to start checking with latest commit about "Add preliminary work for for real VBO", or its too early ?

It's too early. For now it only creates and maintains the VBO, but never uses it. I'll tell you when to start testing.

ptitSeb commented 5 years ago

So, I have pushed a first commit that actually uses VBOs. It seems to work fine on my side with the few games I tried (on the Irrlicht Terrain example, I got a 50% speed increase, so not bad). Only some GL_ARRAY_BUFFER buffers are treated as real VBOs. GL_ELEMENT_ARRAY_BUFFER is not handled yet, but that one should not bring as much improvement.