ptitSeb / gl4es

GL4ES is an OpenGL 2.1/1.5 to GL ES 2.0/1.1 translation library, with support for Pandora, ODroid, OrangePI, CHIP, Raspberry PI, Android, Emscripten and AmigaOS4.
http://ptitseb.github.io/gl4es/
MIT License

Implement Precompile Shader Archive #117

Open ptitSeb opened 5 years ago

ptitSeb commented 5 years ago

When using the GLES2 backend, every Fixed Pipeline function (so OpenGL 1.x) can lead to the creation of a new shader program. Some take a while to compile and link (for example when a lot of lights are involved), which gives a game some "hiccups". The long loading time can be seen when launching Foobillard++ or Neverball, for example.

Because the Fixed Pipeline Emulator (FPE) always generates the same shader programs, those could be saved for later use: by creating an archive containing previously built FPE programs, and using the GL_OES_get_program_binary extension to save / load the program binary and avoid the compiling and linking part.

The PSA (Precompiled Shader Archive) will be available only if the extension is present and supports at least 1 binary program format.

On Linux, the archive will be saved in the HOME folder, as a hidden file (named .gl4es.psa). On AmigaOS4, it will be in PROGDIR: as a hidden file (same name as on Linux).
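A minimal sketch of how such a path could be assembled, assuming a hypothetical helper named psa_build_path (this is not the gl4es code). It adds a "/" only when the directory does not already end with a separator, and treats ":" as a separator so that an AmigaOS assign such as PROGDIR: is left alone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from gl4es): writes "<dir>/.gl4es.psa" into out.
 * Returns 0 on success, -1 if dir is NULL/empty or the result does not fit. */
int psa_build_path(const char *dir, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    if (dir == NULL || dir[0] == '\0')
        return -1;

    size_t len = strlen(dir);
    char last = dir[len - 1];   /* last real character, not the terminator */

    /* Add a separator only if dir does not already end with one.
     * ':' covers AmigaOS-style assigns such as "PROGDIR:". */
    const char *sep = (last == '/' || last == ':') ? "" : "/";

    int n = snprintf(out, out_size, "%s%s.gl4es.psa", dir, sep);
    if (n < 0 || (size_t)n >= out_size) {
        out[0] = '\0';          /* truncated: report failure, leave no partial path */
        return -1;
    }
    return 0;
}
```

A caller would pass getenv("HOME") on Linux or the literal "PROGDIR:" on AmigaOS4, and skip the archive entirely when the function returns -1.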

TODO: Where to put the archive on Android
TODO: Where to put the archive on Emscripten

ptitSeb commented 5 years ago

Implemented in commit 601184b1b7e63cd1f85b5360207bb763efa8fe82 and fixed with 5a189a1f30ae73a4849a2e4f3f18259e6484ec55. Disabled by default for now.

kas1e commented 5 years ago

While Daniel is working on adding the necessary functions to the amigaos4 driver, I still tested how it all currently behaves by setting LIBGL_NOPSA to 0.

So, I just ran Neverball and exited. As a result, it created a .gl4es.psa file in the root of the game (good!), which is 43 bytes in size (so just a header saying GL4ES PrecompiledShaderArchive + some bytes of structure format or something). When I exit the game I also get "LIBGL: Saved a PSA with 0 Precompiled Programs" at the end (which is of course expected, as the functions are not implemented).

Though, shouldn't there be an error (or a log message) at runtime saying something like "LIBGL: Forcing to use PSA, but functions didn't work"? Or does that make no sense?

ptitSeb commented 5 years ago

Mmm, yes, the archive should not be created. I'll check that !

ptitSeb commented 5 years ago

@kas1e : with commit 37d3b629580f2289703d5ee83f701c38cabf08ef it should now ignore PSA if the extension is not present (you will have to delete the previous empty one yourself).

kas1e commented 5 years ago

Btw, did you test PSA on the Pandora already? I.e. with Friking Shark while playing, and with the loading of Neverball for example. It would be interesting to know whether everything works as expected now on your side with the Pandora's gles2.

ptitSeb commented 5 years ago

I tested quickly on the Pandora, and it seems promising yes. Starting time for Neverball is much faster now. I need to do more testing, but that looks good (I haven't tried Friking Shark yet).

kas1e commented 5 years ago

Does the Pandora's gles2 driver also need to assemble those precompiled shaders from SPIR-V, or are they already saved as machine code (not SPIR-V) in the .gl4es.psa file?

ptitSeb commented 5 years ago

No SPIR-V on the Pandora, it's already machine code specific to the PowerVR it uses.

kas1e commented 5 years ago

So, taking into account that Hans will not do it in Nova, maybe asking Daniel to save the PSA not in SPIR-V but in Nova's assembly code would make sense... (if Nova provides such conversion functions publicly).

ptitSeb commented 5 years ago

yep

ptitSeb commented 5 years ago

But note that I don't think it's possible to access a Nova compiled shader program unless Hans writes the equivalent of glProgramBinary(...) and glGetProgramBinary(...) in Nova. That's why Daniel proposed doing it the SPIR-V way.

kas1e commented 5 years ago

But imho the "micro-pause hiccups" happen because things need to be converted from one format to card-specific code in real time; when we have precompiled shaders (and if they are saved as machine code), there will be no need to convert anything, just send them when needed. So, imho, even if only ogles2 has those functions, as long as the shaders are in machine code everything should be fine already, no? :)

But then another issue arises: machine-specific code will probably also be specific to each gfx card. On the Pandora, as I understand it, there is just one gfx card, so precompiled shaders will always work on all Pandoras. On amigaos4 (or any other desktop OS where cards can be changed) we have different gfx cards, which means different machine code, which means that in the end we can't save machine-specific code, right? And on the Pandora it works because there is one single card everywhere?

On the other hand, it doesn't really matter if it differs between cards: it only means that we can't ship a release with a .gl4es.psa included, but the one generated on the user's machine will then be used later with no problems. The only drawback is that the first run will still have the hiccups.

ptitSeb commented 5 years ago

Yes, the .gl4es.psa is supposed to be built on the target computer, not brought by the app. That means the 1st run will still get the hiccups, but later runs will not. It's the same on the Pandora: there are 3 different models with slightly different hardware, and there are a lot of driver versions the user can choose from, so I don't plan to put any pre-baked .gl4es.psa in games; one will be built automatically.

kas1e commented 5 years ago

Aha, got it. So let's wait and see whether Daniel's implementation, using SPIR-V on top of Nova's conversion, gives enough of a boost. If it is enough, our version will even be portable across different gfx cards; if not, we will probably need to ask Hans to provide a public API to generate machine-ready code.

kas1e commented 5 years ago

Hi :)

So Daniel sent me a first version with his implementation of the necessary functions; that's what the readme says:

So, I ran Neverball with the new binary, and gl4es tells me in the output:

LIBGL: Extension GL_OES_get_program detected
LIBGL: Number of supported Program Binary Format: 1

And nothing more related to it.

So, after I exited Neverball, no .gl4es.psa file had been created. I ran a DOS tracer (to see whether it tries to create that file) and nope, there is not even an attempt to create the file.

Any clue what to check next? :)

ptitSeb commented 5 years ago

Do you see "LIBGL: Shuting down" at the end of the program at least?

The function close_gl4es() in src/gl/init.c, at the end of the file, declared as a "destructor", should print it.
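For readers unfamiliar with the mechanism, here is a minimal sketch of the GCC/Clang constructor/destructor attributes. The names demo_init, demo_close and demo_is_initialised are hypothetical and the message is a placeholder; this is not the gl4es implementation of close_gl4es().

```c
#include <assert.h>
#include <stdio.h>

static int demo_initialised = 0;

/* Runs before main(), or when the shared library is loaded
 * (GCC/Clang extension, not standard C). */
__attribute__((constructor))
static void demo_init(void)
{
    demo_initialised = 1;
}

/* Runs after main() returns or exit() is called, or when the shared
 * library is unloaded (GCC/Clang extension, not standard C). */
__attribute__((destructor))
static void demo_close(void)
{
    assert(demo_initialised);
    fputs("demo: shutting down\n", stderr);
}

/* Lets a caller check that the constructor already ran. */
int demo_is_initialised(void)
{
    return demo_initialised;
}
```

A destructor like this is skipped when the process is killed or calls abort(), which is why a missing shutdown message is a useful first thing to check.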

kas1e commented 5 years ago

Yeah of course I see that one, I just printed only the info relevant to PSA, but here is the full log (supertuxkart0.6.2a, latest gl4es):

4/0.Work:games/supertuxkart> supertuxkart_gl4es_1916 
LIBGL: Initialising gl4es
LIBGL: v1.1.1 built on Jul 22 2019 23:35:10
LIBGL: Using GLES 2.0 backend
LIBGL: Using Warp3DNova.library v1 revision 65
LIBGL: Using OGLES2.library v2 revision 9
LIBGL: OGLES2 Library and Interface open successfuly
LIBGL: Targeting OpenGL 2.0
LIBGL: Forcing NPOT support by disabling MIPMAP support for NPOT textures 
LIBGL: Not trying to batch small subsequent glDrawXXXX
LIBGL: Current folder is:/Work/games/supertuxkart
Data files will be fetched from: '.'
Highscores will be saved in './.supertuxkart/highscore.data'.
LIBGL: Hardware test on current Context...
LIBGL: Hardware Limited NPOT detected and used
LIBGL: Extension GL_EXT_blend_minmax detected and used
LIBGL: FBO are in core, and so used
LIBGL: PointSprite are in core, and so used
LIBGL: CubeMap are in core, and so used
LIBGL: BlendColor is in core, and so used
LIBGL: Blend Substract is in core, and so used
LIBGL: Blend Function and Equation Separation is in core, and so used
LIBGL: Texture Mirrored Repeat is in core, and so used
LIBGL: Extension GL_OES_mapbuffer detected
LIBGL: Extension GL_OES_element_index_uint detected and used
LIBGL: Extension GL_OES_packed_depth_stencil detected and used
LIBGL: Extension GL_EXT_texture_format_BGRA8888 detected and used
LIBGL: Extension GL_OES_texture_float detected and used
LIBGL: high precision float in fragment shader available and used
LIBGL: Extension GL_EXT_frag_depth detected and used
LIBGL: Max vertex attrib: 16
LIBGL: Extension GL_OES_get_program detected
LIBGL: Number of supported Program Binary Format: 1
LIBGL: Max texture size: 16384
LIBGL: Max Varying Vector: 32
LIBGL: Texture Units: 8(8), Max lights: 8, Max planes: 6
LIBGL: Extension GL_EXT_texture_filter_anisotropic detected and used
LIBGL: Max Anisotropic filtering: 16
LIBGL: Hardware vendor is A-EON Technology Ltd. Written by Daniel 'Daytonta675x' Müßener @ GoldenCode.eu
LIBGL: OGLES2 Library and Interface closed
LIBGL: Shuting down

kas1e commented 5 years ago

I also asked Daniel to recheck the new stuff; he says that glGetProgramiv with GL_PROGRAM_BINARY_LENGTH_OES works and glGetProgramBinaryOES apparently works too.

EDIT: and glProgramBinaryOES works too

kas1e commented 5 years ago

Btw, measuring things with some test examples on os4 for now, we found that of course all the cached binaries should be preloaded right at startup, because loading binaries from disk on demand can slow things down just as before.

But I assume that gl4es scans for .gl4es.psa at the beginning and, if it is found and the functions/extensions work, preloads the binaries into memory, right?

ptitSeb commented 5 years ago

Check in https://github.com/ptitSeb/gl4es/blob/master/src/gl/init.c#L569 if the name of the PSA is correct, using some printf

ptitSeb commented 5 years ago

> Btw, measuring things with some test examples on os4 for now, we found that of course all the cached binaries should be preloaded right at startup, because loading binaries from disk on demand can slow things down just as before.
>
> But I assume that gl4es scans for .gl4es.psa at the beginning and, if it is found and the functions/extensions work, preloads the binaries into memory, right?

Yes. The PSA file is loaded at gl4es init. Then, when an FPE shader needs to be created, gl4es first checks the PSA archive (in memory) and creates the shader directly if it is present. If not present, the shader is created in the traditional way and added to the in-memory PSA (which is then flagged "dirty"). At gl4es shutdown, the PSA archive is written back to disk if it is flagged dirty.
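The lifecycle described above (look up in memory, add and flag dirty on a miss, write back at shutdown only when dirty) can be sketched as follows. Everything here is an assumption made for illustration: the psa_* names, the integer key standing in for one fixed-pipeline state, the fixed-size table and the on-disk layout are all hypothetical and do not match the real .gl4es.psa format.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PSA_MAX_ENTRIES 64            /* arbitrary limit for this sketch */

typedef struct {
    unsigned key;      /* hypothetical id of one fixed-pipeline state */
    unsigned format;   /* binary format reported by the driver */
    unsigned size;     /* blob size in bytes */
    void    *blob;     /* program binary */
} psa_entry;

typedef struct {
    psa_entry entries[PSA_MAX_ENTRIES];
    unsigned  count;
    int       dirty;   /* set when memory differs from the file on disk */
} psa_archive;

/* Cache lookup: returns the stored entry, or NULL on a miss. */
const psa_entry *psa_find(const psa_archive *a, unsigned key)
{
    for (unsigned i = 0; i < a->count; ++i)
        if (a->entries[i].key == key)
            return &a->entries[i];
    return NULL;
}

/* Called after a cache miss, once the program was built the slow way. */
int psa_add(psa_archive *a, unsigned key, unsigned format,
            const void *blob, unsigned size)
{
    assert(a != NULL && blob != NULL && size > 0);
    if (a->count >= PSA_MAX_ENTRIES || psa_find(a, key))
        return -1;
    void *copy = malloc(size);
    if (!copy)
        return -1;
    memcpy(copy, blob, size);
    a->entries[a->count++] = (psa_entry){ key, format, size, copy };
    a->dirty = 1;
    return 0;
}

/* Shutdown step: 1 if written, 0 if nothing to do, -1 on I/O error. */
int psa_save_if_dirty(psa_archive *a, const char *path)
{
    if (!a->dirty)
        return 0;
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    int ok = fwrite(&a->count, sizeof a->count, 1, f) == 1;
    for (unsigned i = 0; ok && i < a->count; ++i) {
        const psa_entry *e = &a->entries[i];
        unsigned head[3] = { e->key, e->format, e->size };
        ok = fwrite(head, sizeof head, 1, f) == 1
          && fwrite(e->blob, 1, e->size, f) == e->size;
    }
    if (fclose(f) != 0)
        ok = 0;
    if (!ok)
        return -1;
    a->dirty = 0;
    return 1;
}

/* Init step: preload every entry into memory. 0 on success, -1 otherwise. */
int psa_load(psa_archive *a, const char *path)
{
    memset(a, 0, sizeof *a);
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    unsigned count = 0;
    int ok = fread(&count, sizeof count, 1, f) == 1 && count <= PSA_MAX_ENTRIES;
    for (unsigned i = 0; ok && i < count; ++i) {
        unsigned head[3];
        ok = fread(head, sizeof head, 1, f) == 1 && head[2] > 0;
        if (!ok)
            break;
        void *blob = malloc(head[2]);
        ok = blob && fread(blob, 1, head[2], f) == head[2];
        if (!ok) {
            free(blob);
            break;
        }
        a->entries[a->count++] = (psa_entry){ head[0], head[1], head[2], blob };
    }
    fclose(f);
    return ok ? 0 : -1;
}
```

Keeping the whole archive in memory means a lookup never touches the disk while the game is running, and the dirty flag avoids rewriting the file when a run found every program in the cache.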

kas1e commented 5 years ago

> Check in https://github.com/ptitSeb/gl4es/blob/master/src/gl/init.c#L569 if the name of the PSA is correct, using some printf

Something is wrong with the whole "if" starting from the if(hardext.prgbin_n>0) { line. Maybe a missing } or something somewhere, because none of the printfs there is reached.

As far as I can see, all the file creation happens inside the if(globals4es.nopsa==0) { } block, which is probably not what it should be?

ptitSeb commented 5 years ago

Look closely, it's nopsa, so it should be 0 to use PSA.

kas1e commented 5 years ago

That's how it looks for me now:

    if (getcwd(cwd, sizeof(cwd))!= NULL)
        SHUT(LOGD("LIBGL: Current folder is:%s\n", cwd));

    printf("hardext.prgbin_n = %d\n", hardext.prgbin_n);

    printf("uuuuuuuuuuuuuu\n"); 
    SHUT(LOGD("OMEGA: 2232323232323ooooo\n"));

    if(hardext.prgbin_n>0) {
        env(LIBGL_NOPSA, globals4es.nopsa, "Don't use PrecompiledShaderArchive");
        printf("eeeeeeeeee\n");

        if(globals4es.nopsa==0) {
            printf("erewrerrrrrr\n");
            cwd[0]='\0';
            // TODO: What to do on ANDROID and EMSCRIPTEN?
#ifdef __linux__
            const char* home = getenv("HOME");
            if(home)
                strcpy(cwd, home);
            if(cwd[strlen(cwd)]!='/')
                strcat(cwd, "/");
#elif defined AMIGAOS4
            strcpy(cwd, "PROGDIR:");
#endif

            SHUT(LOGD("OMEGA: ooooooooooooo\n"));
            if(strlen(cwd)) {
                SHUT(LOGD("OMEGA: bbbbbbbbbbbbbb\n"));
                strcat(cwd, ".gl4es.psa");
                printf("cwd = %s\n", cwd);
                fpe_InitPSA(cwd);
                fpe_readPSA();
            }
        }
    }
}

And in the log, I can see:

hardext.prgbin_n = 0
uuuuuuuuuuuuuu
OMEGA: 2232323232323ooooo

and no other printfs.

ptitSeb commented 5 years ago

Mmmm, wait, on AmigaOS there are no pre-tests, so prgbin_n is 0 by default. Which version of GLES2 has this extension available?

kas1e commented 5 years ago

The one where those new functions were added? 2.9

kas1e commented 5 years ago

Or do you mean the GLES2 standard?

ptitSeb commented 5 years ago

> The one where those new functions were added? 2.9

Yes, that. Because I'll probably need to hardcode prgbin_n = 1 on AMIGA if the OGLES2 driver is >= 2.9 for now (until I find a way to create an offscreen context on Amiga so I can launch some tests at gl4es start).

kas1e commented 5 years ago

Yeah, let's do it that way. Or you can simply assume that ogles2 2.9 is the minimum for gl4es altogether.

ptitSeb commented 5 years ago

Mmm, wait, in src/agl/agl.c I have

#define MIN_OGLES2_LIB_VERSION 1
#define MIN_OGLES2_LIB_REVISION 22

So, 1.9 doesn't work...

kas1e commented 5 years ago

The current one is 2.9, so we can safely change it to 2.9 in agl.c

ptitSeb commented 5 years ago

Ah ok. I'll do that.

kas1e commented 5 years ago

It is probably also enough, in init.c, right before if(hardext.prgbin_n>0) {, to add something like:

#ifdef __amigaos4__
hardext.prgbin_n=1;
#endif

if(hardext.prgbin_n>0) {

?

EDIT: well nope, that's not enough (crashes) :)

ptitSeb commented 5 years ago

I have made the change to enable it. Now maybe something is not working correctly in gl4es / ogles2. Like aglGetProcAddress(...)?

ptitSeb commented 5 years ago

Like, in src/agl/amiga.c change line 802 from return IOGLES2->aglGetProcAddress(name); to

void * ret = IOGLES2->aglGetProcAddress(name);
printf("aglGetProcAddress(%s) => %p\n",  name, ret);
return ret;

(warning, edited 2 times)

kas1e commented 5 years ago

Just #include "hardext.h" fails to find the file, so I changed it to #include "../glx/hardext.h", and then it builds fine.

But then, for some reason, this line:

if (!(LOGLES2->lib_Version > 2 || (LOGLES2->lib_Version == 2 && LOGLES2->lib_Revision >= 9)))  {

did not meet the requirements, although the version of the library in use is clearly 2.9.

Then, even if I remove that line and just hardcode

hardext.prgbin_n = 1;

then nothing happens: i.e. no crash, but no file is created on exit. As if it still thinks that hardext.prgbin_n is 0.

kas1e commented 5 years ago

The requirements check fails because of the (!( : it should be without the ! there.
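To make the polarity explicit, here is a small sketch with a hypothetical helper named lib_at_least (not part of gl4es). The quoted condition with the leading ! is true when the library is older than 2.9, so it fits a "too old" branch; to enable a feature on 2.9 and newer, the un-negated form is the one to test.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when (version, revision) is at least
 * (min_version, min_revision), and 0 otherwise. */
int lib_at_least(int version, int revision, int min_version, int min_revision)
{
    assert(version >= 0 && revision >= 0);
    return version > min_version ||
           (version == min_version && revision >= min_revision);
}
```

With library 2.9 and a minimum of 2.9, lib_at_least(2, 9, 2, 9) is 1, so a block guarded by !lib_at_least(...) is skipped and a block guarded by lib_at_least(...) runs.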

kas1e commented 5 years ago

It's as if setting "hardext.prgbin_n = 1;" in amigaos.c is not enough, and it seems to be overwritten by 0 again later?

ptitSeb commented 5 years ago

Yes, this should all be fixed now.

kas1e commented 5 years ago

OK, downloaded the latest version from scratch: now hardext.prgbin_n is 1 for sure, but then it crashes, just as before when I put the same hardext.prgbin_n = 1 in the other place.

So I added what you said and it prints this:

aglGetProcAddress(glGetProgramBinaryOES) => 0x7f45d264

I also tried the debug version of the new ogles2.library (for a better stack trace), and the result is that it crashes inside OGLES2's glGetProgramBinaryOES().

ptitSeb commented 5 years ago

So can you check with Daniel?

kas1e commented 5 years ago

You mean it's on our side for sure?

Edit: I mean that the function works in ogles2 for sure (we tested it with a native example). And should it be called at all if no .gl4es.psa file has been created?

ptitSeb commented 5 years ago

Yes, this function will be called. It's the function used to retrieve a shader program to be stored in the PSA archive. It works fine on the Pandora, and the function is not super complicated.

Look in gl4es_getProgramBinary(...) in src/gl/program.c

It works in two parts: first gles_glGetProgramiv(glprogram->id, GL_PROGRAM_BINARY_LENGTH_OES, &l); to get the size of the program (you should printf("Sizeof program = %d\n", l);), and then the actual gles_glGetProgramBinary(glprogram->id, l, length, format, *binary); to retrieve the program...
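The two-step pattern can be sketched without a GL context by passing the driver entry points in as function pointers. The typedefs and fetch_program_binary below are hypothetical stand-ins that use plain C types; in a real build the two callbacks would correspond to glGetProgramiv with GL_PROGRAM_BINARY_LENGTH_OES and to glGetProgramBinaryOES. The fake_* functions exist only to exercise the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the two driver entry points (the real ones take GL types). */
typedef void (*get_length_fn)(unsigned program, int *length);
typedef void (*get_binary_fn)(unsigned program, int buf_size, int *length,
                              unsigned *format, void *binary);

/* Step 1 asks for the size, step 2 fetches into a buffer of that size.
 * Returns a malloc'ed buffer (caller frees) or NULL on failure. */
void *fetch_program_binary(unsigned program, get_length_fn get_length,
                           get_binary_fn get_binary,
                           int *out_len, unsigned *out_format)
{
    assert(get_length && get_binary && out_len && out_format);

    int size = 0;
    get_length(program, &size);
    if (size <= 0)
        return NULL;            /* driver reports no binary for this program */

    void *buf = malloc((size_t)size);
    if (!buf)
        return NULL;

    int written = 0;
    get_binary(program, size, &written, out_format, buf);
    if (written <= 0 || written > size) {
        free(buf);              /* driver wrote nothing, or claims an overrun */
        return NULL;
    }
    *out_len = written;
    return buf;
}

/* Fake driver, used only to run the sketch without a GL context. */
static const unsigned char fake_blob[] = { 0xDE, 0xAD, 0xBE, 0xEF };

void fake_get_length(unsigned program, int *length)
{
    (void)program;
    *length = (int)sizeof fake_blob;
}

void fake_get_binary(unsigned program, int buf_size, int *length,
                     unsigned *format, void *binary)
{
    (void)program;
    if (buf_size < (int)sizeof fake_blob) {
        *length = 0;
        return;
    }
    memcpy(binary, fake_blob, sizeof fake_blob);
    *length = (int)sizeof fake_blob;
    *format = 0x1234u;          /* made-up format id */
}
```

Checking the reported length before allocating, and the written length afterwards, narrows a crash down to either the size query or the fetch itself.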

kas1e commented 5 years ago

OK, I wrote to Daniel about it. Did I understand correctly that gl4es actually uses aglGetProcAddress() only once, exactly for glGetProgramBinaryOES()? Or is it used all the time with other functions and works (so the problem is then with glGetProgramBinaryOES() itself)?

ptitSeb commented 5 years ago

It's the first time we use aglGetProcAddress(...) indeed, so it may be wrong.

You can test by commenting out line 668 (that is EX(glClear)) to force using aglGetProcAddress(...) for glClear(...) too, to be sure.

kas1e commented 5 years ago

Yeah, it crashes in glClear now.

kas1e commented 5 years ago

Interesting... Maybe something in our Amiga-related code breaks when we return something other than NULL?

Edit: I see that we used os4GetProcAddress() before; maybe returning non-NULL now breaks things somewhere... like in loader.h or so.

ptitSeb commented 5 years ago

I don't know, maybe it's the calling convention? I have no idea, and no Amiga to experiment with. You should talk about that with Daniel.

One thing I noted, and which is quite surprising: I remember that I could not simply use the address of IOGLES2->glClear, but needed to wrap it in static void AmiglClear(...) and use that function's address. So I'm not sure what address aglGetProcAddress(...) is returning, but it should not be the IOGLES2 one (which I think has more hidden parameters), but the internal one...

kas1e commented 5 years ago

Maybe that's because parameters need to be passed as well?

I see that for glClear we already have:

static void AmiglClear (GLbitfield mask) {
    return IOGLES2->glClear(mask);
}

Then EX(glClear), which is (and I certainly don't understand that define/redefine mess):

// Using glXXX name, return the function pointer of that function in ogles2 library
#define MAP(func_name, func) \
    if (strcmp(name, func_name) == 0) return (void *)Ami##func;

#define EX(func_name) MAP(#func_name, func_name)

Strange ... Maybe it needs * , not just

ptitSeb commented 5 years ago

EX is a simple macro, and the whole point of aglGetProcAddress(...) is to get a function pointer for the function name given as parameter... So calling aglGetProcAddress("glClear") will give the address of static void AmiglClear (GLbitfield mask), which just calls return IOGLES2->glClear(mask). No issue here, this is normal.
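A self-contained sketch of how the MAP/EX pair expands may help. The macros are copied from the snippet quoted earlier in the thread; driver_glClear, demo_GetProcAddress and demo_last_clear_mask are hypothetical stand-ins, and unsigned replaces GLbitfield so the example compiles without GL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static unsigned last_clear_mask = 0;   /* lets the sketch observe the call */

/* Stand-in for the driver interface call (IOGLES2->glClear in the thread). */
static void driver_glClear(unsigned mask)
{
    last_clear_mask = mask;
}

/* Plain C wrapper with an ordinary signature; its address is what the
 * lookup hands back to the caller. */
static void AmiglClear(unsigned mask)
{
    driver_glClear(mask);
}

/* EX(glClear) expands to MAP("glClear", glClear), which expands to:
 *   if (strcmp(name, "glClear") == 0) return (void *)AmiglClear;      */
#define MAP(func_name, func) \
    if (strcmp(name, func_name) == 0) return (void *)Ami##func;
#define EX(func_name) MAP(#func_name, func_name)

/* Name-to-pointer lookup; unknown names fall through to NULL. */
void *demo_GetProcAddress(const char *name)
{
    assert(name != NULL);
    EX(glClear)
    return NULL;
}

unsigned demo_last_clear_mask(void)
{
    return last_clear_mask;
}
```

The caller casts the returned pointer back to the wrapper's signature before calling it, so the lookup only works if the returned function really has that plain signature, which is the point of the wrapper.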