ptitSeb / gl4es

GL4ES is an OpenGL 2.1/1.5 to GL ES 2.0/1.1 translation library, with support for Pandora, ODroid, OrangePI, CHIP, Raspberry PI, Android, Emscripten and AmigaOS4.
http://ptitseb.github.io/gl4es/
MIT License

IrrLicht engine #53

Closed kas1e closed 6 years ago

kas1e commented 6 years ago

Hi ptitSeb ! :)

Sorry to bother you with another issue which may very well not be related to gl4es itself, but while waiting for fixes in our drivers on AmigaOS4, I gave it a go and ported the IrrLicht engine over gl4es as well. Everything compiles and links fine. But once I run a simple test case (which of course works via software rendering, etc.), it crashes in AmiglGetIntegerv().

I.e. it should then continue with the GLSL check and print "GLSL not available" (at least that is what I get on legacy OpenGL), or "available" (which is probably what it should be with gl4es?). But instead it crashes:

    4/0.Work:irrlicht/bin/> 01.HelloWorld_gl4es
    LIBGL: Initialising gl4es
    LIBGL: v1.0.5 built on Mar 2 2018 01:33:59
    LIBGL: Using GLES 2.0 backend
    LIBGL: OGLES2 Library and Interface open successfuly
    LIBGL: Hardware test disabled, nothing activated...
    init_matrix(0x6b0b1bb0)
    LIBGL: Targeting OpenGL 2.0
    LIBGL: Current folder is:/Work/irrlicht/bin/
    Irrlicht Engine version 1.9.0
    SDL initialized
    SDL Version 1.2.15
    Using renderer: OpenGL 2.0
    GL4ES wrapper: ptitSeb
    OpenGL driver version is 1.2 or better.
    << CRASH>>

I am almost sure it is again some problem in our ogles2 driver (as I can see in the log that it crashes in AmiglGetIntegerv()), but maybe (only maybe) it can be something in gl4es as well? It didn't even throw any debug output from gl4es, as it seems to crash too early.

Here is the crashlog: http://kas1e.mikendezign.com/aos4/gl4es/irrlicht/crashlog_irrlicht_helloworld.txt

Maybe you have some ideas what it can be. Thanks!

kas1e commented 6 years ago

Strange, the crash is still here. And it doesn't seem to crash in that function; it crashes later, in the game.

All I can see is that the window shows up, then closes immediately, and the crash comes as soon as the first GL function is used (in my case it was glViewport(), but that probably doesn't matter, as the window just closes).

Probably those "varargs" can be the issue. As I see, in ogles2 it is defined like this:

    #define aglCreateContextTags(errcode, ...) IOGLES2->aglCreateContextTags((errcode), __VA_ARGS__)

And in the interface like this:

    void * VARARGS68K APICALL (*aglCreateContextTags)(struct OGLES2IFace *Self, ULONG *errcode, ...);

Those VARARGS68K and __VA_ARGS__ may be different from the usual stdarg.h ones, but I can be wrong.

ptitSeb commented 6 years ago

Yes, maybe.

This code is only for Amiga, so I can adapt. What is the correct way to use varargs on AmigaOS4?

kas1e commented 6 years ago

I always have a hard time with that VARARGS stuff and the necessity of using it. But I found this: http://www.os4coding.net/forum/varargs68k

ptitSeb commented 6 years ago

Ah ok, I'll adapt code tonight then.

kas1e commented 6 years ago

Thanks a bunch !

ptitSeb commented 6 years ago

So, if you replace the code by this one:

void* VARARGS68K aglCreateContextTags(ULONG * errcode, ...) {
    void* ret = NULL;
    if(IOGLES2) {
        va_list args;
        va_startlinear(args,errcode);
        ret = IOGLES2->aglCreateContextTags(errcode, va_getlinearva(args, struct TagItem *));
        va_end(args);
    }
    return ret;
}

Does it compile (and if yes, does it run)?

ptitSeb commented 6 years ago

Also, I'm unsure which header is needed for those. Is #include <amiga_compiler.h> needed, and is it enough (to replace the #include <stdarg.h>)?

kas1e commented 6 years ago

If I keep the same header, then it compiles, but it makes no difference to the final result: the same crash. If I just comment out stdarg.h and put amiga_compiler.h instead, then it complains about undeclared identifiers:

    src/agl/agl.c: In function 'aglCreateContextTags':
    src/agl/agl.c:104: error: 'va_list' undeclared (first use in this function)
    src/agl/agl.c:104: error: (Each undeclared identifier is reported only once
    src/agl/agl.c:104: error: for each function it appears in.)
    src/agl/agl.c:104: error: expected ';' before 'args'
    src/agl/agl.c:105: warning: implicit declaration of function 'va_start'
    src/agl/agl.c:105: error: 'args' undeclared (first use in this function)
    src/agl/agl.c:107: warning: implicit declaration of function 'va_end'

ptitSeb commented 6 years ago

Mmm, then I don't know. I don't have an AmigaOS machine to experiment with on my side :(

I'm afraid you need to ask others for help with this issue.

kas1e commented 6 years ago

The reason I started to worry about it is just an experiment, to see if it would squash an issue I have now.

The issue is a very strange one, and looks like some memory trashing coming from gl4es (or from the way it was added to SDL1 on the os4 side).

Check this code: https://github.com/kas1e/SDL/blob/SDL-1.2gl4es/src/video/amigaos4/SDL_os4gl.c

You can see there how it is all currently done for SDL1 / GL4ES. When I build the Cadog game with this line: dprintf("Initializing GL4ES->OGLES2..\n"); (right before context creation), I don't get the title picture at start in Cadog! (quite strange).

But once I comment out that printf and rebuild Cadog, the title pic comes back!

That all points me to some memory trashing issue, so I started to experiment, thinking that maybe it is because I create the context through IOGLES2 in SDL, while the library itself is opened in GL4ES, and maybe it is somehow "not shared enough" between them or something.

That is why I thought "maybe try to swap it for the context creation from agl.c, just to see if there is a difference", and so I saw it crash.

ptitSeb commented 6 years ago

Is it the same dprintf as on Linux? Because in that case you need a file descriptor: http://manpagesfr.free.fr/man/man3/dprintf.3.html and indeed, as-is, it will not run.

Other than that, I don't see anything wrong in the code. I'll look at it this weekend (won't have much time tonight and tomorrow).

kas1e commented 6 years ago

It doesn't have to be dprintf there. Even if I put a plain printf("aaaa\n"); right before context creation, I have no title pic in Cadog. Once I comment it out, the title pic is back :)

I did some more tests, and found that if I do:

printf("a\n"); or printf("aa\n"); or printf("aaa\n"); the title pic is still there. But once I print more than 3 characters, i.e. even just printf("aaaa\n"); then there is no title pic.

That IMHO clearly points to memory trashing?

ptitSeb commented 6 years ago

Yeah, could be.

Can you use the other function to create the context, the one without the varargs stuff? Something like

    struct TagItem tags[] = {
        OGLES2_CCT_WINDOW, (ULONG)hidden->win,
        OGLES2_CCT_DEPTH, 16,
        OGLES2_CCT_STENCIL, 8,
        OGLES2_CCT_VSYNC, 0,
        OGLES2_CCT_SINGLE_GET_ERROR_MODE, 1,
        OGLES2_CCT_RESIZE_VIEWPORT, TRUE,
        TAG_DONE, TAG_DONE };
    hidden->IGL = IOGLES2->aglCreateContext(0, tags);

should work I guess.
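For reference, the relation between the varargs form and the tag-array form can be sketched in portable C. This is only an illustration under stated assumptions: `demo_tag_item`, `demo_create_context` and `demo_create_context_tags` are hypothetical stand-ins (not the OGLES2 or gl4es API), and plain `stdarg.h` is used instead of the AmigaOS-specific `va_startlinear`/`va_getlinearva`, so it says nothing about whether VARARGS68K is the cause here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Stand-ins for the AmigaOS types; the real ones come from the SDK headers. */
typedef unsigned long demo_ulong;
struct demo_tag_item { demo_ulong tag; demo_ulong data; };
#define DEMO_TAG_DONE 0UL
#define DEMO_MAX_TAGS 16

/* Hypothetical array-based entry point, standing in for aglCreateContext().
   Here it only counts the tags it receives before the terminator. */
static size_t demo_create_context(const struct demo_tag_item *tags)
{
    size_t n = 0;
    assert(tags != NULL);
    while (tags[n].tag != DEMO_TAG_DONE)
        n++;
    return n;
}

/* Variadic front end: copies the tag/value pairs into a local array with
   plain stdarg, then forwards the array to the non-variadic function. */
static size_t demo_create_context_tags(demo_ulong first_tag, ...)
{
    struct demo_tag_item tags[DEMO_MAX_TAGS];
    size_t n = 0;
    demo_ulong tag = first_tag;
    va_list args;

    va_start(args, first_tag);
    while (tag != DEMO_TAG_DONE && n < DEMO_MAX_TAGS - 1) {
        tags[n].tag = tag;
        tags[n].data = va_arg(args, demo_ulong);  /* value of this tag */
        n++;
        tag = va_arg(args, demo_ulong);           /* next tag, or DEMO_TAG_DONE */
    }
    va_end(args);

    tags[n].tag = DEMO_TAG_DONE;                  /* always terminate the list */
    tags[n].data = 0;
    return demo_create_context(tags);
}
```

Every variadic argument has to be passed as `unsigned long` here (for example `demo_create_context_tags(1UL, 16UL, 2UL, 8UL, DEMO_TAG_DONE)`), because `va_arg` reads exactly the type it is told to read.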

kas1e commented 6 years ago

You mean still call it from SDL as IOGLES2->, or use the one from agl.c? I will try both ways anyway.

kas1e commented 6 years ago

Tried both variants:

    printf("aaaa");
    hidden->IGL=IOGLES2->aglCreateContext(0, tags);

In that case I have no background picture in Cadog.

Then I tried:

    printf("aaaa");
    hidden->IGL=aglCreateContext(0, tags);

In that case, the Cadog background picture is there!

Then I built lettersfall with the new (working) variant, and the trashing of the menus (if you remember, I showed some screenshots before) is almost gone! It is still there, but the way it all looks has surely changed.

That can only mean that with the new way of creating the context we just "shift" the memory trashing issue a bit, and that it probably comes from gl4es, or from the way we added the aos4 backend inside gl4es.

That proves the point Daniel made to me before: he checked the gles2 library a lot, and every time he came to the conclusion that there is memory trashing somewhere, which causes those effects with q3 and with the Irrlicht engine (and with lettersfall). With our fix, we only shift the issue.

It probably hides somewhere in amigaos.c or agl.c or in some other #ifdef amigaos4 place... Uhm, quite strange!

kas1e commented 6 years ago

Maybe it can be some name conflict, as we have agl functions named 1:1 the same as the ones we call through IOGLES2->. Maybe it is worth renaming them all inside agl.c to something like gl4es_aglCreateContextTags, gl4es_aglSwapBuffers, etc.? That way it will be understandable that those are the gl4es ones, and they will surely not interfere with the IOGLES2 ones.

That is probably not the cause of the memory trashing issues anyway, but it will still look better.

ptitSeb commented 6 years ago

I don't understand why you think this test proves memory corruption comes from gl4es (and I'm pretty sure it doesn't). aglCreateContext is not really a gl4es function. It just wraps the call to the actual agl function of OGLES2.

And I don't think there is a name conflict here. It would just not work with a name conflict, plus there is no conflict because the functions from OGLES2 are members of a structure and the gl4es ones are not.

ptitSeb commented 6 years ago

Now, if you are still unsure of the conflicting name, simply don't build agl.c and only use OGLES2 functions (for creation of context and swapbuffer) for testing...

If you want to be sure nothing of gl4es is loaded at start, rebuild gl/src/init.c with -DNO_INIT_CONSTRUCTOR and call initialize_gl4es() before using it (so after the context creation). You will probably need to declare that function with extern void initialize_gl4es() or extern "C" void initialize_gl4es() if you are in a cpp file.
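As a rough illustration of what such a switch does, here is a minimal sketch assuming a GCC-style `__attribute__((constructor))`. The names `demo_initialize`, `demo_draw` and `DEMO_NO_INIT_CONSTRUCTOR` are hypothetical and only mirror the idea; this is not the actual content of init.c.

```c
#include <assert.h>

static int demo_initialized = 0;   /* set once the library is ready */

/* Hypothetical stand-in for initialize_gl4es(): safe to call more than once. */
void demo_initialize(void)
{
    if (demo_initialized)
        return;
    /* ... load the backend library, look up function pointers, etc. ... */
    demo_initialized = 1;
}

#ifndef DEMO_NO_INIT_CONSTRUCTOR
/* Default build: GCC/Clang run this automatically before main(). */
__attribute__((constructor))
static void demo_auto_init(void)
{
    demo_initialize();
}
#endif

/* Any entry point that needs the library can check the flag, so a wrong
   call order shows up as a clear assertion instead of a crash later on. */
int demo_draw(void)
{
    assert(demo_initialized && "call demo_initialize() first");
    return 1;
}
```

Built with `-DDEMO_NO_INIT_CONSTRUCTOR`, nothing runs before `main()`, and the application decides when `demo_initialize()` is called.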

kas1e commented 6 years ago

By name conflict I mean a visual one (when someone reads the code, he will think these are the real functions, as one of our devs did today), but there can also be a real conflict when someone uses the __USE_INLINE__ directive, which we have in our SDK and which allows us to call functions without writing the interface name. I.e. a plain aglCreateContext can resolve to the one from ogles2 if anyone, anywhere sets __USE_INLINE__.

Sure, we don't have to call it gl4es_aglCreateContext; it could be gl4eshelper_aglCreateContext or anything else, just not the same names as the originals, as that can lead to problems later.

As for memory corruption: I came to that because Daniel spent some days trying to catch the issue with q3, and says that it is some undefined behaviour which trashes memory, and that even if we think our workaround for that normalisation problem deals with it, it just shifts the issue somewhere else.

Then I found that issue with Cadog and printfs before context creation, which points to memory trashing as well.

Also, the lettersfall game has trashed parts even after our normalisation workaround.

Of course we can't rule out ogles2 itself, but I fear it can be something in how we added the Amiga parts to gl4es. Name clashing, a non-working aglCreateContextTags helper: there are at least a few issues in that area already, and maybe somewhere a pointer gets lost, or there is a race condition, or I don't know.

Do you have any idea how to debug all that to find out where the problems come from? The Cadog example is IMHO a good test case.
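One generic way to narrow down this kind of trashing when no valgrind-like tool is at hand is to put guard zones around suspicious buffers and check them at chosen points. The following is a minimal sketch of that technique only; the `demo_` helpers are hypothetical and are not part of gl4es, SDL or ogles2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_GUARD_SIZE 16
#define DEMO_GUARD_BYTE 0xA5

/* Allocates `size` usable bytes with a guard zone before and after them.
   Returns a pointer to the usable area, or NULL on failure. */
void *demo_guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(size + 2 * DEMO_GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    memset(raw, DEMO_GUARD_BYTE, DEMO_GUARD_SIZE);
    memset(raw + DEMO_GUARD_SIZE + size, DEMO_GUARD_BYTE, DEMO_GUARD_SIZE);
    return raw + DEMO_GUARD_SIZE;
}

/* Returns 1 if both guard zones are intact, 0 if something wrote into them. */
int demo_guard_ok(const void *ptr, size_t size)
{
    const unsigned char *raw;
    size_t i;

    assert(ptr != NULL);
    raw = (const unsigned char *)ptr - DEMO_GUARD_SIZE;
    for (i = 0; i < DEMO_GUARD_SIZE; i++) {
        if (raw[i] != DEMO_GUARD_BYTE)
            return 0;                                   /* underrun */
        if (raw[DEMO_GUARD_SIZE + size + i] != DEMO_GUARD_BYTE)
            return 0;                                   /* overrun */
    }
    return 1;
}

/* Releases a block obtained from demo_guarded_alloc(). */
void demo_guarded_free(void *ptr)
{
    if (ptr != NULL)
        free((unsigned char *)ptr - DEMO_GUARD_SIZE);
}
```

Calling `demo_guard_ok()` right before and right after a suspected call (context creation, for example) shows whether that call wrote past the buffer. It only catches writes that land in the guard zones, so a clean result does not prove there is no corruption elsewhere.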

kas1e commented 6 years ago

Aha, thanks, will try your idea. At least lettersfall always has trashing, so I can check with it.

kas1e commented 6 years ago

You probably mean that I should call initialize_gl4es(); not after, but before context creation? Because the context needs IOGLES2->, which is set up at the end of load_lib() in loader.c, which is called from initialize_gl4es().

So if I put it after, then it crashes because IOGLES2-> is not initialized, but if I put it before, then it works.

Though, it makes no difference. I.e. if I have it like this:

    initialize_gl4es();

    dprintf("Initializing GL4ES->OGLES2..\n");

    hidden->IGL=IOGLES2->aglCreateContextTags(0,
                OGLES2_CCT_WINDOW,(ULONG)hidden->win,
                OGLES2_CCT_DEPTH,16,
                OGLES2_CCT_STENCIL,8,
                OGLES2_CCT_VSYNC,0,
                OGLES2_CCT_SINGLE_GET_ERROR_MODE,1,
                OGLES2_CCT_RESIZE_VIEWPORT, TRUE,
                TAG_DONE);

Then Cadog still has the problem. Once I comment out the printf, Cadog is fine.

Though now I can't reproduce it with printf("aaaa"); but that is probably just because the memory layout changes a bit when I build init.c with -DNO_INIT_CONSTRUCTOR.

ptitSeb commented 6 years ago

What I mean is, if you still suspect the memory corruption comes from gl4es, you should create the context without gl4es at all. So initialize OGLES2 without gl4es, create the context, and then initialize gl4es. That way, the whole first part up to context creation / making the context current can be done without gl4es involved. So if you still see a difference with and without printf, then gl4es is not the cause.

But again, are you sure dprintf doesn't need a file handle before the string?

kas1e commented 6 years ago

Ah ok, got what you mean, will try it now.

As for dprintf, for us it is just this:

    #define dprintf(format, args...) IExec->DebugPrintF("[%s] " format, __PRETTY_FUNCTION__ , ## args)

So it prints to the serial line (so we can catch logs with PC/PuTTY even in the worst situations). That one is proven to work, and since I have the issue even with plain printfs, it is probably not related.

Will try now to initialize everything myself from SDL.

kas1e commented 6 years ago

If I try to open the library / interface from SDL, then at link time I get errors about multiple definitions of IOGLES2, as amigaos.c has one. If I then comment out the opening/closing of the library in amigaos.c, I get undefined references to those functions from loader.c and glx.c.

Can I just make them empty (I mean os4OpenLib and os4CloseLib) and just add "extern struct OGLES2IFace *IOGLES2 = NULL;" ?

kas1e commented 6 years ago

Btw, what exactly does gl4es need inside to be able to interact with our ogles2? I mean, maybe it is worth moving everything related to it outside (i.e. the whole amigaos.c), but then we would somehow need to pass some pointers from SDL to gl4es?

ptitSeb commented 6 years ago

Yes, commenting out everything inside os4OpenLib and os4CloseLib and declaring both structs extern (remove the =NULL in that case) should be enough. As long as you initialize OGLES before calling gl4es_init it should work.

The remaining stuff in amigaos.c is required for gl4es to function.

kas1e commented 6 years ago

If I do it like this, then I get this output:

    LIBGL: Initialising gl4es
    LIBGL: v1.0.5 build on Mar 17 2018 21:49:39
    LIBGL: Using GLES 2.0 backend
    LIBGL: Hardware test diabled, nothing activated ....
    LIBGL: warning, gles_glGetIntegerv is NULL
    LIBGL: Targeting OpenGL 2.0
    LIBGL: Current folder is:NO NAME:cadog-gl
    LIBGL: warning, gles_glViewport is NULL

And then it crashes.

kas1e commented 6 years ago

The window opens of course, the context is created, etc.

ptitSeb commented 6 years ago

Mmm, in os4OpenLib add *lib = LOGLES2;, that should fix the issue.

kas1e commented 6 years ago

Yeah, that way works.

But the issue with Cadog when we have printfs before context creation is still here :)

The question is: is it possible to have problems inside gl4es after we call initialize_gl4es();? I mean memory trashing ones? Can it be that something was added after you ran valgrind on it which may cause such effects? Of course I am pretty sure it is not gl4es, but I just want to rule out all possible scenarios step by step.

kas1e commented 6 years ago

I do not know if there is anything Amiga-related left in the gl4es code which can cause issues. We have only os4GetProcAddress with its list left in amigaos.c, and I see #ifdefs including AmigaOS-related code only in gl.c (some little ifs, and functions for swapbuffers), the call of os4OpenLib from loader.c, the closing of the os4 libs in glx.c, and just a lookup in lookup.c.

Not much which can cause issues.

ptitSeb commented 6 years ago

It's not gl4es. I don't say that because I run valgrind regularly with gl4es (to check other stuff), but because you have just removed everything gl4es-related before the context creation, and you still have the issue. Why would it be gl4es? It's not used before the memory is corrupted.

Run valgrind on the Amiga if you can; you'll see gl4es has nothing to do with your memory corruption issue.

kas1e commented 6 years ago

@ptitSeb I am now trying to rebuild libgl4es.a with -O0 -fno-strict-aliasing, to see if our compiler generates something wrong somewhere, and while everything builds fine, at the linking stage I get this from list.o:

    libgl4es.a(list.o): In function `rlVertex4f':
    list.c:(.text+0xe838): undefined reference to `rlVertexCommon'
    libgl4es.a(list.o): In function `rlVertex3fv':
    list.c:(.text+0xe938): undefined reference to `rlVertexCommon'
    libgl4es.a(list.o): In function `rlVertex4fv':
    list.c:(.text+0xea14): undefined reference to `rlVertexCommon'
    libgl4es.a(list.o): In function `rlNormal3f':
    list.c:(.text+0xf780): undefined reference to `rlNormalCommon'
    libgl4es.a(list.o): In function `rlNormal3fv':
    list.c:(.text+0xf800): undefined reference to `rlNormalCommon'
    libgl4es.a(list.o): In function `rlColor4f':
    list.c:(.text+0xf860): undefined reference to `rlColorCommon'
    libgl4es.a(list.o): In function `rlColor4fv':
    list.c:(.text+0xf8f4): undefined reference to `rlColorCommon'
    collect2: ld returned 1 exit status
    makefile:60: recipe for target 'lf3' failed

If I compile list.c without -fno-strict-aliasing, the error is the same. But once I change -O0 back to -O2, there are no such linking errors.

Sounds strange. Also, those function names about normals and vertex common sound like something related to our bug.

kas1e commented 6 years ago

With -O1 it works OK too. Only -O0 produces those errors at the linking stage.

ptitSeb commented 6 years ago

It's just the inline in front of the function definition that your linker doesn't like. Remove it and it will link fine.

And no, it's definitely not the source of your issue. Just a glitch of your GCC version: with -O0 it removes the inline function but somehow still wants the inline version for linking (I could also avoid the use of inline and let the compiler decide).
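A note on that link error: it is also what the C99 rules for a plain `inline` function would produce when GCC compiles in C99/gnu99 mode or later, which is an assumption about this build. A minimal sketch with hypothetical function names (not the actual list.c code):

```c
#include <assert.h>

/* Plain `inline` (no `static`, no `extern`) is only an "inline definition"
   in C99: the compiler may ignore it and call an external symbol instead,
   and GCC at -O0 does exactly that. If no other file provides that symbol,
   the link fails with "undefined reference":

       inline int demo_common(int x) { return x + 1; }

   `static inline` gives the function internal linkage, so a real copy is
   always available in this translation unit at any optimisation level. */
static inline int demo_vertex_common(int x)
{
    assert(x >= 0);
    return x + 1;
}

/* Caller: links fine at -O0, -O1 and -O2. */
int demo_vertex(int x)
{
    return demo_vertex_common(x);
}
```

Dropping `inline` altogether, as suggested above, has the same effect on linking; `static inline` just keeps the hint for optimised builds.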

kas1e commented 6 years ago

Damn, you are right (as always). It starts all to be interesting when i start to hate it :))

kas1e commented 6 years ago

@ptitSeb Sorry to bother you again, but last week we tried all possible scenarios, tests, ports, etc. Every time we end up at a place where some strange memory trashing happens, and none of us knows where it comes from or how to detect it :(

Yesterday I built Neverball. It just crashes at runtime in glDrawElements. Daniel checked the index array at the time of the crash. It contains, let's say, about 80% garbage (tons of 0,0,X "triangles", lots of 0xFFFF indices; it looks like semi-randomly trashed memory), and the last maybe 20% look like somewhat valid indices inside the expected range.

And there are two glDrawElements calls before the one that crashes; those look absolutely sane (reasonable indices etc.) and they work flawlessly.

Until the point of the crash the whole lib seems to work correctly, with no sign of any lib or other corruption (and quite a lot happens under the hood until then), but then it's fed with this invalid index array and says goodbye.

Of course I cannot say who is the one who really corrupts it in the first place (because its corruption can also be a side effect of something else). But I can say that it is already corrupt before ogles2 does any work inside glDrawElements; it is being sent to ogles2 in a corrupted state.

I also tried all kinds of different compiler flags, all kinds of different scenarios with SDL, etc. And every time something weird happens with memory, and every time it is around glDrawElements.

Now, what we are wondering is whether it is possible that we still have some endian issues in gl4es. I am not sure of anything anymore, of course; it is just one more idea. One of the developers gave me this info:

Endianness problems sometimes appear in unexpected places. In AmigaOS Exec MakeLibrary(), for example, there's the table of functions which can contain function pointers (4 byte entries) or, if the first WORD in the table == 0xFFFF, then instead it contains offsets (2 byte entries). The check "if (*(WORD *)funcInit == -1)" does not work on little endian (AROS for example), which was discovered more or less by "luck" when by pure coincidence the first function pointer happened to end exactly at address 0x????FFFF. And so the (WORD *) read saw 0xFFFF there and assumed it was offsets instead of absolute addresses (a function on x86 may start at an odd address).

Or think about things like reading the lower 16 bits of a 32 bit variable like this:

    ULONG var = 0x12345678;
    ULONG *ptr1 = &var;
    UWORD *ptr2 = (UWORD *)&var;

    UWORD w16 = (UWORD)*ptr1; /* works: -> 0x5678 */
    UWORD w16 = *ptr2;        /* works too on little endian: -> 0x5678 */

Is there any place in gl4es where we can (at least) assume that something like that can happen?
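The difference between the two reads can be checked with a small portable sketch. It uses `stdint.h` types in place of ULONG/UWORD and hypothetical `demo_` names; it only illustrates the general pitfall and does not point at any specific gl4es code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value-based: the cast truncates the value, so the result is the low
   16 bits on every platform, whatever the byte order. */
uint16_t demo_low16_by_value(uint32_t v)
{
    return (uint16_t)v;
}

/* Memory-based: reads the first two bytes of the object, like the
   (UWORD *) cast does. memcpy is used to avoid aliasing problems. */
uint16_t demo_first16_in_memory(uint32_t v)
{
    uint16_t out;
    memcpy(&out, &v, sizeof out);
    return out;
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int demo_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Ties the helpers together: the memory-based read gives the low half on
   little endian and the high half on big endian (PowerPC, for example). */
void demo_check(void)
{
    const uint32_t v = 0x12345678u;
    assert(demo_low16_by_value(v) == 0x5678);
    assert(demo_first16_in_memory(v) ==
           (demo_is_little_endian() ? 0x5678 : 0x1234));
}
```

So code that relies on the pointer-cast form silently changes meaning when moved between x86/ARM and big-endian PowerPC, while the value cast does not.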

I may also try to install ppc-linux on my hardware and try gl4es on it.

ptitSeb commented 6 years ago

I don't see any place in gl4es where this kind of thing can happen. Most (if not all) of the conversions gl4es does are "clean", and made by macro trickery.

The indices that are 0xffff are quite easy to detect. In src/gl/fpe.c you can try to printf and alert if some indices are trash. Line 369, before gles_glDrawElements(mode, count, type, indices); add:

    if(type==GL_UNSIGNED_SHORT) {
        GLushort *ind = (GLushort*)indices;
        for (int i=0; i<count; i++)
            if(ind[i]==0xffff)
                printf("WARNING: Indices[%d] is 0xffff\n", i);
    }

or dprintf if it's better for you (even the %d and ,i are not mandatory). That can help find the cause (if that happens, do a run with debug enabled in fpe.c, in case there is something obvious).

kas1e commented 6 years ago

I ran with debug in fpe.c before as well; here is the output: http://kas1e.mikendezign.com/aos4/gl4es/games/neverball/neverball_fpe_debug.txt

That is from the start until the crash.

ptitSeb commented 6 years ago

Not much to say regarding the trace. It's program number 275 (starting from 256), so the 20th program? Can that be an issue (I don't think so)? You should also activate debug in shaderconv.c in case that program has something specific. Also, the glDrawElements is the "biggest" in the trace, with 5550 vertices (the others are "only" 3072 vertices).

So yeah, not much to say.

kas1e commented 6 years ago

Added that printf for 0xffff: nothing found. At least before the crash I see no printfs at all.

kas1e commented 6 years ago

What else do we have after the gles_glDrawElements(mode, count, type, indices); call and before the actual stuff goes to AmiglDrawElements? Maybe it is worth adding those printfs in amigaos.c too, before the actual call to ogles2?

ptitSeb commented 6 years ago

That gles_glDrawElements(mode, count, type, indices); is the actual call to the GLES2 driver. So it's directly AmiglDrawElements at this stage, which calls OGLES2->glDrawElements. All direct calls.

You can add the 0xffff check in AmiglDrawElements if you want, it's in src/agl/amigaos.c, line 213...

kas1e commented 6 years ago

Yeah, already did, with no luck. I.e. nothing is printed. Wtf... First time I see that kind of random strange issue. It doesn't leave gl4es in bad form, yet it is already received in the ogles2 library in bad form. Wtf :))

ptitSeb commented 6 years ago

It would be interesting to have the same kind of test inside OGLES2, and also see what the addresses of those 0xffff values are (to compare with the beginning of the indices array).
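A possible shape for such a test, written as a standalone helper so the same code can be dropped on both sides of the call. The names are hypothetical and `vertex_count` is an assumption: the caller has to know how many vertices are actually bound.

```c
#include <assert.h>
#include <stddef.h>

/* Scans `count` 16-bit indices and returns the position of the first one
   that is >= vertex_count, or -1 if every index is in range. */
long demo_first_bad_index(const unsigned short *indices, size_t count,
                          unsigned int vertex_count)
{
    size_t i;
    assert(indices != NULL || count == 0);
    for (i = 0; i < count; i++)
        if (indices[i] >= vertex_count)
            return (long)i;
    return -1;
}

/* Byte offset of element `pos` from the start of the index array, handy
   for comparing against the raw address seen inside the driver. */
size_t demo_byte_offset(size_t pos)
{
    return pos * sizeof(unsigned short);
}
```

Printing the array pointer with `%p` together with the returned position on the gl4es side and again inside OGLES2 would show whether both see the same memory and whether the garbage starts at the same offset.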

kas1e commented 6 years ago

As Daniel says, those 0xffff were just random, and may or may not show up on my setup; even on his it is not like this every time.

Can we add printfs which spit out the count / pointer etc. in our glDrawElements like we did above, and then also simply print out, let's say, the first 100 ushort indices, comma-separated?

ptitSeb commented 6 years ago

Then check values that are suspiciously high, like

    if(type==GL_UNSIGNED_SHORT) {
        GLushort *ind = (GLushort*)indices;
        for (int i=0; i<count; i++)
            if(ind[i]>0x2000)
                printf("WARNING: Indices[%d] is 0x%x\n", i, ind[i]);
    }

ptitSeb commented 6 years ago

Now if you prefer the comma-separated list (with max value as added bonus):

    if(type==GL_UNSIGNED_SHORT) {
        GLushort *ind = (GLushort*)indices;
        GLushort m = 0;
        printf("Indices=");
        for (int i=0; i<count; i++) {
            if(ind[i]>m) m = ind[i];
            if(i<100)
                  printf("%d%c", ind[i], i?',':' ');
        }
        printf("\nThere are %d indices, max value is %d\n", count, m);
    }

kas1e commented 6 years ago

Aha, thanks, done. This is what I get when I run Neverball, before the crash happens:

http://kas1e.mikendezign.com/aos4/gl4es/games/neverball/neverball_print_indices.txt

This is what I get when I run Neverputt, before the crash happens:

http://kas1e.mikendezign.com/aos4/gl4es/games/neverball/neverputt_print_indices.txt

And this is what I get when I play with the stack settings, and Neverputt runs when I set a low stack (lower than 65k):

http://kas1e.mikendezign.com/aos4/gl4es/games/neverball/neverputt_print_indices_when_change_stack_running_ok.txt

On AmigaOS4 we have the ability to control the stack size used when we run programs; by default it is 65.535, but we can raise it to any size. What is strange is that when I LOWER the stack size, Neverputt at least runs (see the 3rd output). But when I make the stack bigger, it crashes right away on start, as Neverball does.

Interesting..

ptitSeb commented 6 years ago

Mmm, those indices are indeed really wrong. I need to think a bit and will ask for more logs...