ptitSeb / gl4es

GL4ES is a OpenGL 2.1/1.5 to GL ES 2.0/1.1 translation library, with support for Pandora, ODroid, OrangePI, CHIP, Raspberry PI, Android, Emscripten and AmigaOS4.
http://ptitseb.github.io/gl4es/
MIT License

Has anyone tried running with the latest EDuke32 code? #344

Open emileb opened 2 years ago

emileb commented 2 years ago

I am trying to use this on the latest Eduke32 code found here:

https://voidpoint.io/terminx/eduke32

The polymost renderer has been updated to OpenGL 2.0 and now uses shaders for the fragments, but still uses the old fixed-function calls for a lot of the geometry etc. There was some GLES support (#define EDUKE32_GLES) when it was running OpenGL 1.1, but this support is now broken and not usable.

Using a very old version of GL4ES (1.1.0 from 2018) it runs but there are no textures; some simple colour shading seems to work, so you can see that the geometry is being rendered.

Using the latest version of GL4ES I get a crash in glDrawArrays inside GL4ES; I have not been able to find the cause yet.

I am tempted to actually convert the renderer to proper GLES2.0 but the code is very hard to work with, one enormous 10K line file! https://voidpoint.io/terminx/eduke32/-/blob/master/source/build/src/polymost.cpp

emileb commented 2 years ago

I reverted to gl4es v1.1.4, this is the log:

I get a crash in the GPU driver when doing the first glDrawArrays. I suspect the renderer is not setting the vertex data properly or is doing something invalid (the same code works on Windows, however). I will try to debug this.

    I/LIBGL: Initialising gl4es
    I/LIBGL: v1.1.4 built on Oct 6 2021 20:44:49
    I/Choreographer: Skipped 464 frames! The application may be doing too much work on its main thread.
    I/LIBGL: Using GLES 2.0 backend
    I/LIBGL: loaded: libGLESv2.so
    I/LIBGL: Hardware test on current Context...
    I/LIBGL: Hardware Full NPOT detected and used
    I/LIBGL: FBO are in core, and so used
    I/LIBGL: PointSprite are in core, and so used
    I/LIBGL: CubeMap are in core, and so used
    I/LIBGL: BlendColor is in core, and so used
    I/LIBGL: Blend Substract is in core, and so used
    I/LIBGL: Blend Function and Equation Separation is in core, and so used
    I/LIBGL: Texture Mirrored Repeat is in core, and so used
    I/LIBGL: Extension GL_OES_element_index_uint detected and used
    I/LIBGL: Extension GL_OES_packed_depth_stencil detected and used
    I/LIBGL: Extension GL_OES_depth24 detected and used
    I/LIBGL: Extension GL_OES_rgb8_rgba8 detected and used
    I/LIBGL: Extension GL_EXT_texture_format_BGRA8888 detected and used
    I/LIBGL: Extension GL_OES_depth_texture detected and used
    I/LIBGL: Extension GL_OES_texture_stencil8 detected and used
    I/LIBGL: Extension GL_OES_texture_float detected and used
    I/LIBGL: Extension GL_OES_texture_half_float detected and used
    I/LIBGL: Extension GL_EXT_color_buffer_float detected and used
    I/LIBGL: Extension GL_EXT_color_buffer_half_float detected and used
    I/LIBGL: high precision float in fragment shader available and used
    I/LIBGL: Max vertex attrib: 32
    I/LIBGL: Extension GL_OES_standard_derivatives detected and used
    I/LIBGL: Max texture size: 16384
    I/LIBGL: Max Varying Vector: 31
    I/LIBGL: Texture Units: 16/16 (hardware: 16), Max lights: 8, Max planes: 6
    I/LIBGL: Extension GL_EXT_texture_filter_anisotropic detected and used
    I/LIBGL: Max Anisotropic filtering: 16
    I/LIBGL: Max Color Attachments: 1 / Draw buffers: 1
    I/LIBGL: Hardware vendor is Qualcomm
    I/Adreno: ERROR: Invalid #version
    I/LIBGL: GLSL 300 es supported
    I/LIBGL: GLSL 310 es supported and used
    I/LIBGL: Targeting OpenGL 2.1
    I/LIBGL: NPOT texture handled in hardware
    I/LIBGL: Not trying to batch small subsequent glDrawXXXX
    I/LIBGL: Use of VBO disabled
    I/LIBGL: Log to the console
    Error compiling shaders
    I/LIBGL: Current folder is:/storage/emulated/0/OpenTouch/Raze/DUKE

emileb commented 2 years ago

Sorry, as usual I was doing something wrong.. I had VBOs disabled with LIBGL_USEVBO. It now starts up but crashes on new game, to be investigated but now gets much further. I will update this with any progress!

ptitSeb commented 2 years ago

Check also there: https://pyra-handheld.com/boards/threads/the-pandora-port-request-thread.40958/post-1457145 where I hacked an older version of the engine to make it run on the Pandora.

emileb commented 2 years ago

> Check also there: https://pyra-handheld.com/boards/threads/the-pandora-port-request-thread.40958/post-1457145 where I hacked an older version of the engine to make it run on the Pandora.

Ahhh amazing! Thanks so much, I will take a look at those patches. Great work!

emileb commented 2 years ago

OK it was INCREDIBLY slow on my device, but I increased the FPS from about 10 to a locked 60.

For some reason the non-persistent renderer uses a single buffer, updates part of it with glBufferSubData, and then renders that part. This was causing a MASSIVE amount of GPU time to be spent updating the buffer, as shown in this profile:

image

So I replaced the glBufferSubData call with a glBufferData call that uploads only the data needed for the draw, so now it looks like this:

    if (!persistentStreamBuffer)
    {
        // Old: update a sub-range of the single shared buffer.
        // glBufferSubData(GL_ARRAY_BUFFER, drawpolyVertsOffset*sizeof(float)*5, npoints*sizeof(float)*5, drawpolyVerts);
        // New: re-specify the buffer with only the vertices for this draw.
        glBufferData(GL_ARRAY_BUFFER, npoints*sizeof(float)*5, drawpolyVerts, GL_STREAM_DRAW);
    }

    glDrawArrays(GL_TRIANGLE_FAN, 0, npoints);

Now it runs fine and the profile looks like this which is much more sensible:

image

When I update my github with the updates I'll show you if you're interested.

EDIT: I am just testing Duke first level, not sure if there are any side-effects of this!

ptitSeb commented 2 years ago

That's really strange that a glBufferSubData(...) to update a buffer is much slower than a glBufferData(...) that will recreate the buffer?!

emileb commented 2 years ago

> That's really strange that a glBufferSubData(...) to update a buffer is much slower than a glBufferData(...) that will recreate the buffer?!

Yeah it is strange. It happens on both my Snapdragon devices, so it may be something to do with the way that specific GPU works; I will try another.

Unfortunately Ion Fury is still way too slow in OpenGL mode. I'm sure the engine is not working optimally. I think it might work better if it uploaded all the geometry to the buffer in one call at the start and then did all the draw calls afterwards (this would need to iterate all the wall/sprite arrays twice). The profiler still shows a huge amount of time spent playing with buffer data.
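The "upload once, then draw" idea above could be sketched roughly as follows. This is only an illustration under assumptions: `FrameGeometry` and `DrawRange` are hypothetical names that do not exist in EDuke32 or GL4ES, and the 5-floats-per-vertex layout is taken from the `sizeof(float)*5` stride in the snippet earlier in this thread. The GL calls are shown as comments because they need a live context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of EDuke32 or GL4ES.
// Assumes 5 floats per vertex, matching the sizeof(float)*5 stride used above.
constexpr std::size_t kFloatsPerVert = 5;

struct DrawRange {
    int first;  // first vertex, as it would be passed to glDrawArrays
    int count;  // number of vertices in this polygon
};

class FrameGeometry {
public:
    // Pass 1: append one polygon's vertices to the CPU-side staging array
    // and remember where they landed.
    DrawRange add(const float* verts, int npoints) {
        assert(verts != nullptr && npoints > 0);
        DrawRange r{static_cast<int>(data_.size() / kFloatsPerVert), npoints};
        data_.insert(data_.end(), verts,
                     verts + static_cast<std::size_t>(npoints) * kFloatsPerVert);
        ranges_.push_back(r);
        return r;
    }

    const std::vector<float>& data() const { return data_; }
    const std::vector<DrawRange>& ranges() const { return ranges_; }
    std::size_t byteSize() const { return data_.size() * sizeof(float); }

    // Call once per frame before collecting geometry again.
    void clear() { data_.clear(); ranges_.clear(); }

private:
    std::vector<float> data_;
    std::vector<DrawRange> ranges_;
};

// Pass 2 (requires a GL context, so shown as comments only):
//   glBufferData(GL_ARRAY_BUFFER, g.byteSize(), g.data().data(), GL_STREAM_DRAW);
//   for (const DrawRange& r : g.ranges())
//       glDrawArrays(GL_TRIANGLE_FAN, r.first, r.count);
```

A real version would also have to record whatever state (texture, uniforms) each draw needs, since that is currently set between the individual draws.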

emileb commented 2 years ago

Small optimisation for SW mode: https://github.com/emileb/eduke32_mobile/commit/f762b48a1dfa040dd683f1787147c8db63f314b7

GL_RED type was causing videoNextPage to take 14% of frame time due to swizzle, now takes 4%.

image

emileb commented 2 years ago

Just for interest: The starting scene - image

There are about 3400 individual draw calls, all of them with fewer than 5 vertices: 95% are 4-vertex calls, and the rest are 3-vertex calls.

Up to about draw call 2100, lots of the draw calls have NO state change in between them (thanks to GL4ES checking whether uniforms have actually changed). These glDrawArrays calls could be combined when there is no state change between them.

image
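One possible way to combine such draws, sketched under assumptions: consecutive GL_TRIANGLE_FAN draws cannot simply be concatenated, but each fan can be rewritten as indexed triangles so that a run of state-identical polygons goes out in a single glDrawElements call. The helper name `appendFanAsTriangles` is hypothetical, and this is not how GL4ES's own batching works internally.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: append triangle-list indices for one triangle fan
// that occupies vertices [first, first + count) of the vertex buffer.
// A fan of n vertices becomes n - 2 triangles: (v0, v_i, v_{i+1}).
void appendFanAsTriangles(std::vector<std::uint16_t>& indices, int first, int count) {
    // 16-bit indices are the GLES 2.0 baseline, so stay within their range.
    assert(first >= 0 && count >= 3 && first + count <= 65536);
    for (int i = 1; i + 1 < count; ++i) {
        indices.push_back(static_cast<std::uint16_t>(first));
        indices.push_back(static_cast<std::uint16_t>(first + i));
        indices.push_back(static_cast<std::uint16_t>(first + i + 1));
    }
}

// After collecting a run of polygons with identical state (needs a GL context):
//   glDrawElements(GL_TRIANGLES, (GLsizei)indices.size(), GL_UNSIGNED_SHORT, indices.data());
```

With mostly 4-vertex polygons this costs 6 indices per quad, which is small compared with the per-call overhead of thousands of separate draws.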

The second half of the draw calls seems to have loads of pointless state changes in between them: image

You can see it is continually changing to program 4, then changing a load of uniforms, then changing back to program 14 and sometimes changing some more uniforms. I need to debug to find where this is coming from.

ptitSeb commented 2 years ago

Yes, the scene from the screenshot, this point of view is always slow.

Strange that gl4es doesn't concatenate all those small glDrawArrays, as there is some mechanism to do exactly that.

emileb commented 2 years ago

> Yes, the scene from the screenshot, this point of view is always slow.
>
> Strange that gl4es doesn't concatenate all those small glDrawArrays, as there is some mechanism to do exactly that.

Indeed that is the 'LIBGL_BATCH' option right? I did enable that option but they were still not being combined. I will investigate why after I have fixed the unnecessary state changes in the second half of the render.

emileb commented 2 years ago

> See it is continually changing to program 4 then changing a load of uniforms, then changing back to program 14 and sometimes changing some more uniform. I need to debug to find where this is coming from.

Ohh OK, I see what's going on: program 14 is created by GL4ES to make alpha testing work, and then the uniforms are synchronised. I'll see what happens if I implement the alpha test in the polymost shader to avoid the overhead.
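For illustration, an in-shader alpha test might look like the sketch below. Everything here is an assumption rather than polymost's actual shader: the uniform and varying names (`u_texture`, `u_alphaRef`, `v_texCoord`, `v_color`) are made up, and the comparison assumes the fixed-function test was `glAlphaFunc(GL_GREATER, ref)`. The GLSL ES 1.00 source is embedded as a C++ string, with a small CPU-side mirror of the same comparison.

```cpp
#include <cassert>
#include <string>

// Hypothetical GLSL ES 1.00 fragment shader; names are illustrative only.
static const char* const kAlphaTestFragmentSrc = R"glsl(
precision mediump float;
uniform sampler2D u_texture;
uniform float u_alphaRef;   // reference value formerly given to glAlphaFunc
varying vec2 v_texCoord;
varying vec4 v_color;
void main() {
    vec4 c = texture2D(u_texture, v_texCoord) * v_color;
    // Assumed equivalent of glAlphaFunc(GL_GREATER, ref): keep only alpha > ref.
    if (c.a <= u_alphaRef) discard;
    gl_FragColor = c;
}
)glsl";

// CPU-side mirror of the comparison above, assuming GL_GREATER semantics.
inline bool passesAlphaGreater(float alpha, float ref) {
    return alpha > ref;
}
```

With the test done in the application's own shader, glEnable(GL_ALPHA_TEST) would no longer be needed, which is presumably what removes the extra program switch.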

StrikerMan780 commented 2 years ago

Heh, seems I'm not the only one. I'm currently working on trying to get EDuke32 working well on the Raspberry Pi 4, and well, I'm not having a whole lot of luck. Earlier builds ran, but very poorly, with either native OpenGL or GL4ES in Polymost.

In newer distros, like Ubuntu MATE 21.10, it won't load at all in Polymost without a segfault (due to GLXBadFBConfig). So, for now, I'm using Software. I tried the exact same thing mentioned above, using the alpha channel, but apparently GL_ALPHA isn't valid beyond GL 2?

If anyone has any updates or advice, I'm all ears.

emileb commented 2 years ago

@StrikerMan780 So this is the latest modified code I use for Zeta Touch for Android: https://github.com/emileb/eduke32_mobile/tree/master_mobile (http://opentouchgaming.com/zeta-touch/). There were a number of changes which greatly improved performance on Android devices (see the new USE_GLES2 defines); they may also help on the Raspberry Pi. With GL4ES it should be operating strictly using GLES 2.0, so I would disable any full OpenGL translation layers the Pi might be using and create a GLES 2 context. You may need to make some fixes to get this building on Linux, as I have not tried it. There could be Android-specific code trying to build; I have tried to #define out any Android stuff, but it's not tested.