Open tareksander opened 1 year ago
I think it should be double buffering to not wait until X server finishes working with current buffer. But I am not sure how to implement synchronisation that will say that buffer is processed and you can really swap it. Btw Android uses triple buffering.
X11 also has a sync protocol, and the present protocol has support for it, but it's just a bit more work to actually implement that.
Currently everything seems to be better, but still has performance close to llvmpipe.
And glmark2 working on llvmpipe + virpipe working on the same device.
Interesting thing: without glReadPixels it has performance comparable to virpipe (but higher).
~/gfx/build $ __EGL_VENDOR_LIBRARY_FILENAMES=/data/data/com.termux/files/usr/share/glvnd/egl_vendor.d/10_android_wrapper.json glmark2-es2 --fullscreen | grep -v -e "swap interval" -e "swap_control"
=======================================================
glmark2 2023.01
=======================================================
OpenGL Information
GL_VENDOR: Imagination Technologies
GL_RENDERER: PowerVR Rogue GE8320
GL_VERSION: OpenGL ES 3.2 build 1.13@5776728
Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
Surface Size: 496x1100 fullscreen
=======================================================
Using memfd
draw resize
[build] use-vbo=false: FPS: 58 FrameTime: 17.536 ms
[build] use-vbo=true: FPS: 58 FrameTime: 17.318 ms
[texture] texture-filter=nearest: FPS: 57 FrameTime: 17.597 ms
[texture] texture-filter=linear: FPS: 55 FrameTime: 18.291 ms
[texture] texture-filter=mipmap: FPS: 57 FrameTime: 17.749 ms
[shading] shading=gouraud: FPS: 57 FrameTime: 17.721 ms
[shading] shading=blinn-phong-inf: FPS: 57 FrameTime: 17.832 ms
[shading] shading=phong: FPS: 56 FrameTime: 17.883 ms
[shading] shading=cel: FPS: 57 FrameTime: 17.569 ms
[bump] bump-render=high-poly: FPS: 56 FrameTime: 18.097 ms
[bump] bump-render=normals: FPS: 58 FrameTime: 17.439 ms
[bump] bump-render=height: FPS: 56 FrameTime: 18.084 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 56 FrameTime: 18.055 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 57 FrameTime: 17.850 ms
[pulsar] light=false:quads=5:texture=false: FPS: 57 FrameTime: 17.590 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 61 FrameTime: 16.641 ms
[desktop] effect=shadow:windows=4: FPS: 60 FrameTime: 16.948 ms
Error: Requested MapBuffer VBO update method but GL_OES_mapbuffer is not supported!
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: Unsupported
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 59 FrameTime: 17.152 ms
Error: Requested MapBuffer VBO update method but GL_OES_mapbuffer is not supported!
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: Unsupported
[ideas] speed=duration: FPS: 59 FrameTime: 17.071 ms
[jellyfish] <default>: FPS: 59 FrameTime: 17.154 ms
[terrain] <default>: FPS: 39 FrameTime: 26.277 ms
Error: We do not have the depth texture extension!!!
[shadow] <default>: Unsupported
Error: We do not have the depth texture extension!!!
[refract] <default>: Unsupported
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 57 FrameTime: 17.557 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 57 FrameTime: 17.850 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 56 FrameTime: 18.080 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 56 FrameTime: 18.096 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 57 FrameTime: 17.676 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 57 FrameTime: 17.751 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 57 FrameTime: 17.759 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 56 FrameTime: 18.008 ms
=======================================================
glmark2 Score: 55
=======================================================
But this is contrary to logic: performance with glReadPixels should be close to virpipe performance.
Maybe it is somehow related to waiting for rendering. After disabling both glReadPixels and sending xcb_present_pixmap_checked (replaced with real_eglSwapBuffers(nativeDisplay, Surface::getSurface((Surface*)surface));) I've got this log:
And for comparison I've tested my device with Android's native glmark2 and got this log:
So as far as I can see, busy waiting is a BIG problem here, so we should use double buffering and switch buffers on XCB_PRESENT_COMPLETE_NOTIFY. Blocking the connection until we get XCB_PRESENT_COMPLETE_NOTIFY is a VERY bad idea. But I agree that it fits a test case.
Thank you in advance.
Also I think we should call real_eglSwapBuffers(nativeDisplay, Surface::getSurface((Surface*)surface)); before glReadPixels. I've tried glFlush but it looks like it does not really finish all drawing operations (I am getting a segfault in /vendor/lib64/egl/libGLESv2_mtk.so (glDrawArrays+2880)). Even if it will not really do something it will be low-cost and harmless.
Also (for fallback) we should not do glReadPixels until we get XCB_PRESENT_COMPLETE_NOTIFY. This way we will keep GL and the rest of the application running even if the frame has not been rendered yet. But the application will be able to wait for a synchronization event (vblank?) using EGLSync mechanisms (which will be emulated/wrapped too).
I implemented double buffering now, so we only have to wait for X if the last present took longer than rendering the next frame. Also I'm waiting for pixmap idle events instead now, so the present operation doesn't have to be complete. With that I get ~800fps in es2gears in the emulator. I'll see if I can modify it to only present if the last present finished, so we don't bother X with too many images, maybe that improves it further.
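A minimal sketch of the bookkeeping behind this kind of double buffering, assuming two pixmaps and one "idle" notification per pixmap. The class and method names are hypothetical and not taken from the wrapper; the real code would send the present request and read the idle events over XCB.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of two pixmaps shared with the X server.
// A pixmap is "busy" from the moment it is presented until X reports it idle.
class DoubleBuffer {
public:
    // Returns the index of a pixmap that may be drawn into,
    // or -1 if X still holds both and the caller has to wait for an idle event.
    int acquire() const {
        for (int i = 0; i < 2; ++i) {
            if (!busy_[i]) return i;
        }
        return -1;
    }

    // Call right after the present request for pixmap i has been sent.
    void present(int i) {
        assert(i >= 0 && i < 2 && !busy_[i]);
        busy_[i] = true;
    }

    // Call when the idle notification for pixmap i arrives.
    void onIdle(int i) {
        assert(i >= 0 && i < 2);
        busy_[i] = false;
    }

private:
    std::array<bool, 2> busy_{{false, false}};
};
```

With this model the renderer only blocks when acquire() returns -1, which corresponds to the case where the last present took longer than rendering the next frame.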
Yep, not processing the buffer on the CPU when it's not needed has really improved things, now ~85000 fps in the emulator.
Next I'll try to create a shared pixmap from a DMABUF fd. Can you try glmark again? In the emulator it gives me this error:
$ glmark2-es2
Error: Failed to find suitable EGL config
Error: Error: Couldn't get GL visual config!
Error: main: Could not initialize canvas
es2gears works, so the env variables are fine. I'll debug that later.
Also I think we should call real_eglSwapBuffers(nativeDisplay, Surface::getSurface((Surface*)surface)); before glReadPixels. I've tried glFlush but it looks like it does not really finish all drawing operations (I am getting a segfault in /vendor/lib64/egl/libGLESv2_mtk.so (glDrawArrays+2880)). Even if it will not really do something it will be low-cost and harmless.
glReadPixels should work as-is, quoting chapter 2.1 of the GLES2 spec: "Commands are always processed in the order in which they are received, although there may be an indeterminate delay before the effects of a command are realized. This means, for example, that one primitive must be drawn completely before any subsequent one can affect the framebuffer. It also means that queries and pixel read operations return state consistent with complete execution of all previously invoked GL commands. In general, the effects of a GL command on either GL modes or the framebuffer must be complete before any subsequent command can have any such effects."
Ok. Now glmark2-es2 reports highest FPS I've seen but the rendering is not smooth.
Rendering was much more smooth with the old method, but it did not report high FPS.
Also about the renderer name. Mesa's virpipe uses virgl ($originalName) for this. Can we use something like termux-gfx-wrapper ($originalName) here?
Rendering was much more smooth with the old method, but it did not report high FPS.
Ok, I think I'll keep the new method as an option. Is there some way to predict the time window where a new frame has to be submitted for X11? Maybe that could be used to select an appropriate frame. The problem is that when we aren't bombarding X with frames, the time after the last frame completed and the next one is processed on the CPU could be too high with all the format conversion. With a bit of prediction you could select the approximate last frame that could still make it in time to X and render that.
Or maybe a proper triple buffering implementation would be better, to fully decouple the display timing from the rendering.
The other question is: do you even need 1000s of frames when X can only display 60 of them a second? It's reasonable that eglSwapBuffers can block, and that would also save CPU and GPU resources for everything else. So should VSync be an option, e.g. the old PresentNotify system? If an application needs rendering decoupled from the window system timings, it can use PBuffers or GLES FBOs.
Also about the renderer name. Mesa's virpipe uses virgl ($originalName) for this. Can we use something like termux-gfx-wrapper ($originalName) here?
Fixed that.
Next I'll work on HardwareBuffer rendering, which can eliminate the copy to the pixmap. And maybe also the format change, depending on available HardwareBuffer formats. But doing the format change in a shader instead should also be possible, and may be faster, depending on whether the memory bandwidth is the bottleneck or not.
Maybe that can smooth out the frames.
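For reference, a rough CPU-side sketch of the format change discussed here. It assumes the source is tightly packed RGBA with rows stored bottom-up (as glReadPixels returns them) and that the X pixmap wants BGRA with rows top-down; both layouts are assumptions for illustration, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a bottom-up RGBA image into a top-down BGRA image.
// Every pixel is touched once, so the cost scales with width * height.
std::vector<std::uint8_t> rgbaBottomUpToBgraTopDown(
        const std::vector<std::uint8_t>& src,
        std::size_t width, std::size_t height) {
    assert(src.size() == width * height * 4);
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t y = 0; y < height; ++y) {
        // Row y of the output comes from row (height - 1 - y) of the input.
        const std::uint8_t* in = src.data() + (height - 1 - y) * width * 4;
        std::uint8_t* out = dst.data() + y * width * 4;
        for (std::size_t x = 0; x < width; ++x) {
            out[4 * x + 0] = in[4 * x + 2];  // B
            out[4 * x + 1] = in[4 * x + 1];  // G
            out[4 * x + 2] = in[4 * x + 0];  // R
            out[4 * x + 3] = in[4 * x + 3];  // A
        }
    }
    return dst;
}
```

Doing the same swizzle and flip in a fragment shader would move this per-pixel work to the GPU, which is the alternative mentioned above.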
Ok, I think I'll keep the new method as an option. Is there some way to predict the time window where a new frame has to be submitted for X11?
Maybe timers. Currently X server draws image (or tries to) every 17 milliseconds (when possible).
The problem is that when we aren't bombarding X with frames, the time after the last frame completed and the next one is processed on the CPU could be too high with all the format conversion.
I thought about triple buffering. I thought about the following roles:
So when eglSwapBuffers receives PRESENT_COMPLETE_NOTIFY for the current processing buffer, it glReadPixels-es the current front buffer, converts its format and sends a present_pixmap request. I mean the roles of the buffers are shifted: the back buffer becomes the front buffer, and the processing buffer, having received its response, becomes the back buffer.
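The role shift described above can be sketched as a rotation of three indices. This is only an illustration under the assumption that there are exactly three buffers with the roles back, front and processing; the struct is hypothetical and leaves out the actual glReadPixels and present calls.

```cpp
#include <cassert>

// Hypothetical triple-buffer role tracker.
struct TripleBuffer {
    int back = 0;        // currently being rendered into
    int front = 1;       // most recently finished frame
    int processing = 2;  // read back, converted and in flight to X

    // Call when the present-complete notification for `processing` arrives.
    void onPresentComplete() {
        assert(back != front && front != processing && back != processing);
        int released = processing;  // X is done with this one
        processing = front;         // the front buffer gets read back and presented
        front = back;               // the freshly rendered buffer becomes the front
        back = released;            // the released buffer is reused for rendering
    }
};
```

After three notifications every buffer has played every role once and the indices are back where they started.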
The other question is: do you even need 1000s of frames when X can only display 60 of them a second?
I think that is a target of SwapInterval feature. Currently we have SwapInterval = 0 for tests.
Fixed that.
I think you did not.
GL_VENDOR: Imagination Technologies
GL_RENDERER: PowerVR Rogue GE8320
GL_VERSION: OpenGL ES 3.2 build 1.13@5776728
Maybe we can ask Mesa people how it should work? I am pretty sure they know what's better.
Oh, you meant for GLES, not for EGL.
Fixed it for real now.
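A small sketch of the naming scheme requested earlier, i.e. termux-gfx-wrapper ($originalName). The helper name is hypothetical; in the wrapper it would presumably be applied to the strings that the vendor driver returns for GL_VENDOR and GL_RENDERER.

```cpp
#include <cassert>
#include <string>

// Wraps a vendor-provided name as "termux-gfx-wrapper (<original>)".
// Falls back to the bare wrapper name if the driver returned no string.
std::string wrapName(const char* originalName) {
    if (originalName == nullptr) {
        return "termux-gfx-wrapper";
    }
    return std::string("termux-gfx-wrapper (") + originalName + ")";
}
```

For example, wrapName("PowerVR Rogue GE8320") yields the string "termux-gfx-wrapper (PowerVR Rogue GE8320)".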
According to the EGL spec a swap interval of 1 is the default, so normal vsync. I'll make that the default then when the testing is complete and properly implement eglSwapInterval for values 0 and 1.
I also added the env variable TERMUX_EGL_X11_MODE to override it for now. BLOCK forces vsync (old version, wait for present notify), IDLE just waits for pixmap idle notify (the new version).
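A sketch of how such an override could be read, assuming the values are matched case-sensitively and that anything else falls back to a caller-supplied default. The enum and function names are hypothetical; only the variable name and the BLOCK/IDLE values come from the comment above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

enum class X11Mode { Block, Idle };

// Maps the raw environment value to a mode; unknown or missing values
// yield `fallback` (which default the wrapper uses is not specified here).
X11Mode parseMode(const char* value, X11Mode fallback) {
    if (value == nullptr) return fallback;
    if (std::strcmp(value, "BLOCK") == 0) return X11Mode::Block;
    if (std::strcmp(value, "IDLE") == 0) return X11Mode::Idle;
    return fallback;
}

// Reads TERMUX_EGL_X11_MODE from the process environment.
X11Mode modeFromEnvironment(X11Mode fallback) {
    return parseMode(std::getenv("TERMUX_EGL_X11_MODE"), fallback);
}
```

Usage from a shell would then look like TERMUX_EGL_X11_MODE=IDLE glmark2-es2.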
Approximating the time to the next frame could work, finally an opportunity to apply some sliding average function I learned in university lol.
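A minimal sketch of such a sliding average, assuming frame times are measured in milliseconds and a fixed window of recent samples is kept. The class name, the window size and the deadline check are illustrative assumptions, not the wrapper's code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Keeps the last `window` frame times and reports their mean.
class FrameTimeEstimator {
public:
    explicit FrameTimeEstimator(std::size_t window) : window_(window) {
        assert(window > 0);
    }

    void addSample(double ms) {
        samples_.push_back(ms);
        sum_ += ms;
        if (samples_.size() > window_) {
            sum_ -= samples_.front();
            samples_.pop_front();
        }
    }

    // Mean of the stored samples, or 0 if nothing has been measured yet.
    double averageMs() const {
        return samples_.empty() ? 0.0 : sum_ / static_cast<double>(samples_.size());
    }

    // True once the time left until the next deadline is no longer than
    // the estimated time needed to produce a frame.
    bool shouldStartFrame(double msUntilDeadline) const {
        return msUntilDeadline <= averageMs();
    }

private:
    std::size_t window_;
    std::deque<double> samples_;
    double sum_ = 0.0;
};
```

With a window of 3 and samples 10, 20, 30, 40 the estimate is the mean of the last three values.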
Wait. I thought Imagination Technologies is some kind of joke or easter egg for lols that you put into the wrapper. But it looks like it is a real company and it is the unmodified name loaded directly from the vendor libraries. I like it :) .
For some reason eglinfo has segfault.
Can you test what the last commit is that works for you? In the emulator it still works. Maybe it's the HardwareBuffer fd extraction test at the start? If it can crash, it would be better to just perform it at install and cache the result (and provide a way to re-run the test, in case a system update changes the behaviour).
Hmm...
It fails after this line.
Interesting thing. Even a few things. It reports
EGL vendor string: Android
EGL version string: 1.4 Android META-EGL
But I see you are returning modified values from wrapper.
Another thing.
~/demos $ strace ./eglinfo 2>&1 | grep openat | grep GL
openat(AT_FDCWD, "/data/data/com.termux/files/usr/lib/libEGL.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/data/data/com.termux/files/usr/lib/libGLdispatch.so.0", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/system/lib64/libEGL.so", O_RDONLY|O_CLOEXEC) = 32
openat(AT_FDCWD, "/system/lib64/libGLESv2.so", O_RDONLY|O_CLOEXEC) = 33
openat(AT_FDCWD, "/system/lib64/libGLESv1_CM.so", O_RDONLY|O_CLOEXEC) = 93
openat(AT_FDCWD, "/system/lib64/libGLESv3.so", O_RDONLY|O_CLOEXEC) = 94
openat(AT_FDCWD, "/system/lib64/libEGL.so", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/system/lib64/libGLESv2.so", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/vendor/lib64/egl/libGLES_mali.so", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/system/lib64/libEGL.so", O_RDONLY|O_CLOEXEC) = 6
openat(AT_FDCWD, "/system/lib64/libGLESv1_CM.so", O_RDONLY|O_CLOEXEC) = 6
openat(AT_FDCWD, "/system/lib64/libGLESv2.so", O_RDONLY|O_CLOEXEC) = 6
openat(AT_FDCWD, "/data/data/com.termux/files/usr/lib/libEGL_mesa.so.0", O_RDONLY|O_CLOEXEC) = 13
So it somehow links to termux's libEGL (which is glvnd), then it links to vendor libraries, but in the end it links to libEGL_mesa.so.0 and I do not really understand how that is possible.
Also it dlsyms glGetStringi but you do not export that. I checked whether it has a pointer to the function and I am pretty sure it is real, maybe it is a pointer from glvnd itself.
Can you test what the last commit is that works for you
As far as I can understand eglinfo never worked with gfx-wrapper. I tried to build every single version I found and eglinfo segfaults with all of them.
The library loading is expected. libglvnd loads all vendor EGL libraries upfront. The wrapper then loads the system EGL, which in turn loads the system and vendor GLES libs. After that Mesa gets loaded. Libglvnd provides all EGL and GLES core functions, which it then dispatches to the current vendor.
Could you build eglinfo with debug info and use gdb to find out where in eglinfo it's crashing?
https://github.com/tareksander/termux-gfx-wrapper/issues/3#issuecomment-1566767926
termux-packages repo does not contain meson (or I simply did not find it) so I've built eglinfo like this.
~/build $ clang src/egl/opengl/eglinfo.c src/glad/src/gl.c src/util/glinfo_common.c src/glad/src/egl.c -Isrc/util -Isrc/glad/include -o eglinfo -ggdb
glad_glGetStringi returns some invalid pointer.
I think it's because EGL returns an ES3 context because it's backwards compatible with ES2, glad recognizes that and tries to use glGetStringi, which is an ES3 function. I set the reported version to 2.0 now, you can try that fix. For me eglinfo just never tried to display gles info.
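To illustrate why the reported version matters, here is a sketch of the kind of decision a loader makes from the GL_VERSION string. This is not glad's actual code; the function names are hypothetical and the "OpenGL ES major.minor" prefix is the only format assumed.

```cpp
#include <cassert>
#include <cstdio>

struct GlesVersion {
    int major;
    int minor;
};

// Parses strings such as "OpenGL ES 3.2 build 1.13@5776728".
bool parseGlesVersion(const char* s, GlesVersion& out) {
    return s != nullptr &&
           std::sscanf(s, "OpenGL ES %d.%d", &out.major, &out.minor) == 2;
}

// A loader that sees ES 3.x may enumerate extensions via glGetStringi,
// which does not exist in ES 2.0.
bool wouldUseGetStringi(const char* versionString) {
    GlesVersion v{0, 0};
    return parseGlesVersion(versionString, v) && v.major >= 3;
}
```

Reporting "OpenGL ES 2.0" keeps such a loader on the ES 2 code path, which matches the fix described above.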
You are awesome.
Android platform:
EGL API version: 1.4
EGL vendor string: Android
EGL version string: 1.4 Android META-EGL
EGL client APIs: OpenGL_ES
EGL extensions string:
EGL_ANDROID_get_native_client_buffer, EGL_ANDROID_image_native_buffer,
EGL_KHR_image_base, EGL_KHR_platform_android
OpenGL ES profile vendor: termux-gfx-wrapper (Imagination Technologies)
OpenGL ES profile renderer: termux-gfx-wrapper (PowerVR Rogue GE8320)
OpenGL ES profile version: OpenGL ES 2.0
OpenGL ES profile shading language version: OpenGL ES GLSL ES 3.20 build 1.13@5776728
OpenGL ES profile extensions:
Oh, and the EGL strings aren't wrapped, because eglinfo is specifically requesting the Android platform (saw that in your output just now), not X11. The Android platform is designed to be a passthrough as much as possible, though I guess wrapping these strings wouldn't hurt.
@twaik I finished HardwareBuffer surfaces now but the rendered content won't show up in X11, even when using glFlush. Using memset to fill the DMABUF fd with 0xff displays a white image correctly though. Is it just not working in the emulator, or do you get the same on hardware?
It works but glmark score is very low.
With idle mode it is lower too.
I'll check what can be wrong here.
Ok, I do not know what can be wrong here.
Also, can you please add an uninstall option to the cmake script? It is not really comfortable to use rm $PREFIX/share/glvnd/egl_vendor.d/10_android_wrapper.json or export __EGL_VENDOR_LIBRARY_FILENAMES="$PREFIX/share/glvnd/egl_vendor.d/50_mesa.json"
And can you please make eglSwapInterval report success to suppress this warning?
** Failed to set swap interval. Results may be bounded above by refresh rate.
I added an uninstall target now and fixed eglSwapInterval. I also disabled HardwareBuffer surfaces for now, as I can't easily test them.
I can give you remote access to my test device if you need.
I found something: If I lock and unlock the HardwareBuffer, it works also in the emulator. That probably forces all changes to get applied to the mapped buffer. The performance is still low, and interestingly it goes down over time.
Actually in emulator GraphicBuffer (and AHardwareBuffer) are implemented with memfd or ashmem file descriptor. So the content of the texture is copied back and forth all the time. It is relevant for emulator, but not for real devices.
Interesting, I thought in the emulator it would also use DMABUF, that explains that. I added the env variable TERMUX_EGL_DISABLE_HWBUF which disables hardware buffers if set, to easily test both implementations.
I also added a simple frame time estimation for PBuffer rendering. Could you try glmark again and see if the rendering is smoother?
I also tried to optimize the HardwareBuffer surfaces a bit, you can see if that helped. Though they should still be upside-down in X, but at least the color should be right with the BGR format.
Maybe I am missing something, but in block mode performance seems to be the same
Idle mode seems to be broken... And it also has low performance.
Maybe I am missing something, but in block mode performance seems to be the same
That was just a guess. I made it such that the gl framebuffer and renderbuffer objects get reused if possible instead of being recreated on every eglMakeCurrent.
Idle mode seems to be broken... And it also has low performance.
What exactly is broken? The same thing as before?
I mean idle mode has only a bit better performance than block mode. It was much better in earlier commits.
Maybe in this case it would be better to use xcb_shm_put_image. You can avoid waiting by saving the cookie of the request and checking whether it was processed by invoking xcb_poll_for_reply(conn, cookie.sequence, &reply, &err). It will not be bound to actual screen refreshing, but this way you will have much faster rendering to the screen.
@tareksander I think you should check how mesa's src/vulkan/wsi/wsi_common_x11.c works. It is not related to EGL, but it can tell you how to handle frames.
Also there is interesting code in src/gallium/auxiliary/vl/vl_winsys_dri3.c. It is related to dri3 but still can be useful.
@tareksander maybe you can implement something like a buffer queue? I can integrate it into termux-x11 and the Xwayland/Xvfb/Xtigervnc of termux.
What kind of buffer queue?
Like in Android. Surfaces in Android have buffer queues with Consumers and Producers. I am not sure I can describe it correctly. https://source.android.com/docs/core/graphics/arch-bq-gralloc . If you can implement using multiple buffers at once, changing buffers on demand, it will make everything a bit faster.
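A rough single-threaded sketch of the producer/consumer idea, with method names borrowed from Android's BufferQueue terminology (dequeue, queue, acquire, release). The class itself is hypothetical, tracks only slot indices, and leaves out locking, fences and the actual pixel storage.

```cpp
#include <cassert>
#include <deque>

// Hypothetical slot-based buffer queue.
class BufferQueue {
public:
    explicit BufferQueue(int count) {
        assert(count > 0);
        for (int i = 0; i < count; ++i) free_.push_back(i);
    }

    // Producer: get a free slot to render into, or -1 if none is free.
    int dequeueBuffer() {
        if (free_.empty()) return -1;
        int slot = free_.front();
        free_.pop_front();
        return slot;
    }

    // Producer: hand a filled slot over to the consumer.
    void queueBuffer(int slot) { queued_.push_back(slot); }

    // Consumer: take the oldest filled slot, or -1 if nothing is queued.
    int acquireBuffer() {
        if (queued_.empty()) return -1;
        int slot = queued_.front();
        queued_.pop_front();
        return slot;
    }

    // Consumer: give a slot back once it is no longer being displayed.
    void releaseBuffer(int slot) { free_.push_back(slot); }

private:
    std::deque<int> free_;
    std::deque<int> queued_;
};
```

Here the GL side would be the producer and the X server integration the consumer, so rendering only stalls when every slot is either queued or on screen.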
eglSwapInterval for 0 and 1