lv2 / pugl

A minimal portable API for embeddable GUIs
https://gitlab.com/lv2/pugl/
ISC License

WGL and compositing window manager #76

Closed: jjYBdx4IL closed this issue 2 years ago

jjYBdx4IL commented 2 years ago

See here:

https://github.com/glfw/glfw/blob/63da04e5ced93fcb87a20513acdff5d78b1166ff/src/wgl_context.c#L321-L342

That can reduce CPU usage on the driver level during swapBuffers() by up to 97% on Windows.

i.e.

DwmFlush();
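
Roughly, the linked GLFW logic boils down to something like this (a simplified sketch, not a verbatim copy; GLFW additionally repeats the DwmFlush() based on the requested swap interval):

    #include <windows.h>
    #include <dwmapi.h> // link with Dwmapi.lib

    // Sketch of the GLFW approach: if the DWM compositor is active and the
    // window is not fullscreen, wait on the compositor before swapping,
    // instead of spinning in the driver's swap-interval wait.
    static void swapBuffersWithDwmSync(HDC dc, BOOL isFullscreen)
    {
        BOOL composited = FALSE;
        if (!isFullscreen &&
            SUCCEEDED(DwmIsCompositionEnabled(&composited)) && composited) {
            DwmFlush(); // blocks until the DWM has presented a frame
        }
        SwapBuffers(dc);
    }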

drobilla commented 2 years ago

Interesting, thanks for the reference. I'm not sure the problem this addresses affects Pugl, though, since the main loop works differently and applications don't draw imperatively whenever they want to. There are already waits in the main loop of Pugl applications. Most of the discussion I can find about this is around examples that have a fast-as-possible main loop that just renders, swaps, and immediately loops to render again, without really talking to the operating system along the way. That indeed doesn't work well, and conclusions gathered from experiments with programs like that don't apply here.

That said, I can't rule out that it applies either, but if it does, my instinct would be that the issue is elsewhere. Unlike with GLFW, it is Pugl's job to tell the application when to draw (usually because Windows itself said so), so if there is a DwmFlush missing, the question is: where? Opinions seem to differ on that, but in any case, the answer will be different in Pugl.

The fundamental difference is that Pugl needs to send a PUGL_EXPOSE at the appropriate time. This should be as late as possible while still managing to render frames in time. Otherwise - for example if a blocking flush gets added in the wrong place - input latency will get worse. Any blocking here also raises the question of whether new input events could have arrived during the wait (I have no idea, but it seems reasonable), which sort of... recursively brings up input latency issues. This is incredibly important for the kinds of applications Pugl is designed for - probably more so than framerate itself (although I don't think this is a trade-off we are forced to make).

Put another way, at least in Pugl, blocking to repeatedly flush the WM before swapping buffers should never need to happen. If rendering needs to happen later than it currently does (to avoid waits in the driver or whatever), then input events should continue to be processed until it's really time to render. To me, the underlying issue here seems to be the application saying "draw now!" when it is not yet the appropriate time to draw. In Pugl, this situation is avoided entirely, by design.

Perhaps a DwmFlush makes sense somewhere else, though. OpenGL drivers are certainly a minefield; ideas about how things should work only get you so far... I only have Win10 on AMD and Intel hardware available, no NVidia, so if things are weird there, someone else is going to have to demonstrate it.

tl;dr: This stuff is complicated, and in a particular area where "GLFW does it" doesn't mean it makes sense for Pugl. It'll take a good bit of experimentation to see if/how/where a similar trick could improve things.

jjYBdx4IL commented 2 years ago

The GLFW code checks for a compositing window manager. I wonder how that check works out in full screen. It's known that non-fullscreen 3D has performance issues. Maybe that's related and your caution is actually achieving the opposite, i.e. making a bad situation worse? GLFW is proven. And I'm currently pretty stunned by the lack of performance in JUCE, which isn't using this either at the moment. We are talking about FullHD at 60 fps without a single pixel rendered except a glClear, and the CPU core sits at 100% (almost all of it at the OS/driver level).

jjYBdx4IL commented 2 years ago

There is this check:

    if (!window->monitor)
    {

I'm not 100% sure but I think that's related to full-screen.

drobilla commented 2 years ago

Maybe that's related and your caution is actually achieving the opposite, i.e. making a bad situation worse?

... No, I am not making a bad situation worse by not blindly copying code from some context in some library which, as I have already explained, does not correspond to anything in Pugl.

GLFW is proven.

Again, the way that the main loop and rendering itself works in Pugl is fundamentally different than GLFW.

Here is some reading about what happens when people carelessly assume things about DwmFlush specifically and use it incorrectly: https://www.vsynctester.com/firefoxisbroken.html

And I'm currently pretty stunned by the lack of performance in JUCE, which isn't using this either at the moment. We are talking about FullHD at 60 fps without a single pixel rendered except a glClear, and the CPU core sits at 100% (almost all of it at the OS/driver level).

What does this have to do with Pugl?

I'm not 100% sure but I think that's related to full-screen.

Pugl doesn't even support exclusive fullscreen in any real way. IIRC you need to go out of your way to get the fullscreen mode that actually makes any difference with respect to rendering. So in this context, conditionals around the compositing window manager don't matter much. The (compositing) branch that this code is on is the important one; AFAIK the compositor is always there on more or less any even remotely modern version of Windows.

jjYBdx4IL commented 2 years ago

Well, I put the DwmFlush() right at the end of the Expose event handling. Not sure how a screwed-up Mozilla vsync time measurement is related to this. Calling DwmFlush() + SwapBuffers() doesn't halve my frame rate, so DwmFlush() cannot block until vsync as suggested. Unless Microsoft is talking about something entirely different here: https://docs.microsoft.com/en-us/windows/win32/api/dwmapi/nf-dwmapi-dwmflush
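
Concretely, something like this in the event handler (a sketch; drawScene() stands in for the demo's drawing code, and Pugl itself swaps the buffers after the handler returns):

    case PUGL_EXPOSE:
        drawScene(view); // the demo's existing drawing
        DwmFlush();      // added right at the end of expose handling
        break;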

drobilla commented 2 years ago

Well, I put the DwmFlush() right at the end of the Expose event handling

Please be more specific: provide a patch and a program, and describe why you did this and what effect it had.

As far as I can tell, doing this in various places has one of two effects: halving the framerate, or no noticeable effect at all.

jjYBdx4IL commented 2 years ago

pugl_shader_demo, with the glDrawElementsInstanced call commented out because it throws an illegal access exception in the NVidia OpenGL driver.

Wait a few seconds and one CPU core goes to 100% while staying at 60 fps. Windows 11, amd64, Ryzen 5600X, GTX 1060.

drobilla commented 2 years ago

with the glDrawElementsInstanced call commented out because it throws an illegal access exception in the NVidia OpenGL driver

... clearly something deeper is wrong here that needs to be figured out first. Trying to optimize CPU usage in such a situation is pointless.

Also note that those programs currently have timeouts that try to ride this line as tightly as possible for the above-mentioned input latency reasons (and so they get some testing), but the calculations aren't quite right yet. The Right Way to do this (or at least the way most resilient to the myriad configurations out there) is still being figured out, but here, where input latency doesn't matter, you can just hack the timeout to zero to draw as fast as possible (or at vsync if that is on, which is perhaps the right thing to do specifically on Windows with OpenGL and vsync on anyway). Alternatively, I just pushed an improve-update-rate branch that hopefully calculates this better.

In any case, with the drawing commented out, I don't see any unusual CPU usage on Windows 10 with either Radeon discrete (on a 1950X) or Intel integrated graphics (on a 7600U).

drobilla commented 2 years ago

The Internet seems to suggest that this is because NVidia's driver spins when it needs to wait. So the timeout and input latency stuff is probably what needs to be correct here (that wait definitely doesn't consume CPU and doesn't depend on the graphics driver). You can probably tinker around and find a static timeout value that makes it go away; I am guessing that using the frame period will make the CPU usage issue go away, although it won't quite hit 60 FPS anymore.
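
For example, the experiment would look something like this (a sketch; quit and world stand in for the application's loop state, and 60 Hz is hard-coded where real code would query the monitor's refresh rate):

    // Wait roughly one frame period inside puglUpdate() so the driver has
    // little or nothing left to spin on when buffers are swapped.
    while (!quit) {
        puglUpdate(world, 1.0 / 60.0); // 0.0 would draw as fast as possible
    }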

The process of a Pugl main loop iteration looks like this:

  1. Call puglUpdate, which gathers and dispatches any events from the system
  2. Dispatch PUGL_UPDATE to views so they can trigger a redisplay if they are continuously drawing
  3. If the view does trigger a redisplay, then dispatch a PUGL_EXPOSE so the rendering happens (and buffers get swapped and waiting possibly happens, and so on).

If the wait is expensive (step 3), then the problem is that step 2 is reached too early. It may be okayish to flush after step 3, but it's not ideal, since this takes time away that could be spent doing useful things: either processing events in Pugl, or doing whatever in the application code.
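
In code, an iteration looks roughly like this (a sketch against the current Pugl API; drawFrame() is an application stand-in, and the handler would be registered with puglSetEventFunc()):

    #include "pugl/pugl.h"

    // Hypothetical application drawing code
    static void drawFrame(PuglView* view) { (void)view; }

    // Steps 2 and 3: Pugl tells the view when to update and when to draw
    static PuglStatus
    onEvent(PuglView* view, const PuglEvent* event)
    {
        switch (event->type) {
        case PUGL_UPDATE:
            // Step 2: trigger a redisplay if drawing continuously
            puglPostRedisplay(view);
            break;
        case PUGL_EXPOSE:
            // Step 3: render; Pugl swaps buffers (and any waiting
            // happens) after this handler returns
            drawFrame(view);
            break;
        default:
            break;
        }
        return PUGL_SUCCESS;
    }

    // Step 1: the application main loop just gathers and dispatches events
    static void
    run(PuglWorld* world, double timeout)
    {
        for (;;) {
            puglUpdate(world, timeout);
        }
    }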

There is an API to get a bunch of information about this sort of thing (DwmGetCompositionTimingInfo), but I haven't had much luck with it.
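
For reference, querying it looks like this (a sketch; note that on Windows 8.1 and later the HWND argument must be NULL, and turning these numbers into a reliable frame deadline is the part that hasn't worked out):

    #include <windows.h>
    #include <dwmapi.h> // link with Dwmapi.lib

    // Sketch: query composition-wide timing from the DWM and derive the
    // compositor's refresh rate. Returns 0.0 if the query fails.
    static double dwmRefreshRate(void)
    {
        DWM_TIMING_INFO info = {0};
        info.cbSize = sizeof(DWM_TIMING_INFO);
        if (SUCCEEDED(DwmGetCompositionTimingInfo(NULL, &info)) &&
            info.rateRefresh.uiDenominator) {
            return (double)info.rateRefresh.uiNumerator /
                   (double)info.rateRefresh.uiDenominator;
        }
        return 0.0;
    }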

jjYBdx4IL commented 2 years ago

So you are saying that while Apple officially deprecates OpenGL and gets all the blame for it, the others (NVidia) do it silently? Lol. But I guess there is a reason why usually everyone uses Direct2D/3D on Windows... thanks for your time.

jjYBdx4IL commented 2 years ago

The error is actually coming from glBufferData, OpenGL error 1282 (GL_INVALID_OPERATION). It seems to be VAO-related, but the error is beside the point: this performance issue seems to exist in quite a bit of software, and JUCE, which has error checking, isn't throwing anything.

drobilla commented 2 years ago

Okay then. Please refrain from abusing the issue tracker like this in the future; this is not a discussion forum.