swapBuffers() slow behavior when using NanoVG

AaxAisa commented 3 years ago

Hello! I'm developing for ARM, using EGL + OpenGL ES 3.0. I'm using LWJGUI, which in turn uses NanoVG to render all shapes.

My rendering runs in a loop.

I can render pretty advanced 3D with OpenGL on my device, so I am not concerned about the device itself being too slow to handle the rendering.

But when using NanoVG, glSwapBuffers behaves oddly. It usually takes about 0.0003 seconds to execute, but every X cycles it takes about 0.3 seconds instead, i.e. a thousand times longer. I am currently getting this issue even on a small test project that renders just one rectangle with NanoVG. I am not getting it if I render the same rectangle myself, which is extremely weird.

Could you at least suggest what could cause such weird behavior during the buffer swap? I read in the OpenGL docs that glSwapBuffers waits for the video card to finish all the queued tasks for the buffer before swapping it. But surely something is going very wrong if it takes a third of a second!

Please help! I'll keep digging on my own and will let you know if I find something.

mulle-nat commented 3 years ago

I can't help much, but here is what I would try

- Post the small example project, so others can look for obvious errors.
- Does the same problem show up on x86?
- Run dtrace or strace to see if something is making an unexpected system call.
- Run something like hotspot to possibly catch what is taking up the time.
- Eliminate Java from the equation; it GCs :)

AaxAisa commented 3 years ago

Narrowed it down to something to do with nvgSave() / nvgRestore(), but the problem is probably not in NanoVG but in LWJGUI, which uses NanoVG as a drawing engine.

What I did was simply remove all calls to these functions, and the program stopped stuttering. My running theory is that LWJGUI called nvgSave without later calling nvgRestore somewhere, or vice versa, and the buffer of those states overfilled. Unfortunately, I am still not completely clear on what that mechanism does in NanoVG; it's all rather vague.

But I thought I'd report the progress all the same.

memononen commented 3 years ago

Save and restore store the nvg draw state (color, font size, etc.). It's a simple way to leave a common state intact after drawing stuff. You can think of save as push and restore as pop. The pairs need to match.

mulle-nat commented 3 years ago

But isn't save/restore just a memcpy over two fairly small structs? How could that have the observed effect?

AaxAisa commented 3 years ago

That's true, but what if you push 100 times and pop only 50? Or vice versa, what if you push 5 times but then pop 6 times?

The LWJGUI code is weird in some places. I've already found at least 2 places where nvgRestore could be called without nvgSave being called beforehand (due to various visibility conditions, etc.)

For example we go into a rendering function, then decide that the object is not visible and doesn't need to be rendered, and then call nvgRestore in the end of that function anyway, which is clearly their mistake. It doesn't seem to hard-crash the application, and on PC the problem is at the very least imperceptible, even if it actually exists. But on ARM, on a relatively weak device with possibly buggy or underdeveloped OpenGL implementation - it causes those huge stutters.
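
The early-return pattern described above can be reproduced with a small stand-in for the state stack. This is only a sketch, not LWJGUI's actual code: `MockCtx`, `mock_save`, `mock_restore`, `render_buggy`, `render_fixed` and `depth_after_invisible_child` are hypothetical names, and only the stack depth is modelled, using the same bounds checks as nvgSave()/nvgRestore().

```c
#include <stdbool.h>

#define MOCK_MAX_STATES 32

/* Hypothetical stand-in for the NanoVG context: only the stack depth is tracked. */
typedef struct { int nstates; } MockCtx;

static void mock_save(MockCtx* ctx)
{
    if (ctx->nstates >= MOCK_MAX_STATES)
        return;               /* silently ignored when the stack is full */
    ctx->nstates++;
}

static void mock_restore(MockCtx* ctx)
{
    if (ctx->nstates <= 1)
        return;               /* silently ignored at the base state */
    ctx->nstates--;
}

/* Buggy: the save is skipped for invisible objects, but the restore still runs. */
static void render_buggy(MockCtx* ctx, bool visible)
{
    if (visible) {
        mock_save(ctx);
        /* ... draw ... */
    }
    mock_restore(ctx);
}

/* Fixed: return before touching the stack, so every save has exactly one restore. */
static void render_fixed(MockCtx* ctx, bool visible)
{
    if (!visible)
        return;
    mock_save(ctx);
    /* ... draw ... */
    mock_restore(ctx);
}

/* Depth left behind after a caller saves once and renders one invisible child. */
static int depth_after_invisible_child(void (*render)(MockCtx*, bool))
{
    MockCtx ctx = { 1 };      /* depth 1: the base state */
    mock_save(&ctx);          /* the caller's own save, depth 2 */
    render(&ctx, false);      /* the child decides it is not visible */
    return ctx.nstates;       /* 2 if balanced, 1 if the child popped the caller's state */
}
```

With `render_buggy` the child pops the state its caller saved, so whatever the caller draws next uses the grandparent's settings. Nothing crashes and no error is reported, which would be consistent with the bug going unnoticed on PC.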

I already know that OpenGL implementation here is buggy in at least 1 unrelated place. So it's not unreasonable to assume that it could have oddities elsewhere - after all, every vendor can have their own implementation of OpenGL API and is responsible for updating it themselves (which they didn't bother doing).

glSwapBuffers() behavior is to wait until OpenGL finishes its current tasks before swapping said buffers. It looks as if something causes OpenGL driver to bug out and freeze for a moment, and glSwapBuffers then just waits on it. Perhaps some null values are passed, or something else - something that other implementations could be handling by discarding them, but my current ARM one does not. It's all speculation at this point because I don't even have a source for these drivers. What I know is:

1) Removing all nvgSave() and nvgRestore() calls fixes the issue.
2) It is not NanoVG, GLFW or even LEGUI code that bugs out, because there are no exceptions, and they take the standard time to complete.
3) OpenGL does not raise any error flags.
4) But it still takes 1000 times (literally) longer for glSwapBuffers() to complete every once in a while.

mulle-nat commented 3 years ago

The code is not really complicated:

#define NVG_MAX_STATES 32

void nvgSave(NVGcontext* ctx)
{
    if (ctx->nstates >= NVG_MAX_STATES)
        return;
    if (ctx->nstates > 0)
        memcpy(&ctx->states[ctx->nstates], &ctx->states[ctx->nstates-1], sizeof(NVGstate));
    ctx->nstates++;
}

void nvgRestore(NVGcontext* ctx)
{
    if (ctx->nstates <= 1)
        return;
    ctx->nstates--;
}

Basically it's a simple push and pop. There is no malloc, but there is a limit on how many pushes you can do (32). Your observed delay cannot be "within" these functions. What I imagine could happen is that the mismatched save/restore calls produce unexpected state for the drawing code. Maybe one of the parameters forces something to run really slowly, like a border with infinite width or so.

e.g.

"draw fat border",
saveState(), 
"draw small border", 
saveState (fails, due to overflow because of previous mismatches) 
...
popState() 
"draw small border" is fat now

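The scenario above can be simulated with a reduced copy of the state stack. This is a sketch under assumptions: `State` keeps a single hypothetical field (`strokeWidth`), and `Ctx`, `stack_save`, `stack_restore`, `stack_top` and `width_seen_by_caller` are illustrative names. The save and restore bodies follow the nvgSave()/nvgRestore() code quoted above.

```c
#include <string.h>

#define MAX_STATES 32

typedef struct { float strokeWidth; } State;   /* hypothetical: one field only */
typedef struct { State states[MAX_STATES]; int nstates; } Ctx;

static State* stack_top(Ctx* c) { return &c->states[c->nstates - 1]; }

static void stack_save(Ctx* c)
{
    if (c->nstates >= MAX_STATES)
        return;                                 /* overflow: the save is dropped */
    if (c->nstates > 0)
        memcpy(&c->states[c->nstates], &c->states[c->nstates - 1], sizeof(State));
    c->nstates++;
}

static void stack_restore(Ctx* c)
{
    if (c->nstates <= 1)
        return;
    c->nstates--;
}

/* Returns the stroke width the caller ends up with after a child widget that
 * correctly pairs its own save/restore, given some earlier unmatched saves. */
static float width_seen_by_caller(int leaked_saves)
{
    Ctx c;
    int i;

    memset(&c, 0, sizeof c);
    c.nstates = 1;
    stack_top(&c)->strokeWidth = 8.0f;          /* "fat border" base state */

    for (i = 0; i < leaked_saves; i++)
        stack_save(&c);                         /* saves that are never restored */

    stack_top(&c)->strokeWidth = 1.0f;          /* caller wants a thin border */

    stack_save(&c);                             /* child: dropped if the stack is full */
    stack_top(&c)->strokeWidth = 8.0f;          /* child draws its own fat border */
    stack_restore(&c);                          /* child: pops a state it never pushed */

    return stack_top(&c)->strokeWidth;          /* what the caller draws with next */
}
```

With no leaked saves the caller gets its thin width (1.0) back. With 31 leaked saves the stack is already full, the child's save is dropped, and the caller ends up drawing with the fat width (8.0), even though the child's own calls are correctly paired.
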
I like asserts and I think these would be useful to add, to check against such problems during development:

void nvgSave(NVGcontext* ctx)
{
    assert( ctx->nstates < NVG_MAX_STATES);

    if (ctx->nstates >= NVG_MAX_STATES)
        return;
    if (ctx->nstates > 0)
        memcpy(&ctx->states[ctx->nstates], &ctx->states[ctx->nstates-1], sizeof(NVGstate));
    ctx->nstates++;
}

void nvgRestore(NVGcontext* ctx)
{
    assert(ctx->nstates > 1);  /* fires on a restore without a matching save */

    if (ctx->nstates <= 1)
        return;
    ctx->nstates--;
}
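
If patching asserts into the library is not an option, a similar check can live on the application side by counting the calls. The following is a minimal sketch with hypothetical names (`BalanceTracker`, `tracker_begin_frame`, `tracked_save`, `tracked_restore`, `tracker_frame_balanced`); a real wrapper would additionally forward to nvgSave()/nvgRestore() at the marked places.

```c
#include <stdbool.h>

/* Hypothetical application-side bookkeeping for one frame. */
typedef struct {
    int depth;       /* saves minus restores since the frame began */
    int max_depth;   /* deepest nesting seen this frame */
    int underflows;  /* restores issued while depth was already 0 */
} BalanceTracker;

static void tracker_begin_frame(BalanceTracker* t)
{
    t->depth = 0;
    t->max_depth = 0;
    t->underflows = 0;
}

static void tracked_save(BalanceTracker* t)
{
    t->depth++;
    if (t->depth > t->max_depth)
        t->max_depth = t->depth;
    /* a real wrapper would call nvgSave(ctx) here */
}

static void tracked_restore(BalanceTracker* t)
{
    if (t->depth == 0)
        t->underflows++;     /* restore without a matching save */
    else
        t->depth--;
    /* a real wrapper would call nvgRestore(ctx) here */
}

/* True when every save was restored and no restore came without a save. */
static bool tracker_frame_balanced(const BalanceTracker* t)
{
    return t->depth == 0 && t->underflows == 0;
}
```

Checking `tracker_frame_balanced()` at the end of each frame, and comparing `max_depth` against the 32-state limit, would point at the frame in which the mismatch happens without touching the driver or the library.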

memononen / nanovg

swapBuffers() slow behavior when using NanoVG #608