sphair / ClanLib

ClanLib is a cross platform C++ toolkit library.

Using clan::Path with OpenGL on Intel GPUs #125

Closed rombust closed 5 months ago

rombust commented 10 months ago

Using clan::Path to draw a rounded box does not work with OpenGL on an Intel GPU.

[Attached photo: PXL_20240108_114025355]

The issue is caused by the clan::Path internals incorrectly uploading the graphics. The completion status of a texture upload that uses previously cached transfer buffers is not checked, so we end up modifying a buffer before its previous contents have been uploaded.

It works using the Direct3D target. It works on all targets with Nvidia GPUs.

The fix is to not cache transfer buffers, and instead recreate them. See the attached patch.txt.
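To make the suspected hazard concrete, here is a minimal CPU-only sketch. It makes no OpenGL calls and is not ClanLib's code: `FakeQueue`, `Staging` and both `upload_with_*` functions are hypothetical names invented for illustration. The model assumes the queue reads its source buffer only when it is flushed, which is the failure described above, not behaviour that OpenGL itself specifies.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// CPU-only model; no OpenGL calls are made. All names are hypothetical.
using Staging = std::vector<int>;

// Stands in for a command queue: uploads are recorded now, but the source
// buffer is only read later, when flush() runs.
struct FakeQueue {
    std::vector<std::shared_ptr<Staging>> pending; // sources not yet read
    std::vector<Staging> uploaded;                 // what the "GPU" received

    void enqueue_upload(std::shared_ptr<Staging> src) {
        assert(src != nullptr);
        pending.push_back(std::move(src));
    }

    void flush() {
        for (const auto& src : pending)
            uploaded.push_back(*src); // the deferred read happens here
        pending.clear();
    }
};

// Cached strategy: one staging buffer is reused for every upload.
std::vector<Staging> upload_with_cached_buffer(const std::vector<Staging>& frames) {
    FakeQueue queue;
    auto cached = std::make_shared<Staging>();
    for (const auto& frame : frames) {
        *cached = frame; // overwrites data a pending upload still refers to
        queue.enqueue_upload(cached);
    }
    queue.flush();
    return queue.uploaded;
}

// Recreate strategy: every upload gets its own staging buffer.
std::vector<Staging> upload_with_fresh_buffers(const std::vector<Staging>& frames) {
    FakeQueue queue;
    for (const auto& frame : frames)
        queue.enqueue_upload(std::make_shared<Staging>(frame));
    queue.flush();
    return queue.uploaded;
}
```

With two different frames, the cached variant delivers the second frame twice, while the fresh-buffer variant delivers each frame intact. The cost of the second variant is one allocation per upload.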

dpjudas commented 10 months ago

I haven't looked at this code for a very long time, but if ClanLib is locking the buffer again before using it, then OpenGL does guarantee that it is safe to now change it, regardless of any pipeline barrier concerns from the GPU's point of view. So I still think this is a bug in the Intel driver and not ClanLib's fault.

That said, if you have a workaround for it, and it has no apparent side effects, we might as well apply it.

rombust commented 10 months ago

There was an unexpected side effect in one specific use case: on an NVidia Quadro K4000 in a 2016 PC, the run time of a routine dropped from 97 seconds to 66 seconds. I haven't checked this with newer hardware or Direct3D.

rombust commented 5 months ago

I finally found the issue. Fixed for Intel. "If any rendering in the pipeline makes reference to data in the buffer object being updated by glBufferSubData, especially from the specific region being updated, that rendering must drain from the pipeline before the data store can be updated."

Source: https://registry.khronos.org/OpenGL-Refpages/gl4/html/glBufferSubData.xhtml

dpjudas commented 5 months ago

You are reading that wrong. That text is saying that the OpenGL driver must drain the pipeline before the data store can be updated. In other words: OpenGL will stall until the GPU is done using the buffer. The exact thing the Intel driver seems to be failing at doing.

Not that it really matters what the spec says - if Intel can't write a working driver, all we can do is work around it. But it is still not a bug in our implementation. :)

rombust commented 5 months ago

Yeah, I said to myself "that can't be it". After a total of 40 hours last week trying to work out what's wrong, I discovered that having this line of code at the start of our Examples//Display/Path example fixed the problem:

glDeleteSync(glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0));

This was outside the main loop, just after the display window was created. (Note: to replicate the problem with the ClanLib Path example, I had to reduce the VertexBuffer size to 1024*10 (from 1024*1024) in RenderBufferBatch.)

I decided to ask ChatGPT ... yes, I know, lol ... It suggested that synchronization was required.

Yeah, it sounds like an Intel bug.

dpjudas commented 5 months ago

I'm 100% sure it is an Intel bug. OpenGL is very clear about these things - there's an implicit pipeline barrier and a fence waiting for the last GPU operation using the buffer to complete. How Intel messed that up I don't know, but what we see isn't something that should be possible for ClanLib to produce. If you have a workaround for it, then feel free to commit it, since users don't care where the bug is.

Then there's the fact that this entire thing implies we have a situation where we stall the GPU. That could be pretty bad for performance. I don't know exactly why that is happening, or what our strategy to avoid it even was.
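The barrier-plus-fence behaviour described in this last comment, and the stall it implies, can be modelled without a GPU. The sketch below is a CPU-only illustration under stated assumptions: `FakeGpu`, `GuardedBuffer`, `Result` and `run_guarded` are hypothetical names, no OpenGL calls are made, and this is neither ClanLib's code nor the one-line workaround quoted earlier in the thread. In real OpenGL the explicit counterparts of `submit` and `wait` would be `glFenceSync` and `glClientWaitSync`.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// CPU-only model; no OpenGL calls are made. All names are hypothetical.
struct FakeGpu {
    struct Command { const std::vector<int>* src; };
    std::deque<Command> pending;            // recorded but not yet executed
    std::vector<std::vector<int>> received; // data seen when commands execute
    std::uint64_t submitted = 0;
    std::uint64_t completed = 0;

    // Records a deferred read of *src and returns a fence value for it.
    std::uint64_t submit(const std::vector<int>* src) {
        assert(src != nullptr);
        pending.push_back({src});
        return ++submitted;
    }

    // Executes pending work until the fence has signalled.
    // Returns true if the caller had to wait, which models a stall.
    bool wait(std::uint64_t fence) {
        const bool stalled = completed < fence;
        while (completed < fence) {
            received.push_back(*pending.front().src);
            pending.pop_front();
            ++completed;
        }
        return stalled;
    }
};

struct GuardedBuffer {
    std::vector<int> data;
    std::uint64_t last_use = 0; // fence of the last command that reads data
    std::uint64_t stalls = 0;

    // Drains every reader of the buffer before its contents change.
    void update(FakeGpu& gpu, const std::vector<int>& fresh) {
        if (gpu.wait(last_use)) ++stalls;
        data = fresh;
    }

    void draw(FakeGpu& gpu) { last_use = gpu.submit(&data); }
};

struct Result {
    std::vector<std::vector<int>> received;
    std::uint64_t stalls;
};

Result run_guarded(const std::vector<std::vector<int>>& frames) {
    FakeGpu gpu;
    GuardedBuffer buffer;
    for (const auto& frame : frames) {
        buffer.update(gpu, frame);
        buffer.draw(gpu);
    }
    gpu.wait(gpu.submitted); // finish the remaining work
    return {gpu.received, buffer.stalls};
}
```

In this model every frame arrives intact, but each update that follows a draw has to wait for that draw first. That per-update wait is the performance concern raised above, and it is the usual reason for rotating between several buffers instead of reusing one.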