Closed by rombust 5 months ago
I haven't looked at this code for a very long time, but if ClanLib is locking the buffer again before using it, then OpenGL does guarantee that it is safe to now change it, regardless of any pipeline barrier concerns from the GPU's point of view. So I still think this is a bug in the Intel driver and not ClanLib's fault.
That said, if you have a workaround for it, and it has no apparent side effects, we might as well apply it.
There was an unexpected side effect in one specific use case: on an NVIDIA Quadro K4000 in a 2016 PC, the run time of a routine dropped from 97 seconds to 66 seconds. I haven't checked this with newer hardware or Direct3D.
I finally found the issue. Fixed for Intel. "If any rendering in the pipeline makes reference to data in the buffer object being updated by glBufferSubData, especially from the specific region being updated, that rendering must drain from the pipeline before the data store can be updated."
Source: https://registry.khronos.org/OpenGL-Refpages/gl4/html/glBufferSubData.xhtml
You are reading that wrong. That text is saying that the OpenGL driver must drain the pipeline before the data store can be updated. In other words: OpenGL will stall until the GPU is done using the buffer. The exact thing the Intel driver seems to be failing at doing.
Not that it really matters what the spec says - if Intel can't write a working driver, all we can do is work around it. But it is still not a bug in our implementation. :)
Yeah, I said to myself "that can't be it". I spent a total of 40 hours last week trying to work out what was wrong, and discovered that having this line of code at the start of our Examples/Display/Path example fixed the problem:
glDeleteSync(glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0));
This was outside the main loop, just after the display window was created. (Note: to replicate the problem with the ClanLib Path example, I had to reduce the VertexBuffer size to 1024*10 (from 1024*1024) in RenderBufferBatch.)
I decided to ask ChatGPT ... yes, I know lol ... It suggested that synchronization was required.
Yeah, it sounds like an Intel bug.
I'm 100% sure it is an Intel bug. OpenGL is very clear about these things - there's an implicit pipeline barrier and a fence waiting for the last GPU operation using the buffer to complete. How Intel messed that up I don't know, but what we see isn't something that should be possible for ClanLib to produce. If you have a workaround for it then feel free to commit it, since users don't care where the bug is.
Then there's the fact that this entire thing implies we have a situation where we stall the GPU. That could be pretty bad for performance. I don't know exactly why that is happening, or what our strategy even was to avoid it.
Using clan::Path to draw a rounded box does not work with OpenGL on an Intel GPU
The issue is caused by the clan::Path internals incorrectly uploading the graphics. The completion status of a texture upload that uses previously cached transfer buffers is not checked, so we modify a buffer before its previous contents have been uploaded.
It works using the Direct3D target. It works on all targets with NVIDIA GPUs.
The fix is to not cache transfer buffers, and instead recreate them. See patch.txt
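For anyone following along, here is a rough sketch of the hazard described in this report and of why recreating the transfer buffers sidesteps it. It is only a toy model and makes no OpenGL or ClanLib calls: FakeGpu, TransferBuffer and both upload functions are made-up names for illustration, and the "GPU" is assumed to read its source buffer only when flush() runs. It is not the code from patch.txt.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CPU-visible transfer (staging) buffer.
using TransferBuffer = std::vector<std::uint8_t>;

// Toy "GPU" (assumption, not a real driver): uploads are queued and the
// source buffer is only read when flush() runs, like an asynchronous copy.
class FakeGpu {
public:
    void queue_upload(std::shared_ptr<TransferBuffer> src) {
        pending_.push_back(std::move(src));
    }

    // Runs every queued upload and returns the first byte each one read.
    std::vector<std::uint8_t> flush() {
        std::vector<std::uint8_t> seen;
        for (const auto& src : pending_) {
            seen.push_back(src->front());
        }
        pending_.clear();
        return seen;
    }

private:
    std::deque<std::shared_ptr<TransferBuffer>> pending_;
};

// Problem pattern: one cached buffer is overwritten while earlier uploads
// that still need its old contents are pending.
std::vector<std::uint8_t> upload_with_cached_buffer(
    const std::vector<std::uint8_t>& values) {
    FakeGpu gpu;
    auto cached = std::make_shared<TransferBuffer>(std::size_t{4});
    for (std::uint8_t v : values) {
        cached->assign(std::size_t{4}, v);  // clobbers data not yet consumed
        gpu.queue_upload(cached);
    }
    return gpu.flush();
}

// Workaround pattern: every upload gets its own freshly created buffer,
// so a pending upload can never see later writes.
std::vector<std::uint8_t> upload_with_fresh_buffers(
    const std::vector<std::uint8_t>& values) {
    FakeGpu gpu;
    for (std::uint8_t v : values) {
        gpu.queue_upload(std::make_shared<TransferBuffer>(std::size_t{4}, v));
    }
    return gpu.flush();
}
```

In this model, uploading the values 1, 2, 3 through the cached buffer makes all three uploads read 3, while the fresh-buffer version reads 1, 2, 3. Whether a real driver is allowed to behave like FakeGpu here is exactly what the comments above disagree on; the sketch only shows why not reusing the buffer hides the problem either way.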