servo / gleam

Generated OpenGL bindings and wrapper for Servo.
Other
84 stars 58 forks source link

Flush explicitly before deleting textures on Mac #217

Closed steven-michaud closed 4 years ago

steven-michaud commented 4 years ago

After landing my patch for https://bugzilla.mozilla.org/show_bug.cgi?id=1666293, I discovered that it would be bypassed by all calls to gl.delete_textures from Rust code in the Mozilla tree. So those calls are still effected by Apple's bug, for example here. As best I can tell, this patch will clear up the problem.

I'm submitting my patch here because the gleam crate, though included in the Mozilla code tree, is third party code.

steven-michaud commented 4 years ago

For what it's worth, I did a successful tryserver build with my patch in the Mozilla tree:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=01ec6ad5751f958e1766fb20addb6731d9142f13

kvark commented 4 years ago

That's a very interesting patch! And an impressive investigation :)

I'm concerned that flushing may negatively affect execution on GPUs that are not affected by the bug. This makes me wonder why the original patch was landed without any kind of blocklist. It sounds like at least limiting it to Intel macOS machines would help?

For WebRender, we typically first add a record to our bug database here - https://github.com/servo/webrender/wiki/Driver-issues. I'll be happy to add this one on behalf of you.

Then we land add some sort of a workaround in WebRender. It can detect it and apply the workaround, or it can expose an option in RendererOptions that Gecko fills in based on blocklists. In any case, the workaround should he handled by WebRender, not gleam.

steven-michaud commented 4 years ago

On the Mac, flushing already takes place (though at the wrong time) on every call to fDeleteTextures(), regardless of graphics hardware. So there shouldn't be any performance problem. This shown by the following several lines from the output of my dtrace script at https://bugzilla.mozilla.org/show_bug.cgi?id=1666293#c3. GLEngine-gleUnbindTextureObject and libGPUSupportMercury.dylib-gpusSubmitDataBuffers aren't graphics hardware specific. They're used on all major releases of macOS going back at least to 10.12.

    libsystem_kernel.dylib`mach_msg_trap+0xa
    IOKit`io_connect_method+0x17f
    IOKit`IOConnectCallMethod+0xf4
    IOKit`IOConnectCallStructMethod+0x23
    IOAccelerator`IOAccelContextSubmitDataBuffersExt2+0xfd
    libGPUSupportMercury.dylib`gpusSubmitDataBuffers+0x88
    AppleIntelHD5000GraphicsGLDriver`IntelCommandBuffer::getNew(GLDContextRec*)+0x30
    AppleIntelHD5000GraphicsGLDriver`intelSubmitCommands+0xb2
    GLEngine`gleUnbindTextureObject+0x3a
    GLEngine`gleUnbindDeleteHashNamesAndObjects+0x9f
    GLEngine`glDeleteTextures_Exec+0x2d4
    XUL`mozilla::WebGLTexture::~WebGLTexture()+0x106
    XUL`mozilla::WebGLTexture::~WebGLTexture()+0xe

Working around Apple's bug in WebRender will probably be less simple than doing it here, but I'll look into it.

steven-michaud commented 4 years ago

Also, I'm not completely sure that my patch won't also help on other kinds of graphics hardware. As per https://bugzilla.mozilla.org/show_bug.cgi?id=1666293#c4, I'll be waiting over the next few weeks to see what effect it has on crash signatures containing the string "gpusGenerateCrashLog".

These can be generated by many different bugs, which only have in common that they're Apple graphics driver bugs.

kvark commented 4 years ago

Well, that's great news for sure! So perhaps, adding this bug to known list of bugs, and doing the gl.flush() call on WebRender side behind cfg(target_os = "macos") is all we need.

steven-michaud commented 4 years ago

But that would need to be done in many different places, as per https://bugzilla.mozilla.org/show_bug.cgi?id=1666293#c13.

Maybe I just need to add code to WebRender through which all calls to fDeleteTextures must be routed, and add the workaround there.

steven-michaud commented 4 years ago

Once I've figured out what I'm doing, I'll add a section to https://github.com/servo/webrender/wiki/Driver-issues. Should it be at the bottom?

kvark commented 4 years ago

Maybe I just need to add code to WebRender through which all calls to fDeleteTextures must be routed, and add the workaround there.

That sounds good. Hopefully, it's not going to be too invasive.

Should it be at the bottom?

Yes, appending to the bottom is fine.

steven-michaud commented 4 years ago

I can do something in WebRender. But I really think it'd be much better to do the workaround here. Please tell me why you think we shouldn't.

kvark commented 4 years ago

Reason is consistency. So far, gleam was not a place to put API workarounds in. It's just a wrapper around GL. We've been putting all the workarounds in WebRender, so it's hard to make a case for an exclusion here.

steven-michaud commented 4 years ago

Thanks for letting me know. I still don't like it, but I understand your reasoning. I'll see what I can manage in WebRender.