szeged / webrender

A GPU-based renderer for the web
https://doc.servo.org/webrender/
Mozilla Public License 2.0

Run Gecko with gfx-rs #198

Open dati91 opened 6 years ago

dati91 commented 6 years ago

This issue is for tracking the progress with gecko's reftests.

We have some reftest results to show (layout/reftests). Note: these are local results, not from Treeherder.

Linux/Vulkan:
REFTEST INFO | Successful: 33264 (33206 pass, 58 load only)
REFTEST INFO | Unexpected: 991 (953 unexpected fail, 38 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1383 (713 known fail, 0 known asserts, 261 random, 409 skipped, 0 slow)

Linux/Original:
REFTEST INFO | Successful: 33290 (33232 pass, 58 load only)
REFTEST INFO | Unexpected: 965 (937 unexpected fail, 28 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1383 (713 known fail, 0 known asserts, 261 random, 409 skipped, 0 slow)

Windows/DX12:
REFTEST INFO | Successful: 28185 (28127 pass, 58 load only)
REFTEST INFO | Unexpected: 123 (49 unexpected fail, 72 unexpected pass, 0 unexpected asserts, 2 failed load, 0 exception)
REFTEST INFO | Known problems: 844 (458 known fail, 0 known asserts, 166 random, 220 skipped, 0 slow)

Windows/Original:
REFTEST INFO | Successful: 28277 (28219 pass, 58 load only)
REFTEST INFO | Unexpected: 17 (3 unexpected fail, 14 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 844 (458 known fail, 0 known asserts, 166 random, 220 skipped, 0 slow)

Windows/Vulkan:
REFTEST INFO | Successful: 28245 (28187 pass, 58 load only)
REFTEST INFO | Unexpected: 47 (14 unexpected fail, 33 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 844 (458 known fail, 0 known asserts, 166 random, 220 skipped, 0 slow)

Windows/Original (same as above):
REFTEST INFO | Successful: 28277 (28219 pass, 58 load only)
REFTEST INFO | Unexpected: 17 (3 unexpected fail, 14 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 844 (458 known fail, 0 known asserts, 166 random, 220 skipped, 0 slow)

MacOS/Metal:
REFTEST INFO | Successful: 32508 (32450 pass, 58 load only)
REFTEST INFO | Unexpected: 1679 (1498 unexpected fail, 181 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1399 (811 known fail, 0 known asserts, 197 random, 391 skipped, 0 slow)

MacOS/Original:
REFTEST INFO | Successful: 32501 (32443 pass, 58 load only)
REFTEST INFO | Unexpected: 1746 (1502 unexpected fail, 244 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1391 (803 known fail, 0 known asserts, 197 random, 391 skipped, 0 slow)

Some pictures with gecko: https://drive.google.com/open?id=1rMilZCCpqRRHzN_D2L0VkXXn0ia8_KQ0

dati91 commented 5 years ago

Aaaaaaand, gecko with vulkan on linux: [image: gecko_vulkan] [image: gecko_vulkan2]

(If you want to try it: the repo is https://github.com/dati91/gecko-dev/tree/gecko_with_wr; build as usual (only on Linux atm) and run with: MOZ_ACCELERATED=1 MOZ_WEBRENDER=1 ./mach run)

also this is VERY experimental ;)

mstange commented 5 years ago

Nice!

I have some suggestions for the points you raised in the initial comment: it basically comes down to drilling backend-specific holes in the API.

> as a workaround we could pass down to WR the display/window pointers and create the surface with gfx, but the current gecko code is gl heavy as well...

I think you'll want Gecko to manage as much of this as possible, and only give WR the lowest-level, backend-specific object that you need. E.g. with GL, let Gecko make the context current, with Metal, let Gecko pass the Metal context to WR, etc.

> We currently do not support external textures, because they are OpenGL textures, and we are not sure how to share those with Vulkan/DX12/Metal

There is no need to support arbitrary API combinations here. You can expect Gecko to give you external texture handles for the "native" API that you're targeting on that platform, and not support other types of handles. For example, on Windows, Gecko has a HANDLE for the texture, and on macOS, Gecko has an IOSurface. Gecko only wraps these native objects in OpenGL texture handles because that's what the WebRender API expects at the moment. For example, the Windows code only works because Gecko and WebRender use the same ANGLE device, and it's ANGLE that keeps an internal mapping of "OpenGL textures" to the underlying D3D texture objects. The macOS code wraps an IOSurface in an OpenGL texture using CGLTexImageIOSurface2D, but if the WebRender API accepted an IOSurface instead, then this wrapping (or the equivalent wrapping for Metal) could be done inside WebRender.

dati91 commented 5 years ago

The result of MOZ_ACCELERATED=1 MOZ_WEBRENDER=1 ./mach reftest layout/reftests/:

REFTEST INFO | Result summary:
REFTEST INFO | Successful: 24749 (24693 pass, 56 load only)
REFTEST INFO | Unexpected: 980 (952 unexpected fail, 28 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1033 (613 known fail, 0 known asserts, 127 random, 293 skipped, 0 slow)

dati91 commented 5 years ago

I also ran the same revision with the original WR and got this:

REFTEST INFO | Result summary:
REFTEST INFO | Successful: 33200 (33142 pass, 58 load only)
REFTEST INFO | Unexpected: 975 (947 unexpected fail, 28 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1421 (750 known fail, 0 known asserts, 257 random, 414 skipped, 0 slow)

I have no idea why they differ by 8834 tests. I tested ours again and got:

REFTEST INFO | Result summary:
REFTEST INFO | Successful: 24698 (24662 pass, 36 load only)
REFTEST INFO | Unexpected: 965 (929 unexpected fail, 36 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1011 (602 known fail, 0 known asserts, 164 random, 245 skipped, 0 slow)

So not sure what is going on, but will look into this.
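The 8834 figure is the difference between the run totals (Successful + Unexpected + Known problems). A minimal sketch for checking this from the summary lines, assuming the summary format shown above; the helper name total_tests is made up for illustration and is not part of the reftest harness:

```python
import re

# Matches the three top-level counters of a reftest result summary.
COUNTER_RE = re.compile(r"REFTEST INFO \| (Successful|Unexpected|Known problems): (\d+)")


def total_tests(summary):
    """Sum the Successful, Unexpected and Known problems counters of one summary."""
    return sum(int(value) for _name, value in COUNTER_RE.findall(summary))


# Summaries copied from the comments above.
GFX_RUN = """\
REFTEST INFO | Successful: 24749 (24693 pass, 56 load only)
REFTEST INFO | Unexpected: 980 (952 unexpected fail, 28 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1033 (613 known fail, 0 known asserts, 127 random, 293 skipped, 0 slow)
"""

ORIGINAL_RUN = """\
REFTEST INFO | Successful: 33200 (33142 pass, 58 load only)
REFTEST INFO | Unexpected: 975 (947 unexpected fail, 28 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1421 (750 known fail, 0 known asserts, 257 random, 414 skipped, 0 slow)
"""

print(total_tests(ORIGINAL_RUN) - total_tests(GFX_RUN))  # 8834
```

Comparing totals like this only shows that the two runs executed a different number of tests; it does not say which tests were missing.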

zakorgy commented 5 years ago

I have updated our WR, tried it with a newer version of Gecko, and started to take a look at the layout tests (the repo: https://github.com/zakorgy/gecko-dev/tree/gecko_wr_linux_vulkan), comparing it to the same version with the original WR. I ran the tests per subdirectory, because when I ran them for the entire layout/reftests directory, testing stopped at random places with an "ExceptionHandler::WaitForContinueSignal waiting for continue signal..." message.
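A per-subdirectory run can be scripted roughly as follows. This is only a sketch: it assumes every subdirectory to be tested has its own reftest.list, it reuses the environment variables and the ./mach reftest invocation quoted earlier in this thread, and the helper name per_directory_commands is hypothetical.

```python
import os
from pathlib import Path


def per_directory_commands(reftest_root):
    """Build one `./mach reftest <dir>` command per immediate subdirectory
    of reftest_root that contains a reftest.list file."""
    root = Path(reftest_root)
    commands = []
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        if (sub / "reftest.list").is_file():
            commands.append(["./mach", "reftest", sub.as_posix()])
    return commands


# Environment for the WebRender runs, as used in the commands quoted in this thread.
WR_ENV = dict(os.environ, MOZ_ACCELERATED="1", MOZ_WEBRENDER="1")
```

Each command could then be run from the gecko-dev checkout with subprocess.run(cmd, env=WR_ENV), saving the output per directory so the summaries can be added up afterwards.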

So here are the results:

Original WR:

REFTEST INFO | Result summary:
REFTEST INFO | Successful: 18505 (18476 pass, 29 load only)
REFTEST INFO | Unexpected: 511 (493 unexpected fail, 17 unexpected pass, 0 unexpected asserts, 1 failed load, 0 exception)
REFTEST INFO | Known problems: 750 (408 known fail, 0 known asserts, 144 random, 198 skipped, 0 slow)

Our WR with Vulkan:

REFTEST INFO | Result summary:
REFTEST INFO | Successful: 18498 (18469 pass, 29 load only)
REFTEST INFO | Unexpected: 518 (496 unexpected fail, 22 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 750 (408 known fail, 0 known asserts, 144 random, 198 skipped, 0 slow)

The totals are equal (19766 tests in both runs), which is good news. But the number of tests is lower than in the results from https://github.com/szeged/webrender/issues/198#issuecomment-416938213, which is weird. Maybe it's because we use the reftest.list files of the subdirectories rather than the one in layout/reftests.
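Equal totals do not show which individual tests changed outcome. A minimal sketch for listing them from two saved logs, assuming per-test lines of the form "REFTEST TEST-<STATUS> | <test> | <message>" (the exact log format is an assumption here, and the function names are made up for illustration):

```python
import re

# Assumed per-test line shape: "REFTEST TEST-<STATUS> | <test id> | <message>".
LINE_RE = re.compile(r"REFTEST TEST-([A-Z-]+) \| ([^|]+?) \|")


def outcomes(log_text):
    """Map each test id to the last TEST-* status reported for it."""
    result = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            status, test = match.groups()
            result[test] = status
    return result


def differing_tests(log_a, log_b):
    """Return {test: (status_a, status_b)} for tests in both logs whose status differs."""
    a, b = outcomes(log_a), outcomes(log_b)
    return {t: (a[t], b[t]) for t in sorted(a.keys() & b.keys()) if a[t] != b[t]}
```

Calling differing_tests(original_log, vulkan_log) on the two log texts would give the candidates to re-run one by one; tests that appear in only one log are ignored by this sketch.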

Here are the tests which differ:

These fail with the original WR and pass with ours, but I think they are still worth a look:

These fail with our WR and pass with the original WR:

This one crashes, because we have a maximum limit of 8192 for instances, and this test wants to draw around 40000 instances. If we increase our limit, the test passes. (Unfortunately this kind of fix would add overhead, so we must figure out something else; maybe reallocate our index buffer in these extreme cases and otherwise use the default limit):
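One way the "reallocate only in extreme cases" idea could look, as a language-agnostic sketch written in Python: keep the default limit for normal draws and grow the capacity only when a draw exceeds it. The doubling policy and the function name are assumptions for illustration, not the fix that was actually implemented.

```python
DEFAULT_MAX_INSTANCES = 8192  # the default limit mentioned above


def required_capacity(instance_count, current_capacity=DEFAULT_MAX_INSTANCES):
    """Return the capacity needed for a draw: keep the current capacity if the
    draw fits, otherwise double it until the draw fits (assumed growth policy)."""
    if current_capacity <= 0:
        raise ValueError("current_capacity must be positive")
    capacity = current_capacity
    while capacity < instance_count:
        capacity *= 2
    return capacity


print(required_capacity(100))    # 8192, fits in the default buffer
print(required_capacity(40000))  # 65536, would trigger a one-off reallocation
```

With this policy the common case never reallocates, and the roughly 40000-instance test would pay for a single larger allocation.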

And there are some unexpected passes too:

zakorgy commented 5 years ago

I have checked the unexpected passes, and they are actual passes.

zakorgy commented 5 years ago

I have checked the first section of differing tests (fail with the original WR, pass with ours) and found that the first two pass if I run them one by one. The third one (layout/reftests/bugs/383883-2.html) is flaky with both OpenGL and Vulkan. Since it has only a minor pixel difference, and I didn't find anything suspicious in the test result images, I will mark them as resolved.

bholley commented 5 years ago

I'd also suggest doing a Gecko try push rather than running them locally. It's not uncommon for certain tests to behave differently on local-vs-automation, and automation is the canonical environment as far as the expectations are concerned.

kvark commented 5 years ago

@bholley I was just talking with @zakorgy about the same thing (learning to try push) yesterday on the call :)

dati91 commented 5 years ago

@bholley @kvark will running a try push work with Vulkan? Or is it just like Travis/AppVeyor, which use e.g. OSMesa?

kvark commented 5 years ago

It's both building and testing infrastructure. Of course we'd want to eventually run some of the graphics tests on them, but for now at least having Gecko built would be great.

bholley commented 5 years ago

I wouldn't be surprised if our windows test machines had Vulkan drivers already. Windows Firefox currently runs on top of D3D, and @jdashg tells me that Vulkan drivers have been bundled with D3D drivers for a while now.

kvark commented 5 years ago

... and if we had Windows 10 build bots, we could even test consistently on the D3D12 WARP device

dati91 commented 5 years ago

We have some news (good and bad) about the Mac/metal gecko.

We finally managed to build it. @mstange thanks for the help. And we can run it as well. That's where things got a bit complicated. It seems like we have the same issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1493330 We tried it with the 10.14 SDK and with the 10.13 SDK. It "works better" with 10.13.

This is a screenshot with the original Gecko without WR enabled (with WR there are some suspicious logs): [image: screenshot 2018-10-04 at 14 17 24] This happens if we drag the tab.

The same thing with our WR: [image: screenshot 2018-10-04 at 14 05 23]

And with the bookmark sidebar, and typing into the URL bar: [image: screenshot 2018-10-04 at 14 21 04]

Also, we checked it with YouTube and we can hear the sound of the video.

So this looks promising, and from the logs it seems like it's actually ours, but we are in the dark until the "black screen" issue is resolved. (pun intended) After that we can check that the drawing is really done with wr/gfx/metal. (The current repo: https://github.com/dati91/gecko-dev/tree/wr2)

dati91 commented 5 years ago

The result of reftest-sanity:

orig
REFTEST INFO | Result summary:
REFTEST INFO | Successful: 109 (93 pass, 16 load only)
REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 34 (17 known fail, 0 known asserts, 0 random, 17 skipped, 0 slow)
REFTEST SUITE-END | Shutdown

wr/metal
REFTEST INFO | Result summary:
REFTEST INFO | Successful: 110 (94 pass, 16 load only)
REFTEST INFO | Unexpected: 10 (9 unexpected fail, 1 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 23 (6 known fail, 0 known asserts, 0 random, 17 skipped, 0 slow)
REFTEST SUITE-END | Shutdown

But with the "black screen", this is just a preliminary result.

zakorgy commented 5 years ago

We managed to build with Metal (10.13 SDK) without the black screen. Here is an image from Wikipedia: [image: capture1]

As you can see, the text is wrong. We captured this frame with WR and loaded the capture in RenderDoc. There we found that the Color0 texture is not uploaded properly: there are incorrect characters in some places. [image: capture2]

dati91 commented 5 years ago

We managed to run Gecko with Vulkan, this time on Windows. [image]

It looks good, but for some reason it always stops with

Crash Annotation GraphicsCriticalError: |[C0][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=21.8547) [GFX1-]: Receive IPC close with reason=AbnormalShutdown

and restarts WR, then stops again. After the 4th time it falls back to the default backend with "GPU process disabled after 4 attempts". Maybe there is a time limit we hit? Not sure, because there is no log of any error in C++ or Rust. Running with VS didn't help: still no error, and my guess is that it wouldn't catch a Rust error anyway.

bholley commented 5 years ago

It sounds like your GPU process crashed (after four crashes it falls back to basic compositor). You should be able to find the crash report in about:crashes.

Unrelated FYI - I'm switching WR to use all immutable storage for textures, which may be relevant to the gfx-rs port.

dati91 commented 5 years ago

@bholley about:crashes is an empty page with the line "No crash reports have been submitted."

dati91 commented 5 years ago

It could be related to popups(?). If I scroll a Wikipedia page (without hitting anything with the cursor) it works fine. But if I hover a link, which produces a "popup" (not a real window) on Wikipedia, the page freezes, WR restarts, and then the popup is drawn without a problem. If I then scroll the page and hover another link, it does the same: freeze, restart, show the popup (max 4 times ofc). It could be related to some kind of hit test which produces something that behaves differently on Windows than on other platforms, because this works fine on Linux with Vulkan, and it even works with Metal on macOS.

But even the original start page restarts our WR while it loads, so no cursor movement is needed.

zakorgy commented 5 years ago

Increasing the frame count (previously it was 1) and resetting the fence we used for freeing images produces a much better result with Metal: [image: screenshot 2018-10-10 at 16 04 55]

But if I start Gecko with the default page, it sometimes crashes with the following message:

Crash Annotation GraphicsCriticalError: |[C0][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=220.092) Crash Annotation GraphicsCriticalError: |[C0][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=220.52) [GFX1-]: Receive IPC close with reason=AbnormalShutdown

Also, on larger pages there are still some artefacts (notice the missing text from the tab, and the video lengths, which are drawn wrong in some cases): [image: artefacts]

kvark commented 5 years ago

@dati91 good guess! IIRC popups create separate GL contexts (and WR renderer/backend) that share resources (shaders for the most part atm) with the main context. We should make it work so that the VkInstance and VkDevice are the same, and only another swapchain is created.

bholley commented 5 years ago

@kvark @dati91 So I think there are actually three levels of "popups".

Some "popups" (like the GitHub emoji selector that appears when you type ":") are just out-of-flow html elements. These are just part of the page, and shouldn't need any special handling for what you're doing here.

There are also certain overlays drawn by the browser that get their own OS-level window. The smart location bar (AwesomeBar) is one of them. Tool-tips are another, which I think is what @dati91 is talking about when hovering links on wikipedia. You can prove that something gets its own OS-level window by resizing your browser and getting the popup to be rendered outside of the main browser window (you can see this for both the examples I mentioned above). In those cases, Firefox doesn't use WebRender at all, it just uses BasicCompositor/Skia.

Finally, there's the case of top-level browser windows. Each one of these gets a separate WebRender instance, like @kvark describes. I believe we don't currently share GLContexts across WR instances on Mac, but per [1] I believe we are hoping to in [2].

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1495977#c9 [2] https://bugzilla.mozilla.org/show_bug.cgi?id=1491442

dati91 commented 5 years ago

Quick update on Windows/Vulkan: After a (way too) long debug session, I finally found the root cause....

> It sounds like your GPU process crashed

Turns out this is the case, but the crash is in Vulkan's validation layer. After I disabled that, everything seems to work fine.

In the VL, it runs into a nullptr for an image_view in a draw call, but I didn't find out why. I checked our image management, and we didn't free any image which was not submitted properly. And after turning it off everything looks fine. So maybe the problem is in the VL?

But it's not ready for reftesting just yet, because we are not quite drawing correctly on the second window: [image: gecko_reftest_vulkan]

@zakorgy is already working on a resize fix for macOS/Metal (there is a similar issue there); maybe that will fix this as well.

dati91 commented 5 years ago

We finally managed to run the reftests on Windows with Vulkan with the following results:

REFTEST INFO | Result summary:
REFTEST INFO | Successful: 18983 (18954 pass, 29 load only)
REFTEST INFO | Unexpected: 42 (23 unexpected fail, 19 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 749 (417 known fail, 0 known asserts, 128 random, 204 skipped, 0 slow)

A more verbose result is here.

We should definitely double check these results, but if everything is alright, this looks very promising. But I guess this was expected, because @zakorgy already fixed a lot in our Linux/Vulkan version.

Also I couldn't check with the original WR/gecko, because it doesn't draw correctly, not sure why. Maybe a gecko and/or WR update will solve it and we can compare the two versions.

zakorgy commented 5 years ago

Finally got reftest results on MacOS with Metal:

REFTEST INFO | Result summary:
REFTEST INFO | Successful: 17999 (17970 pass, 29 load only)
REFTEST INFO | Unexpected: 1007 (913 unexpected fail, 94 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 760 (451 known fail, 0 known asserts, 118 random, 191 skipped, 0 slow)

Also we had to skip 5 tests which crash at the moment. The full result is here

The number of failing tests is pretty high, but when I run them one by one, in most cases they pass. This problem is probably related to the sync issue from https://github.com/szeged/webrender/issues/189, which we still have to figure out.

zakorgy commented 5 years ago

Just a note on the above comment: the 5 crashing tests are related to external textures, and https://github.com/szeged/webrender/pull/230 fixes those crashes.

kvark commented 5 years ago

Are you still getting 493 unexpected failures with the original WR? Perhaps, we could have a table in the body of the issue, which is just kept up to date. That would be ideal :)

dati91 commented 5 years ago

@kvark We are currently working on a wr/gfx/gecko update, because our current version is more than a month old. After that we can re-run every platform/backend and have the results updated.

> Perhaps, we could have a table in the body of the issue, which is just kept up to date. That would be ideal :)

Of course, we can make that happen.

> Are you still getting 493 unexpected failures with the original WR?

I don't think that number is still accurate, after the update we can re-check that as well.

dati91 commented 5 years ago

In the meantime, we managed to run gecko with the DX12 backend. And here are the results:

REFTEST INFO | Result summary:
REFTEST INFO | Successful: 18968 (18939 pass, 29 load only)
REFTEST INFO | Unexpected: 48 (31 unexpected fail, 17 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 747 (417 known fail, 0 known asserts, 127 random, 203 skipped, 0 slow)

Note: the pipeline creation is noticeably slower than any other backend, but after it's done everything works fine.

dati91 commented 5 years ago

> we could have a table in the body of the issue

Done -> https://github.com/szeged/webrender/issues/198#issue-341505766

dati91 commented 5 years ago

An update on what we are up to these days.

Our current focus is to get Gecko working on Treeherder. If interested, you can check out our tries: Gyula's or mine.

Here is a short summary:

First the good news: we can build and run tests on Windows/DX12. \o/ (try job)

Next one is Windows/Vulkan. We can build it, but it crashes when running the tests. We are not 100% sure what causes it, but somehow the brush_blend shader is related, because if we disable it, we get reftest results. (try job (crash), try job (disabled shader))

On Linux/Vulkan we can build, but can't run because the machine doesn't have Vulkan installed :/ (try job)

On MacOS/Metal it fails at linking. One of our dependencies is not correctly linked atm. (try job)

Some notes on the local builds: On mac, we can add the missing obj to the linker and get a working gecko. The result will look weird, because the native dpi is 2, and we are not handling that correctly, so we have to force gecko to stay with 1. After that it looks normal. ~With our current rebase to a working revision, we are hitting an OOM on Windows/Vulkan when creating a popup/second window/new wr instance. We are still investigating what went wrong there. (note: the multiwindow example works fine)~

dati91 commented 5 years ago

A quick update: About the Windows/Vulkan build: We found the problem. We missed a previous fix for the swap_chain, which was not upstreamed to our webrender. Now it works fine.

About the MacOS/Metal build on Treeherder: We managed to build it, but it seems the server doesn't have the Metal framework for the reftests. (try job) The bot is 10.10, and the Metal API was introduced in 10.11 AFAIK.

shmerl commented 4 years ago

What is the current status of this effort? It would be really useful for Firefox to drop OpenGL renderer and to switch to Vulkan, to avoid nasty bugs like this one.

kvark commented 4 years ago

@shmerl work is on-going, e.g. currently the team is trying to rewrite the render passes to minimize switching, see https://github.com/szeged/webrender/pull/325 Gecko is constantly tested, we are trying to reach and surpass the performance metrics of the current WebRender/GL with all backends.

shmerl commented 4 years ago

Are these benchmarks published anywhere? Would be interesting to see the progress.

zakorgy commented 4 years ago

@shmerl We have some results here: https://docs.google.com/spreadsheets/d/16vi2KeAPAjKkwW7cGeno4NmI9FxqWGkscnv9D-6cRrQ/edit#gid=248730309 Overall, on Windows and Linux we are still behind the GL implementation on Intel. On macOS a few tests score lower, but on average we have the same scores as with GL.

lnicola commented 4 years ago

Just wondering, have you tried the OpenGL backend of gfx-rs to see if the regressions are caused by the Vulkan implementation or the gfx-rs overhead?

zakorgy commented 4 years ago

@lnicola We haven't tried the OpenGL backend of gfx-rs, when we started the project it was still in a WIP state and our main focus was mostly Vulkan and Metal.

kvark commented 4 years ago

@lnicola fyi, running on the gfx-rs GL backend is planned, but it is not in a shape where it would tell you much yet. Moreover, even if it was ready today, it wouldn't give you the answers you are looking for. It would tell you how badly OpenGL matches the Vulkan API, not how much overhead gfx-rs itself has.

shmerl commented 4 years ago

> Overall, on Windows and Linux we are still behind the GL implementation on Intel.

Is this due to how gfx-rs is using Vulkan, or due to anv issues?

kvark commented 4 years ago

This is due to the fact gfx-rs exposes a Vulkan-ish API, and WebRender is getting re-shaped to take advantage of it. Trying to shove it back to the ancient OpenGL is not expected to be as good as the current GL path, it will be purely a fallback in case other APIs aren't available.

shmerl commented 3 years ago

How is this progressing? It would be really nice to avoid some quite annoying radeonsi bugs that cause GPU hangs with OpenGL on Navi, if gfx-rs with Vulkan were an option for Firefox.

kvark commented 3 years ago

@shmerl unfortunately, nobody has been able to work on this for the last 6 months. We are quite short on resources atm... The "szeged" group (who owns this repository/fork) is no longer contracted by Mozilla for WebRender work. Please reach out to me (on Matrix as "@kvark:matrix.org") if you want to help!

naturallymitchell commented 3 years ago

I would also like to help work on this and find resources to supply. could we possibly chat on a gfx-rs Discord server? I'm naturallymitchell 3561 on the Rust Discord server.

kvark commented 3 years ago

@naturallymitchell would you mind joining us in #gfx:matrix.org ? I tried to look for either gfx-rs Discord, or you on Rust Discord, and failed.

awsdert commented 2 months ago

Firefox runs on gecko right? Is this part of firefox already via some config option or is it still not considered stable enough to be added to config?