servo / webrender

A GPU-based renderer for the web
https://doc.servo.org/webrender/
Mozilla Public License 2.0

https://www.w3.org/conf/2013sf/ uses too much GPU time #1817

Closed: mstange closed this issue 6 years ago

mstange commented 6 years ago

With the integrated GPU on my MacBook Pro, https://www.w3.org/conf/2013sf/ uses about 12ms of GPU time. I exported a frame as a YAML + resources zip: http://tests.themasta.com/w3.zip

Here's a screenshot of the webrender profiler on cargo run --release -- -p 1 show w3/w3.yaml:

[Screenshot: WebRender profiler output, taken 2017-10-05 at 4:33:17 PM]
glennw commented 6 years ago

There are a lot of clips and blurs occurring in that profile; we'll need to investigate why there are so many of them (or why they are slow).

This is also (probably) a good example of how, in the Gecko integration, we're not currently getting any z-buffer benefits: most pixels on here are opaque, so we'll probably benefit a lot from early z-reject once this is fixed in WR.
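Why opaque pixels matter for early z-reject can be shown with a small CPU-side model. This is a minimal sketch under stated assumptions, not WebRender's code: `OpaqueRect` and both counting functions are hypothetical, and real GPUs perform the depth test per fragment in hardware. It counts how many fragments get shaded when opaque rectangles are drawn back-to-front with no depth test (painter's algorithm) versus front-to-back with a depth test that runs before shading.

```rust
/// Hypothetical opaque axis-aligned rectangle; a higher `z` is closer to the viewer (z >= 1).
#[derive(Clone, Copy, Debug)]
struct OpaqueRect {
    x0: usize,
    y0: usize,
    x1: usize,
    y1: usize,
    z: u32,
}

/// Painter's algorithm: every pixel of every rect is shaded, even if a closer
/// opaque rect later covers it (overdraw).
fn shaded_back_to_front(rects: &[OpaqueRect]) -> usize {
    rects.iter().map(|r| (r.x1 - r.x0) * (r.y1 - r.y0)).sum()
}

/// Front-to-back with a depth test before shading: a pixel already covered by a
/// closer opaque rect is rejected and never shaded.
/// Assumes every rect lies within the `width` x `height` target.
fn shaded_front_to_back(rects: &[OpaqueRect], width: usize, height: usize) -> usize {
    let mut sorted = rects.to_vec();
    sorted.sort_by(|a, b| b.z.cmp(&a.z)); // closest first
    let mut depth = vec![0u32; width * height]; // 0 = nothing drawn yet
    let mut shaded = 0;
    for r in &sorted {
        for y in r.y0..r.y1 {
            for x in r.x0..r.x1 {
                let d = &mut depth[y * width + x];
                if r.z > *d {
                    // Depth test passed: record the depth and shade this fragment.
                    *d = r.z;
                    shaded += 1;
                }
            }
        }
    }
    shaded
}

fn main() {
    // Hypothetical scene: a full-screen opaque background plus an opaque card on top.
    let rects = [
        OpaqueRect { x0: 0, y0: 0, x1: 100, y1: 100, z: 1 },
        OpaqueRect { x0: 10, y0: 10, x1: 60, y1: 60, z: 2 },
    ];
    println!("back-to-front: {} fragments shaded", shaded_back_to_front(&rects));
    println!(
        "front-to-back + early z: {} fragments shaded",
        shaded_front_to_back(&rects, 100, 100)
    );
}
```

With front-to-back ordering, the shaded count is bounded by the number of visible pixels no matter how many opaque layers overlap, while the painter's count grows with every layer of overdraw.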

mstange commented 6 years ago

Adjusting the text shadow blur radius to match Gecko might also improve this slightly.

mstange commented 6 years ago

The inner rect workaround from #1820 regressed this testcase by about 50%: The GPU time on my machine with the integrated GPU went from 10.91ms to 16.11ms.
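As a quick check of the "about 50%" figure, using the two GPU times quoted above:

```latex
\frac{16.11\,\text{ms} - 10.91\,\text{ms}}{10.91\,\text{ms}} = \frac{5.20}{10.91} \approx 0.477 \quad (\approx 48\%)
```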

kvark commented 6 years ago

Thanks @mstange. Time to revisit my proposal in #1828 to address this properly?

mstange commented 6 years ago

The shadow changes increased this to 340ms.

mstange commented 6 years ago

With the blur optimizations from #1896, it goes down to 60ms.

glennw commented 6 years ago

Wow! I'll take a look at this today, and write up any findings here.

metajack commented 6 years ago

cc @metajack

glennw commented 6 years ago

I logged all the blur render tasks that are added for this test case:

new_blur r=24 t=Alpha s=1756×13711
new_blur r=4 t=Color s=810×74
new_blur r=4 t=Color s=865×74
new_blur r=4 t=Color s=602×74
new_blur r=4 t=Color s=280×74
new_blur r=4 t=Color s=224×74
new_blur r=4 t=Color s=198×74
new_blur r=4 t=Color s=207×74
new_blur r=4 t=Color s=255×74
new_blur r=4 t=Color s=216×74
new_blur r=4 t=Color s=219×74
new_blur r=4 t=Color s=222×74
new_blur r=4 t=Color s=255×74
new_blur r=4 t=Color s=338×74
new_blur r=4 t=Color s=314×74
new_blur r=4 t=Color s=224×74
new_blur r=4 t=Color s=234×74
new_blur r=4 t=Color s=230×74
new_blur r=4 t=Color s=274×74
new_blur r=4 t=Color s=227×74
new_blur r=4 t=Color s=219×74
new_blur r=4 t=Color s=272×74
new_blur r=4 t=Color s=761×168
new_blur r=6 t=Alpha s=946×120
new_blur r=4 t=Color s=161×74
new_blur r=4 t=Color s=108×74
new_blur r=4 t=Color s=157×74
new_blur r=4 t=Color s=148×74
new_blur r=4 t=Color s=122×74

Wow, I think new_blur r=24 t=Alpha s=1756×13711 might be the problem (and a bug)!!
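To gauge how much that one task dominates, here is a minimal sketch that parses the log above and sums the W×H target areas. It only counts target pixels; it does not model per-pixel blur cost, which also grows with the radius. The helper names `task_area` and `blur_areas` are hypothetical.

```rust
// The blur render tasks logged above, one per line (copied verbatim).
const BLUR_LOG: &str = "\
new_blur r=24 t=Alpha s=1756×13711
new_blur r=4 t=Color s=810×74
new_blur r=4 t=Color s=865×74
new_blur r=4 t=Color s=602×74
new_blur r=4 t=Color s=280×74
new_blur r=4 t=Color s=224×74
new_blur r=4 t=Color s=198×74
new_blur r=4 t=Color s=207×74
new_blur r=4 t=Color s=255×74
new_blur r=4 t=Color s=216×74
new_blur r=4 t=Color s=219×74
new_blur r=4 t=Color s=222×74
new_blur r=4 t=Color s=255×74
new_blur r=4 t=Color s=338×74
new_blur r=4 t=Color s=314×74
new_blur r=4 t=Color s=224×74
new_blur r=4 t=Color s=234×74
new_blur r=4 t=Color s=230×74
new_blur r=4 t=Color s=274×74
new_blur r=4 t=Color s=227×74
new_blur r=4 t=Color s=219×74
new_blur r=4 t=Color s=272×74
new_blur r=4 t=Color s=761×168
new_blur r=6 t=Alpha s=946×120
new_blur r=4 t=Color s=161×74
new_blur r=4 t=Color s=108×74
new_blur r=4 t=Color s=157×74
new_blur r=4 t=Color s=148×74
new_blur r=4 t=Color s=122×74";

/// Extracts the `s=W×H` field of one log line as a pixel count.
fn task_area(line: &str) -> Option<u64> {
    let size = line.split_whitespace().find_map(|tok| tok.strip_prefix("s="))?;
    let (w, h) = size.split_once('×')?;
    Some(w.parse::<u64>().ok()? * h.parse::<u64>().ok()?)
}

/// Pixel areas of every blur task in the log, in log order.
fn blur_areas(log: &str) -> Vec<u64> {
    log.lines().filter_map(task_area).collect()
}

fn main() {
    let areas = blur_areas(BLUR_LOG);
    let total: u64 = areas.iter().sum();
    let largest = areas.iter().copied().max().unwrap_or(0);
    println!(
        "{} tasks, {} blurred pixels; the largest task is {:.1}% of them",
        areas.len(),
        total,
        100.0 * largest as f64 / total as f64
    );
}
```

Across the 29 logged tasks, the single r=24 task covers 24,076,516 of 24,864,078 target pixels, roughly 97% of the total.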

Investigating now... :)

glennw commented 6 years ago

Perhaps the more surprising thing is that it manages to do a blur with a radius of 24 at that resolution in "only" 60 ms...!

glennw commented 6 years ago

Due to the way passes allocate an array texture, this results in allocating three 2048×13711 alpha targets, which is 84 MB of alpha framebuffer to clear, in addition to the blurs...

glennw commented 6 years ago

Ah, this page has a huge box shadow applied to the entire body element, which is very long due to scrolling. WR is not clipping this to the visible screen when rendering an off-screen target. I'll work on a fix for this today, which should massively improve GPU time for this test case.
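The idea of clipping the off-screen target can be sketched as a rectangle intersection. This is a hypothetical illustration, not the actual fix in the PR that follows: `Rect` and `shadow_target_rect` are made-up names, and growing the viewport by exactly the blur radius is an assumption (the true margin depends on how the blur kernel's extent relates to the radius). Pixels just outside the viewport can still bleed into it through the blur, so the viewport is inflated before intersecting it with the shadow.

```rust
/// Hypothetical integer rectangle, [x0, x1) × [y0, y1).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x0: i32,
    y0: i32,
    x1: i32,
    y1: i32,
}

impl Rect {
    fn new(x: i32, y: i32, w: i32, h: i32) -> Rect {
        Rect { x0: x, y0: y, x1: x + w, y1: y + h }
    }
    fn width(&self) -> i32 {
        self.x1 - self.x0
    }
    fn height(&self) -> i32 {
        self.y1 - self.y0
    }
    /// Grows the rect by `d` pixels on every side.
    fn inflate(&self, d: i32) -> Rect {
        Rect { x0: self.x0 - d, y0: self.y0 - d, x1: self.x1 + d, y1: self.y1 + d }
    }
    /// Overlapping region, or None if the rects do not overlap.
    fn intersection(&self, o: &Rect) -> Option<Rect> {
        let r = Rect {
            x0: self.x0.max(o.x0),
            y0: self.y0.max(o.y0),
            x1: self.x1.min(o.x1),
            y1: self.y1.min(o.y1),
        };
        if r.x0 < r.x1 && r.y0 < r.y1 { Some(r) } else { None }
    }
}

/// Part of the shadow that can affect visible pixels, i.e. the only region an
/// off-screen blur target needs to cover (None: the shadow is not visible).
fn shadow_target_rect(shadow: Rect, viewport: Rect, blur_radius: i32) -> Option<Rect> {
    shadow.intersection(&viewport.inflate(blur_radius))
}

fn main() {
    // The body's box shadow size from the blur log (1756×13711, radius 24),
    // placed at the origin as an assumption.
    let shadow = Rect::new(0, 0, 1756, 13711);
    // Hypothetical 1756×1000 viewport, scrolled to y = 5000.
    let viewport = Rect::new(0, 5000, 1756, 1000);
    match shadow_target_rect(shadow, viewport, 24) {
        Some(r) => println!("off-screen target: {}×{} px instead of 1756×13711", r.width(), r.height()),
        None => println!("shadow not visible; no off-screen target needed"),
    }
}
```

Under these assumptions the target shrinks from 13711 rows to about one viewport height plus two blur margins, which is where the expected GPU-time win comes from.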

glennw commented 6 years ago

https://github.com/servo/webrender/pull/1954 fixes the box shadow time on this page. However, something has changed in WR / Gecko / page content which results in a ridiculous number of clip masks being allocated (26 x 2048 x 2048 A8 pages!), so that will need further investigation.
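The memory figures in this thread follow from A8 being one byte per pixel: 3 × 2048 × 13711 bytes is about 84 MB, and 26 clip-mask pages of 2048 × 2048 come to about 109 MB. A minimal sketch of that arithmetic, with a hypothetical helper name and ignoring any padding or alignment a driver may add:

```rust
/// Bytes for `layers` render-target layers in an A8 (8-bit alpha) format,
/// ignoring driver padding or alignment.
fn a8_target_bytes(layers: u64, width: u64, height: u64) -> u64 {
    const BYTES_PER_PIXEL: u64 = 1; // A8: one byte per pixel
    layers * width * height * BYTES_PER_PIXEL
}

fn main() {
    // Three 2048×13711 alpha targets allocated for the oversized box-shadow blur.
    let blur_targets = a8_target_bytes(3, 2048, 13711);
    // 26 full 2048×2048 A8 clip-mask pages.
    let clip_masks = a8_target_bytes(26, 2048, 2048);
    println!("blur targets: {} bytes ({:.1} MB)", blur_targets, blur_targets as f64 / 1e6);
    println!("clip masks:   {} bytes ({:.1} MB)", clip_masks, clip_masks as f64 / 1e6);
}
```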

staktrace commented 6 years ago

"However, something has changed in WR / Gecko / page content which results in a ridiculous number of clip masks being allocated"

These clip masks you're referring to, are they clip items with non-empty masks? i.e. SpecificDisplayItem::Clip(ClipDisplayItem { _, Some(_) })?

@jrmuizel mentioned this potential regression during today's daily standup, and I wanted to check that it wasn't an accidental regression from bug 1405359 or bug 1409446. However, when I load the page at https://www.w3.org/conf/2013sf/, it doesn't seem like Gecko is defining any clips on the page that have non-empty masks. (I checked by turning on the WebRenderAPI.cpp logging for the content process.)

glennw commented 6 years ago

It's not totally clear where they are coming from yet, but I do see a huge number of full screen clip masks being generated. There's often up to 30 or so 2048x2048 targets being allocated, which is clearly a bug. I'll be investigating this further today or tomorrow.

glennw commented 6 years ago

With #1960 and #1954, this is starting to look reasonable. It runs at about 10ms GPU time on my HD4600.

Unfortunately, it's then badly CPU-bound on display list (DL) (de)serialization time. I'm not sure what the cause of that is, but it seems particularly bad.

In terms of GPU time, I'm fairly happy with the 10ms right now, because our GPU time baseline appears to be around 6ms when drawing basically nothing except the main UI and a background. That baseline is partially due to issues that are still being worked on (redundant clip masks, and the z-buffer not yet working in Gecko). Once those are sorted, I expect the baseline to drop quite significantly.

glennw commented 6 years ago

In addition to that, we should do some profiling and investigation of why the baseline is that high, when rendering basically nothing. It should be significantly less than that, even with the known clip and z-buffer issues.

mstange commented 6 years ago

"It's unfortunately then badly CPU bound in DL (de)serialization time."

Are you testing this in a local Firefox build? If you are, please be aware of the recent change to the default Rust optimization level from 2 to 1.

glennw commented 6 years ago

Oh, I was not aware of that change. Thank you for the heads up! I'll profile again today with an optimized build!

glennw commented 6 years ago

I ran this page again with the rustc optimization level fixed. The CPU backend and GPU times are reasonable (not great), but the DL build time is still awful - it often spikes up to 50ms of DL build time.

It's possible that it's a measurement error, but we should investigate this further. Is it possible Gecko is taking that long to build the DL in this case?

jrmuizel commented 6 years ago

I think some of the DL times are not that meaningful in Gecko.

nical commented 6 years ago

This page now renders quite fast in webrender.