Open kvark opened 6 years ago
Interesting idea! I wonder if the anti-aliasing would be affected in a bad way though by doing this?
Another thing to consider for this test case - we don't currently share these clip masks at all, even when they are identical. I wonder whether quantizing the parameters to device pixel amounts and sharing the resulting masks would significantly reduce the number of masks here?
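A rough sketch of what quantize-and-share could look like, purely for illustration. `MaskKey`, `MaskCache` and `device_pixel_scale` are made-up names, not WebRender's actual types. The key also ignores the sub-pixel position of the mask, which is a simplification that may well interact badly with anti-aliasing.

```rust
use std::collections::HashMap;

/// Hypothetical key: rounded-rect clip parameters snapped to whole device pixels.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct MaskKey {
    width: i32,
    height: i32,
    radius: i32,
}

impl MaskKey {
    /// Scale layout-space values to device pixels and round to the nearest pixel.
    pub fn new(width: f32, height: f32, radius: f32, device_pixel_scale: f32) -> Self {
        let q = |v: f32| (v * device_pixel_scale).round() as i32;
        MaskKey { width: q(width), height: q(height), radius: q(radius) }
    }
}

/// Hypothetical cache mapping quantized keys to mask ids.
#[derive(Default)]
pub struct MaskCache {
    masks: HashMap<MaskKey, usize>,
}

impl MaskCache {
    /// Return the id of an existing mask for this key, or allocate the next id.
    pub fn get_or_insert(&mut self, key: MaskKey) -> usize {
        let next_id = self.masks.len();
        *self.masks.entry(key).or_insert(next_id)
    }

    /// Number of distinct masks that would actually need rendering.
    pub fn mask_count(&self) -> usize {
        self.masks.len()
    }
}
```

With this, two clips whose sizes differ by less than half a device pixel map to the same mask id, so only one mask would be rendered for both.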
Another possibility would be to recognize the special case of a solid color with a rounded clip and change the clip into a plain old border display item.
Ideally, to make this optimization better for real-world use cases (i.e. not just tailored to this benchmark) it might be nice to do it after primitive segmentation. This would allow it to, for example, eliminate the clips from large rounded flat colored buttons, which are common in flat design.
That's an interesting idea - but there's an even better solution I think - extend the mask shader (used above) to support a vertex color. Then, in this case we just draw a colored mask directly onto the surface and skip intermediate surfaces altogether.
After talking to @glennw, I admit we can't have local clip masks (because of AA), and the last proposal (using clip_rectangle for rendering into the color framebuffer) seems like a reasonable hack :)
Clips cause similar problems in the bouncing gradient circles:
@pcwalton I think (yet to be confirmed) that the test case above is also drawing a heap of redundant rectangular clip masks, which might explain part of the problem too.
Note that if the clip mask is a circle it can be reused regardless of rotation, because it's, well, a circle. Optimizing this might help that benchmark (but might not help real-world sites).
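To make the circle case concrete, here is a minimal sketch of the kind of check involved. `RoundedClip` and its methods are hypothetical names, and each corner is reduced to a single scalar radius, which is simpler than real CSS corner radii. The same uniform-radii test is the sort of thing a fast clip path could branch on.

```rust
/// Hypothetical description of a rounded-rectangle clip in local space.
#[derive(Clone, Copy, Debug)]
pub struct RoundedClip {
    pub width: f32,
    pub height: f32,
    /// One scalar radius per corner: top-left, top-right, bottom-right, bottom-left.
    pub radii: [f32; 4],
}

impl RoundedClip {
    /// True when all four corners share one radius (within `eps`).
    pub fn has_uniform_radii(&self, eps: f32) -> bool {
        let first = self.radii[0];
        self.radii.iter().all(|&r| (r - first).abs() <= eps)
    }

    /// True when the clip is a full circle: square bounds and a uniform
    /// radius equal to half the side length.
    pub fn is_circle(&self, eps: f32) -> bool {
        (self.width - self.height).abs() <= eps
            && self.has_uniform_radii(eps)
            && (self.radii[0] - self.width * 0.5).abs() <= eps
    }

    /// Rotation to include in a mask-sharing key: a circle looks the same at
    /// any rotation, so it is dropped (set to zero) in that case.
    pub fn rotation_for_mask_key(&self, rotation_radians: f32, eps: f32) -> f32 {
        if self.is_circle(eps) { 0.0 } else { rotation_radians }
    }
}
```

A 40x40 clip with radius 20 on every corner counts as a circle here, while an 80x40 pill with the same radii has uniform radii but still depends on rotation.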
Also, I checked whether the ellipse shader was the problem by commenting it out. This improved performance by 50% or so. So while optimizing the ellipse shader would certainly help, the test is still far too slow for the ellipse border shader to fully explain the issue. It might be needless rectangular clip mask generation, as you suggested.
I'll take a look at this test case using the fix mentioned here https://github.com/servo/webrender/issues/1648#issuecomment-346511425 and investigate from there.
OK, this test looks fairly reasonable with the removal of the redundant rectangular clip masks. Could still do with improvement but it's not bad now.
GPU usage sits at around 4-8 ms for me when just drawing the required clips.
I just retested. With 1000 circles I now get ~30 FPS in Servo+WR+slow style hack and 37 FPS in Chrome. Most of the GPU time is spent in clips, unsurprisingly. CPU time outweighs GPU time.
Ways we could make up the remaining difference:
I would guess that doing any one of these will make up the difference on its own.
@pcwalton Is that with https://github.com/servo/webrender/pull/2104 applied (it's not in Servo yet)? It might not make any difference here, but it may be quite significant. What CPU/GPU times are you seeing in WR?
@glennw Yes, it's with #2104 applied.
@pcwalton Cool, those numbers look reasonable-ish for that test. I'll do some tests on having a fast clip path for uniform radii, that's probably the next easy win there.
The first MotionMark test features hundreds of clipped rectangles, each rotated differently (also see #2083). Each produces a separate clip mask to be rendered, which is inefficient.
It would be best to recognize the transformed property here and generate a single mask in local space that is shared between all the instances. This would require the transformed shaders to be aware of the space the mask is provided in.
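As a sketch of the bookkeeping this implies (not the rendering itself), the snippet below deduplicates masks by their local-space description and leaves the transform on each instance. All names (`LocalClip`, `ClippedInstance`, `assign_shared_masks`) are hypothetical, and sizes are whole local units only to keep the key hashable. It does not address the anti-aliasing concern raised elsewhere in this thread.

```rust
use std::collections::HashMap;

/// Hypothetical local-space clip description (sizes in whole local units).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct LocalClip {
    pub width: u32,
    pub height: u32,
    pub radius: u32,
}

/// Hypothetical primitive instance: a local clip plus its own rotation.
#[derive(Clone, Copy, Debug)]
pub struct ClippedInstance {
    pub clip: LocalClip,
    pub rotation_radians: f32,
}

/// Returns the unique local-space masks and, for each instance, the index of
/// the mask it should sample. The rotation stays with the instance and does
/// not take part in deduplication.
pub fn assign_shared_masks(instances: &[ClippedInstance]) -> (Vec<LocalClip>, Vec<usize>) {
    let mut unique: Vec<LocalClip> = Vec::new();
    let mut index_of: HashMap<LocalClip, usize> = HashMap::new();
    let mut per_instance = Vec::with_capacity(instances.len());
    for inst in instances {
        let idx = *index_of.entry(inst.clip).or_insert_with(|| {
            unique.push(inst.clip);
            unique.len() - 1
        });
        per_instance.push(idx);
    }
    (unique, per_instance)
}
```

For hundreds of identically sized rectangles that differ only in rotation, this yields a single mask entry that every instance refers to.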