nical commented 6 years ago

A few thoughts for the far future.

I was thinking about bug 1455921 and other slowness related to having a gazillion visible glyphs, and I think that we should be able to greatly reduce the CPU load in the long term.

We sometimes have one very big stacking context with a large body of text, or the enormous number of rects in the bug I linked, and we spend a lot of time deserializing the displaylist and, worse, generating batches every frame on the CPU.

Clustering

We could benefit from dividing the primitives of a stacking context into fixed-size clusters and using the bounds of these clusters for a coarse-granularity culling test. For example, we could use a regular grid of 512x512 layout-pixel cells: each primitive is assigned to a cluster, and the bounding box of each cluster is the union of all of the primitive rects in the cluster (we use the regular 512 grid to quickly decide which cluster to assign, but culling would use the actual bounding box). I am sure we can come up with better clustering schemes; this is just an example to convey the idea.
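To make the grid example concrete, here is a minimal Rust sketch. It is not WebRender code: `Rect`, `Cluster`, `build_clusters`, `visible_primitives` and `CELL_SIZE` are hypothetical names, and picking the cell from the primitive's top-left corner is an assumption (the text above only says the grid is used to pick a cluster quickly).

```rust
use std::collections::BTreeMap;

/// Axis-aligned rectangle in layout pixels, in the stacking context's local space.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

impl Rect {
    /// Smallest rectangle containing both `self` and `other`.
    pub fn union(&self, other: &Rect) -> Rect {
        Rect {
            x0: self.x0.min(other.x0),
            y0: self.y0.min(other.y0),
            x1: self.x1.max(other.x1),
            y1: self.y1.max(other.y1),
        }
    }

    /// True if the two rectangles overlap with non-zero area.
    pub fn intersects(&self, other: &Rect) -> bool {
        self.x0 < other.x1 && other.x0 < self.x1 && self.y0 < other.y1 && other.y0 < self.y1
    }
}

/// Hypothetical cluster: the union of its primitives' rects plus their indices.
#[derive(Debug)]
pub struct Cluster {
    pub bounds: Rect,
    pub primitives: Vec<usize>,
}

/// Grid cell size in layout pixels, taken from the 512x512 example above.
pub const CELL_SIZE: f32 = 512.0;

/// Assign each primitive to the grid cell containing its top-left corner
/// (an assumption of this sketch) and grow that cluster's bounds to fit it.
pub fn build_clusters(prim_rects: &[Rect]) -> Vec<Cluster> {
    let mut cells: BTreeMap<(i32, i32), Cluster> = BTreeMap::new();
    for (index, rect) in prim_rects.iter().enumerate() {
        let key = (
            (rect.x0 / CELL_SIZE).floor() as i32,
            (rect.y0 / CELL_SIZE).floor() as i32,
        );
        cells
            .entry(key)
            .and_modify(|cluster| {
                cluster.bounds = cluster.bounds.union(rect);
                cluster.primitives.push(index);
            })
            .or_insert_with(|| Cluster { bounds: *rect, primitives: vec![index] });
    }
    cells.into_values().collect()
}

/// Coarse test against the cluster bounds first, then a per-primitive test
/// only inside the clusters that survive.
pub fn visible_primitives(clusters: &[Cluster], prim_rects: &[Rect], viewport: &Rect) -> Vec<usize> {
    let mut visible = Vec::new();
    for cluster in clusters {
        if !cluster.bounds.intersects(viewport) {
            continue; // the whole cluster is culled with a single test
        }
        for &index in &cluster.primitives {
            if prim_rects[index].intersects(viewport) {
                visible.push(index);
            }
        }
    }
    visible
}
```

Note that the cluster bounds can extend past the 512x512 cell when a primitive straddles a cell edge, which is why culling tests the union rect rather than the cell.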

This alone should help with the expensive culling phase in frame building quite a bit and would provide a natural boundary for parallelism.

Incremental displaylist

Pushing this one step further, we could do the clustering during displaylist building instead of scene building. This would provide us with a nice granularity for incremental displaylist updates, where we could swap out clusters instead of working at the display item granularity (probably too small) or stacking context granularity (usually too big, I think).
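Purely as an illustration of what "swapping out clusters" could look like, and not a proposal for the actual API, here is a small Rust sketch. Every name in it (`ClusterKey`, `DisplayItem`, `ClusteredDisplayList`, `ClusterUpdate`) is hypothetical, and keying clusters by grid cell is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical cluster identifier: here, a grid cell coordinate.
pub type ClusterKey = (i32, i32);

/// Stand-in for a real display item; only a label for the sake of the sketch.
#[derive(Clone, Debug, PartialEq)]
pub struct DisplayItem {
    pub label: String,
}

/// Hypothetical display list stored per cluster rather than as one flat stream.
#[derive(Default)]
pub struct ClusteredDisplayList {
    pub clusters: BTreeMap<ClusterKey, Vec<DisplayItem>>,
}

/// An incremental update touches whole clusters, never individual items.
pub enum ClusterUpdate {
    Replace(ClusterKey, Vec<DisplayItem>),
    Remove(ClusterKey),
}

impl ClusteredDisplayList {
    /// Apply the updates and return how many clusters actually changed.
    /// Clusters not mentioned in `updates` are left untouched.
    pub fn apply(&mut self, updates: Vec<ClusterUpdate>) -> usize {
        let mut touched = 0;
        for update in updates {
            match update {
                ClusterUpdate::Replace(key, items) => {
                    self.clusters.insert(key, items);
                    touched += 1;
                }
                ClusterUpdate::Remove(key) => {
                    if self.clusters.remove(&key).is_some() {
                        touched += 1;
                    }
                }
            }
        }
        touched
    }
}
```

The point of the sketch is only the granularity: an update carries a handful of clusters, so the cost scales with what changed rather than with the size of the stacking context.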

I'm intentionally vague about how this would affect the API for now. I have a few ideas but I'd like to first get a sense of whether there would be a consensus about evolving in this direction.

If we do agree, we can at least try to operate with this transition in mind and avoid things that would make it harder to get there.

Also it's possible that instead of using the "stacking context" terminology, I should speak in terms of "group of primitives under the same coordinate space/transform" (that's how I think about it anyway).

Thoughts?

gw3583 commented 6 years ago

I need to read this more carefully, but from a quick skim this sounds like it lines up very closely with what I've been thinking about too. The reason I've been thinking about this is that this information could feed into the ideas we have for drawing the main scene in tiles and caching them for scrolling, if we decide that makes sense in some cases for power saving / very expensive pages.

nical commented 6 years ago

There are plenty of useful properties that come from partitioning, in local space, the things that "move together". Caching in tiles is one of them, and while I think culling can be done independently, I also think that they line up and share some common gymnastics. Pushing the partitioning into the displaylist representation itself is another independent win, although all three things should work well together, especially if the partitioning is the same.

There are certainly other benefits to architecting things this way (browsers have been doing that for many years for a reason).

Caching in tiles does bring a few extra twists (it sort of implies that primitives that touch several tiles are rendered several times, at least in some cases), but that would also work well with the clustering idea for culling (instead of growing the cluster's bounds, we split the primitive just like we used to do with the old tiling scheme, but in local space instead of screen space).
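The splitting variant mentioned above can be sketched as follows. This is a hypothetical illustration in Rust, not the old tiling code: `Rect` and `split_by_tiles` are made-up names, and the tile size is left as a parameter.

```rust
/// Axis-aligned rectangle in layout pixels, in local space.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

/// Split a primitive rect along a regular local-space tile grid.
/// Returns one clipped piece per overlapped tile, tagged with the tile
/// coordinate, so no piece ever extends past its tile's bounds.
pub fn split_by_tiles(rect: &Rect, tile_size: f32) -> Vec<((i32, i32), Rect)> {
    let mut pieces = Vec::new();
    if rect.x1 <= rect.x0 || rect.y1 <= rect.y0 {
        return pieces; // empty rect: nothing to split
    }
    let tx0 = (rect.x0 / tile_size).floor() as i32;
    let ty0 = (rect.y0 / tile_size).floor() as i32;
    // ceil - 1 so that a rect ending exactly on a tile edge does not
    // produce an empty piece in the next tile.
    let tx1 = (rect.x1 / tile_size).ceil() as i32 - 1;
    let ty1 = (rect.y1 / tile_size).ceil() as i32 - 1;
    for ty in ty0..=ty1 {
        for tx in tx0..=tx1 {
            let tile_x0 = tx as f32 * tile_size;
            let tile_y0 = ty as f32 * tile_size;
            let piece = Rect {
                x0: rect.x0.max(tile_x0),
                y0: rect.y0.max(tile_y0),
                x1: rect.x1.min(tile_x0 + tile_size),
                y1: rect.y1.min(tile_y0 + tile_size),
            };
            pieces.push(((tx, ty), piece));
        }
    }
    pieces
}
```

With splitting, each cluster's bounds stay inside its grid cell, at the cost of processing a primitive once per tile it touches.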

kvark commented 6 years ago

One of the problems with tiling was limiting the rendering area to the tile boundaries. From what I read here, clustering wouldn't need it - it's just a way to split a bunch of primitives into groups with tighter local bounds, so that we can save some time processing them.

servo / webrender

Clustered culling and displaylist updates #2983
