raphlinus opened this issue 2 years ago
This is a follow-up to #38, capturing our current thinking on how image resources should be managed.
CPU-side image resource type and creation API
The CPU-side image resource is basically just an `Arc<[u8]>`. It also contains a globally unique id (from an atomic counter). Creation of an image resource requires no context or factory, and the object is easily `Send` etc. The image bytes are provided by `impl Into<Arc<[u8]>>`, which I believe is sufficiently ergonomic; it's satisfied by `&[u8]` and `Vec<u8>`, as well as `Cow`. Possibly we wrap the whole thing in another `Arc` to make the reference smaller, but that probably doesn't matter; it's cheap to clone either way.

Encoding into scene fragment
Adding an image draw object to a scene fragment is similarly simple: it's basically appending a clone of the reference and an affine transform (possibly other state such as extend mode, which is not part of the current imaging model but worth considering).
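A minimal sketch of how this could look; the names (`Image`, `ImageId`, `SceneFragment::draw_image`) are hypothetical, not the actual API:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

static NEXT_ID: AtomicU64 = AtomicU64::new(0);

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct ImageId(u64);

/// CPU-side image resource: just the bytes plus a unique id.
/// Needs no device context to create, and is Send + Sync.
#[derive(Clone)]
pub struct Image {
    id: ImageId,
    width: u32,
    height: u32,
    data: Arc<[u8]>,
}

impl Image {
    pub fn new(width: u32, height: u32, data: impl Into<Arc<[u8]>>) -> Self {
        Image {
            id: ImageId(NEXT_ID.fetch_add(1, Ordering::Relaxed)),
            width,
            height,
            data: data.into(),
        }
    }
}

/// Encoding an image draw is just cloning the reference and
/// recording an affine transform alongside it.
pub struct SceneFragment {
    images: Vec<(Image, [f32; 6])>,
}

impl SceneFragment {
    pub fn draw_image(&mut self, image: &Image, transform: [f32; 6]) {
        self.images.push((image.clone(), transform));
    }
}
```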
Staging
Resource creation and encoding are simple. Staging to GPU is where it gets hard. The algorithm in this section is run every time an encoded scene is to be rendered.
Atlas vs descriptor array vs bindless
All three strategies are viable. Descriptor arrays would reduce copying of pixels, though possibly with some overhead for managing descriptor sets, and at the cost of poorer compatibility (as a data point, VK_EXT_descriptor_indexing is available on the Pixel 6 Pro but not the Pixel 4). A bindless approach goes even farther: it requires a recent GPU but reduces the cost of managing descriptor sets.
For the first implementation, we're going atlas-only, for maximum compatibility and because it has some other desirable properties. The atlas contains scaled (more generally, affine-transformed) images. Further, to run the pipeline, the atlas must contain all images inside the viewport. This can potentially fail when the maximum dimensions of an atlas are exceeded; see #175 for a discussion of spatial subdivision. In constrained cases, the atlas contains the scaled image clipped to the current viewport (so subdivision is basically guaranteed to reduce atlas requirements), but in relaxed cases it may be desirable not to clip, so that, for example, scrolling can happen without additional re-scaling.
Detailed staging algorithm
The renderer state consists of:

- The atlas texture, with a rectangle allocator and a mapping from (id, transform) pairs to atlas rectangles.
- A cache of GPU images, keyed by image id.
- A staging buffer for uploading image bytes to the GPU.
- A pending set: the GPU images referenced by draw calls recorded in the current command buffer.
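In Rust terms, that state might be organized roughly as follows; all type names here are illustrative, not the actual renderer types:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical handles; stand-ins for whatever the HAL provides.
pub struct GpuImage;
pub struct StagingBuffer {
    size: u64,
}

/// Key for an atlas entry: which image, under which affine transform.
/// The transform needs a hashable representation (e.g. bit-cast floats).
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct AtlasKey {
    id: u64,
    transform_bits: [u32; 6],
}

#[derive(Clone, Copy)]
pub struct AtlasRect {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
}

pub struct RendererImageState {
    /// (id, transform) -> rectangle in the atlas texture.
    atlas_map: HashMap<AtlasKey, AtlasRect>,
    /// Cache of uploaded GPU images, keyed by image id.
    image_cache: HashMap<u64, GpuImage>,
    /// Staging buffer for copying image bytes to the GPU.
    staging: StagingBuffer,
    /// Ids of GPU images referenced by draw calls in the
    /// not-yet-flushed command buffer (protects them from eviction).
    pending: HashSet<u64>,
}
```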
The first step of staging is to assign every (id, transform) pair that appears in the encoded scene, inside the viewport, to a rectangle in the atlas. Note that this requires the affine transforms to be known on the CPU (motivating moving that from GPU to CPU, doing a bit of work on the "append fragment with transform" method on scene fragments).
Iterate all (id, transform) pairs in the encoded scene and resolve each to a rectangle. On a miss, attempt to allocate the rectangle in the atlas (perhaps using etagere or a custom rectangle allocator). If that fails, blow away the entire atlas mapping and start again. If the atlas contained any mappings that weren't present in the current scene, a retry at the same size may succeed; otherwise it won't, and the atlas must be resized. If that also fails, fall back to spatial subdivision. A sketch of this loop follows.
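A rough sketch of that control flow, reusing the hypothetical `AtlasKey`/`AtlasRect` types from the state sketch above. `alloc_rect` could be backed by etagere; `clear` and `try_grow` stand in for allocator operations:

```rust
/// Hypothetical atlas wrapper around a rectangle allocator.
struct Atlas {
    map: std::collections::HashMap<AtlasKey, AtlasRect>,
    // ... allocator state, atlas texture dimensions ...
}

enum StageError {
    /// Atlas can't fit the scene even after resizing; the caller
    /// should fall back to spatial subdivision (#175).
    NeedsSubdivision,
}

impl Atlas {
    fn alloc_rect(&mut self, _w: u32, _h: u32) -> Option<AtlasRect> {
        None // elided: delegate to the rectangle allocator
    }
    fn clear(&mut self) {
        self.map.clear(); // elided: also reset the allocator
    }
    fn try_grow(&mut self) -> bool {
        false // elided: reallocate a larger atlas texture if possible
    }
}

/// `scene` lists (key, width, height) for each image instance in view.
/// In the real flow we'd also record which mappings are new, so the
/// next stage knows which rectangles to fill.
fn resolve_rects(
    atlas: &mut Atlas,
    scene: &[(AtlasKey, u32, u32)],
) -> Result<(), StageError> {
    'outer: loop {
        for &(key, w, h) in scene {
            if atlas.map.contains_key(&key) {
                continue; // already resident
            }
            match atlas.alloc_rect(w, h) {
                Some(rect) => {
                    atlas.map.insert(key, rect);
                }
                None => {
                    // Retrying at the same size can only succeed if the
                    // atlas held mappings the current scene doesn't use.
                    let scene_keys: std::collections::HashSet<_> =
                        scene.iter().map(|&(k, _, _)| k).collect();
                    let had_stale =
                        atlas.map.keys().any(|k| !scene_keys.contains(k));
                    atlas.clear();
                    if had_stale || atlas.try_grow() {
                        continue 'outer; // restart with an empty atlas
                    }
                    return Err(StageError::NeedsSubdivision);
                }
            }
        }
        return Ok(());
    }
}
```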
At this point there is a list of new (id, transform) to rectangle mappings, and every (id, transform) pair represented in the encoded scene has a mapping in the atlas. The task now is to fill those rectangles with scaled images. Generally this involves blit and draw calls added to a command buffer.
For each new mapping, first materialize the GPU image. Look up the id in the cache. On a miss, try to allocate space in the staging buffer. If allocation fails, flush the command buffer and fence, waiting for the staging buffer to become writable again. If the staging buffer is smaller than the source image bytes, reallocate the staging buffer. At this point it is possible to write image bytes into the staging buffer, so map it, copy from the `Arc<[u8]>`, allocate a GPU image, and record a blit command to copy from the staging buffer to the GPU image.

There is further logic in the cache for eviction: if the GPU image being evicted is referenced by any pending draw call in the command buffer, flush the command buffer first. This state may also be used to prioritize evicting images not in the pending set.
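A sketch of that upload path, building on the hypothetical `RendererImageState` above. Every method here (`alloc`, `flush_and_fence`, `copy_buffer_to_image`, ...) is a stand-in for whatever the HAL exposes, not an existing API, and cache eviction is elided:

```rust
use std::sync::Arc;

fn materialize_gpu_image(
    state: &mut RendererImageState,
    cmd: &mut CommandBuffer, // hypothetical HAL command buffer
    id: u64,
    bytes: &Arc<[u8]>,
    width: u32,
    height: u32,
) -> GpuImage {
    if let Some(img) = state.image_cache.get(&id) {
        return img.clone(); // cache hit: nothing to upload
    }
    let size = bytes.len() as u64;
    let offset = match state.staging.alloc(size) {
        Some(off) => off,
        None => {
            // Staging buffer full: flush pending work and fence, waiting
            // for the buffer to become writable again.
            cmd.flush_and_fence();
            state.pending.clear();
            // If the buffer is simply too small for this image, grow it.
            if state.staging.size < size {
                state.staging = StagingBuffer::with_size(size);
            }
            state.staging.alloc(size).expect("staging sized to fit image")
        }
    };
    // Map the staging buffer, copy the CPU bytes in, and record a blit
    // from the staging buffer to a freshly allocated GPU image.
    state.staging.write(offset, bytes);
    let img = GpuImage::new(width, height);
    cmd.copy_buffer_to_image(&state.staging, offset, &img);
    state.image_cache.insert(id, img.clone());
    img
}
```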
At this point (cache hit or creation of new GPU image) we have a GPU image for the id, and we have a rectangle in the atlas. Record a draw call (adding the id to the pending set). Note that this draw call requires building out enough rasterization capability in the HAL to do textured quads.
For each (id, transform) image instance in the CPU-side scene fragment, record the corresponding atlas rectangle in the encoding to be uploaded to GPU.
Double buffer staging buffer?
It's likely we'll want two GPU buffers rather than one, so the CPU can be copying bytes and recording draw calls while the GPU is executing blits and draws. But this is a slightly unclear tradeoff, as it might mean more frequent flushes.
Extend modes
Mirror, repeat, etc. in the general case require handling in the draw calls, with the extended result stored in the atlas. In special cases (axis-aligned transforms where the bounds of the image align to integers) it might be possible to store only one instance and move the extension logic into fine rasterization, as sketched below.
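For reference, the coordinate mapping these modes imply is the same whether it runs in the atlas draw call or in fine rasterization; a sketch, with the mode names as assumptions:

```rust
#[derive(Clone, Copy)]
enum Extend {
    Pad,
    Repeat,
    Mirror,
}

/// Map a sample coordinate into [0, n) according to the extend mode,
/// where n is the image extent along this axis.
fn extend_coord(x: i32, n: i32, mode: Extend) -> i32 {
    match mode {
        Extend::Pad => x.clamp(0, n - 1),
        Extend::Repeat => x.rem_euclid(n),
        Extend::Mirror => {
            // Reflect with period 2n: 0..n forward, n..2n reversed.
            let m = x.rem_euclid(2 * n);
            if m < n {
                m
            } else {
                2 * n - 1 - m
            }
        }
    }
}
```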
Of course, in the future when descriptor arrays or bindless are available, at least in some cases fine rasterization will sample directly from the image rather than from a rectangle in the atlas.
Fine rasterization
For the most part, we can use the same `Image` command as now, which does an imageLoad from the atlas. One potential refinement is to only load from the image texture when the alpha mask is nonzero; currently we issue a texture load for all pixels in the tile. It's possible there is overhead from predication, but I suspect that reducing memory bandwidth by skipping texture fetches for unused pixels will be worth it. A sketch of the predicated load is below.
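Expressed as CPU-side Rust for the fine-raster inner loop (the real code is a compute shader; `atlas_load` and the tile layout here are illustrative):

```rust
/// Per-pixel loop of the fine rasterizer's Image command, sketched on
/// the CPU. The predication: skip the atlas fetch entirely when the
/// mask has already zeroed the pixel.
fn fill_image_tile(
    tile_w: u32,
    mask: &[f32],               // per-pixel alpha from path rasterization
    out: &mut [[f32; 4]],       // premultiplied RGBA destination
    atlas_origin: (u32, u32),   // this instance's top-left in the atlas
    atlas_load: impl Fn(u32, u32) -> [f32; 4],
) {
    for (i, (&alpha, px)) in mask.iter().zip(out.iter_mut()).enumerate() {
        if alpha == 0.0 {
            continue; // predicated out: no texture fetch for this pixel
        }
        let x = atlas_origin.0 + (i as u32) % tile_w;
        let y = atlas_origin.1 + (i as u32) / tile_w;
        let texel = atlas_load(x, y);
        // Source-over composite of the masked texel (premultiplied).
        for c in 0..4 {
            px[c] = texel[c] * alpha + px[c] * (1.0 - texel[3] * alpha);
        }
    }
}
```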
Glyphs

We'll have a separate issue for cached glyphs when we get to those, but much of the logic is similar. The glyph atlas must contain all glyphs needed to render the viewport (post spatial subdivision), and the staging process has a similar flavor.
---

This is a very detailed and accurate description of the design we discussed. One thing that we didn't touch on is fit/fill scale modes, which will require computing the bounding box of the path on the CPU. I believe this is also true if we're going to handle extend modes in the atlas stage.