Improved architecture for `iced_wgpu` and `iced_tiny_skia`

This PR completely rewrites the internals of both iced_wgpu and iced_tiny_skia—all while removing a bunch of indirection, simplifying the layering logic, and introducing smarter caching.

The Backend traits in iced_graphics have been removed, together with the Primitive enum. Instead, renderers simply implement the Renderer traits from iced_core directly. Layering logic is reused by leveraging the new Layer trait and layer::Stack data structure in iced_graphics.
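To illustrate what implementing a renderer trait directly looks like, here is a minimal sketch. The `Renderer` trait below is a reduced, hypothetical stand-in with a single method, not the actual trait from iced_core, and `GpuRenderer`, `SoftwareRenderer` and `draw_button` are invented for the example.

```rust
// Hypothetical, reduced stand-in for a core renderer trait.
// The real trait has more methods and richer argument types.
trait Renderer {
    fn fill_quad(&mut self, width: f32, height: f32);
}

// Stand-in for a GPU-backed renderer: it records quad instances
// so they can be uploaded later.
#[derive(Default)]
struct GpuRenderer {
    quads: Vec<[f32; 2]>,
}

// Stand-in for a software renderer: it only counts covered pixels.
#[derive(Default)]
struct SoftwareRenderer {
    pixels_filled: u64,
}

// Each renderer implements the trait itself; there is no intermediate
// backend trait and no shared primitive enum in between.
impl Renderer for GpuRenderer {
    fn fill_quad(&mut self, width: f32, height: f32) {
        self.quads.push([width, height]);
    }
}

impl Renderer for SoftwareRenderer {
    fn fill_quad(&mut self, width: f32, height: f32) {
        self.pixels_filled += (width * height) as u64;
    }
}

// Widget code is generic over the trait and never sees the renderer's internals.
fn draw_button<R: Renderer>(renderer: &mut R) {
    renderer.fill_quad(100.0, 30.0);
}
```

Calling `draw_button` with either renderer runs the same widget code, while each renderer decides on its own how to store or rasterize the quad.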

Therefore, instead of recording a primitive tree first and then flattening it into different layers, renderers can now organize graphical primitives into layers while the primitives are being recorded (during Widget::draw). Allocations are easily reused this way and, since there is no recursive data structure, we can avoid boxing altogether.
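The idea of recording straight into a flat stack of layers can be sketched as follows. All types here (`Rectangle`, `Layer`, `Stack`) are simplified, hypothetical stand-ins and do not reproduce the actual Layer trait or layer::Stack from iced_graphics. In particular, this sketch ignores how drawing order is preserved after a clip is popped.

```rust
// Hypothetical, simplified types; not the actual iced_graphics API.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rectangle {
    x: f32,
    y: f32,
    width: f32,
    height: f32,
}

#[derive(Debug, Default)]
struct Layer {
    bounds: Option<Rectangle>,
    quads: Vec<Rectangle>,
    text: Vec<String>,
}

struct Stack {
    layers: Vec<Layer>, // flat storage, kept alive across frames
    active: Vec<usize>, // indices of the currently open layers, innermost last
    used: usize,        // how many layers this frame has touched
}

impl Stack {
    fn new() -> Self {
        Stack { layers: vec![Layer::default()], active: vec![0], used: 1 }
    }

    fn current(&mut self) -> &mut Layer {
        let index = *self.active.last().expect("base layer is always open");
        &mut self.layers[index]
    }

    // Opens a new layer; primitives recorded next go straight into it.
    fn push_clip(&mut self, bounds: Rectangle) {
        if self.used == self.layers.len() {
            self.layers.push(Layer::default());
        }
        let index = self.used;
        self.layers[index].bounds = Some(bounds);
        self.active.push(index);
        self.used += 1;
    }

    fn pop_clip(&mut self) {
        if self.active.len() > 1 {
            self.active.pop();
        }
    }

    fn fill_quad(&mut self, quad: Rectangle) {
        self.current().quads.push(quad);
    }

    fn fill_text(&mut self, content: &str) {
        self.current().text.push(content.to_owned());
    }

    // Resets the frame but keeps every Vec's capacity for the next one.
    fn clear(&mut self) {
        for layer in &mut self.layers[..self.used] {
            layer.bounds = None;
            layer.quads.clear();
            layer.text.clear();
        }
        self.active.clear();
        self.active.push(0);
        self.used = 1;
    }

    fn as_slice(&self) -> &[Layer] {
        &self.layers[..self.used]
    }
}
```

Because the layers live in one flat `Vec` and are only cleared between frames, no node has to be boxed and the per-layer buffers keep their capacity from one frame to the next.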

Furthermore, the explicit layer-oriented architecture (as opposed to a primitive-oriented one) allows us to implement smarter caching strategies. Specifically, every canvas::Cache can now cache not only the tessellated vertices, but also the GPU vertex buffers and texture atlases directly. This effectively means we can avoid preparing and uploading vertex data every frame, which is especially useful if you are rendering a lot of text in a Canvas!
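As a rough sketch of why caching the uploaded buffer matters, consider the toy model below. `FakeGpu`, `UploadedMesh` and this `Cache` are hypothetical stand-ins: the real canvas::Cache has a different API, and a real upload would go through wgpu rather than a counter.

```rust
// Stand-in for a GPU-side vertex buffer.
struct UploadedMesh {
    vertex_count: usize,
}

// Stand-in for the GPU; it only counts how many uploads happened.
struct FakeGpu {
    uploads: usize,
}

impl FakeGpu {
    fn upload(&mut self, vertices: &[[f32; 2]]) -> UploadedMesh {
        self.uploads += 1;
        UploadedMesh { vertex_count: vertices.len() }
    }
}

// Hypothetical cache that keeps the uploaded buffer, not just the vertices.
#[derive(Default)]
struct Cache {
    uploaded: Option<UploadedMesh>,
}

impl Cache {
    // Tessellates and uploads only when the cache is empty;
    // later frames reuse the buffer that is already on the GPU.
    fn draw(
        &mut self,
        gpu: &mut FakeGpu,
        tessellate: impl FnOnce() -> Vec<[f32; 2]>,
    ) -> &UploadedMesh {
        if self.uploaded.is_none() {
            let vertices = tessellate();
            self.uploaded = Some(gpu.upload(&vertices));
        }
        self.uploaded.as_ref().expect("cache was just populated")
    }

    // Invalidates the cache, forcing a new tessellation and upload.
    fn clear(&mut self) {
        self.uploaded = None;
    }
}
```

Drawing the same cache over several frames triggers a single upload in this model; only `clear` makes the next `draw` tessellate and upload again.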

I took these changes as an opportunity to start benchmarking parts of the library and created a basic wgpu benchmark for now. It just draws 1000 rectangles and 1000 text sections offscreen using a Canvas. When compared with master, we can observe quite the speedup:

Finally, these changes also introduce a new Engine type in iced_wgpu that contains all the pipeline state (as well as the MSAA framebuffer). This type is shared between all the different Renderer instances, which considerably reduces memory usage when using the multi-window feature.
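One way to picture the sharing is a single engine value that is lent to each renderer when it presents a frame. This is only a sketch under that assumption: the `Engine` and `Renderer` structs below, their fields and the `present` signature are hypothetical and are not taken from iced_wgpu.

```rust
// Hypothetical stand-in for the shared, expensive state
// (pipelines, MSAA framebuffer). It is created once.
struct Engine {
    pipelines_created: usize,
    frames_presented: usize,
}

impl Engine {
    fn new() -> Self {
        // Pretend three pipelines (quads, text, meshes) are built here.
        Engine { pipelines_created: 3, frames_presented: 0 }
    }
}

// Hypothetical per-window renderer: it holds only its own recorded work.
struct Renderer {
    window_id: u32,
    pending_quads: usize,
}

impl Renderer {
    fn new(window_id: u32) -> Self {
        Renderer { window_id, pending_quads: 0 }
    }

    fn fill_quad(&mut self) {
        self.pending_quads += 1;
    }

    // The engine is borrowed for the duration of the frame instead of being
    // owned by the renderer, so every window reuses the same pipelines.
    fn present(&mut self, engine: &mut Engine) -> usize {
        let drawn = self.pending_quads;
        self.pending_quads = 0;
        engine.frames_presented += 1;
        drawn
    }
}
```

With this shape, adding a window costs one more lightweight `Renderer`, while the pipeline state is still built exactly once.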

iced-rs / iced

Improved architecture for `iced_wgpu` and `iced_tiny_skia` #2382