I'm looking at Vulkan and WebGPU as APIs to target. Both of these have some features in common:
Inside a command buffer, you can "enter" a render pass, bind a pipeline, bind buffers, and then submit draws.
Right now I'm most interested in the pipeline objects, and I'm still a bit confused about render pass objects (gonna talk about those later).
Pipelines store a lot of state, approximately: the shaders, the vertex format, the fixed-function settings (blend, depth/stencil, rasterizer), and the render target formats.
In an ideal world, you're supposed to create all the pipelines you want to use upfront and then switch between them at runtime. For LÖVE/LÖVR, we have a highly scriptable immediate-mode API right now, so it isn't really feasible to have lovers specify all of the rendering details upfront. It seems like most applications in this situation do "last-minute" pipeline resolves at draw time, with plenty of hashing and caching to keep things fast (this blog post outlines it a bit). I'm aiming to use this technique, at least at first, to keep things flexible.
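As a rough sketch of what draw-time resolution could look like (nothing from the actual codebase; createBackendPipeline and pipelineKey are made-up placeholders), a draw would hash the currently bound state and only create a backend pipeline on a cache miss:

```lua
-- Sketch of last-minute pipeline resolution with hashing + caching.
local pipelineCache = {}

local function createBackendPipeline(state)
  -- Placeholder for the real vkCreateGraphicsPipelines / WebGPU pipeline creation.
  return { state = state }
end

-- Build a cache key out of everything that feeds into pipeline creation.
local function pipelineKey(state)
  return table.concat({
    state.shader, state.vertexFormat, state.blendMode,
    state.depthTest and 'depth' or 'nodepth', state.cullMode, state.targetFormat
  }, '|')
end

local function resolvePipeline(state)
  local key = pipelineKey(state)
  local pipeline = pipelineCache[key]
  if not pipeline then
    pipeline = createBackendPipeline(state) -- slow path, hit only on new state combos
    pipelineCache[key] = pipeline
  end
  return pipeline
end
```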
I'm planning on making the following API changes:
- Shader: Mostly the same I think. May need to start using uniform buffers more internally.
- Mesh: To make meshes map better onto the Pipeline, I think they should only hold the vertex/buffer format info, and then point at some sort of generic Buffer object.
- Pipeline: Add some sort of object for holding the current global fixed-function state. There's already a Pipeline internally, which is a little confusing because it is different from the monolithic GPU pipeline, so might need to rename that. But I think Pipeline works okay for a Lua-facing name.
- Samplers? No idea yet. We're supposed to use these but they're so tedious.
- Canvas
- Batch
Sorry if this is all still a bit disorganized, still learning and organizing thoughts!
Threading: Vulkan drivers aren't threaded like OpenGL drivers are, leaving it up to the application. I can think of two different ways of taking better advantage of multithreaded rendering. One of them builds on the existing Thread and Channel APIs. This would mean the default path (one function for the draw callback) is "slow", but gives people more control over how they split up and optimize their rendering code.

Here are some of my own thoughts / where my head is at right now. A lot of it matches up well with your notes, I think.
All drawing and GPU state-setting functions will be removed from the global love.graphics API.
New concept & object: Render Pass. It contains some setup info, such as the texture(s) the RenderPass will render to. (Side note: a Canvas is just a texture that's tagged saying it can be rendered to using a RenderPass.) A RenderPass has methods to queue state-setting and drawing commands, which will only be executed when a new function love.graphics.execute(renderpass) is called (naming TBD).
This is pretty much the same concept as vulkan / metal render passes, just at a higher level of abstraction.
Because commands are enqueued instead of executed immediately, it has some pretty big implications for the use of other love.graphics state and data objects – for example if you set the vertex positions on a mesh, queue a draw command for that mesh, and then change the mesh vertex data again before executing the render pass, the draw operation would only reflect the latest changes done after enqueuing the draw command, not before.
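To make that concrete, here is a minimal sketch using the proposed names (newRenderPass, pass:draw, and execute are placeholders for whatever the final API looks like):

```lua
-- Sketch of the deferred-execution behavior described above.
local oldPositions = { {0, 0}, {10, 0}, {10, 10} }
local newPositions = { {0, 0}, {20, 0}, {20, 20} }

local canvas = love.graphics.newCanvas(256, 256)
local mesh = love.graphics.newMesh(oldPositions)
local pass = love.graphics.newRenderPass(canvas) -- placeholder constructor

pass:draw(mesh)                -- only *queues* the draw; nothing runs yet
mesh:setVertices(newPositions) -- modify the mesh after queueing

love.graphics.execute(pass)    -- the queued draw executes now, so it sees
                               -- newPositions rather than oldPositions
```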
There are some things that users will probably want to repeatedly change within a render pass, which is currently only set-able via other objects. The main thing I'm thinking of is shader uniforms. My thought right now is to keep the existing APIs for setting uniforms on a Shader object (or have a very similar version), and add new methods to Render Pass objects to set a shader's uniform which only lasts for the duration of the render pass (or until it's overwritten within that render pass). I haven't ironed out exact specifics of that, though.
I also really like the idea of stackable pipeline state objects that can be applied to a RenderPass. In my head I've been calling them Graphics State objects rather than pipelines. Perhaps local uniforms could be set there as well.
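A sketch of how that might read from Lua, assuming a made-up newGraphicsState constructor and push/pop methods on the pass (none of these names are settled, and pass, worldMesh, and uiQuad are assumed to exist):

```lua
-- Sketch of stackable graphics state objects applied to a render pass.
local opaqueState = love.graphics.newGraphicsState {
  depthTest = 'less',
  blendMode = 'none',
  cullMode  = 'back',
}

local uiState = love.graphics.newGraphicsState {
  depthTest = 'none',
  blendMode = 'alpha',
}

pass:pushState(opaqueState) -- world geometry uses the opaque settings
pass:draw(worldMesh)
pass:pushState(uiState)     -- the UI state only overrides the fields it sets
pass:draw(uiQuad)
pass:popState()             -- back to opaqueState
pass:popState()
```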
I also want Buffers exposed more generally. A Mesh's buffers could be attached to other meshes, instead of attaching meshes themselves together.
I'm thinking textures will still have their own sampler object instead of completely splitting them, but maybe there can be a RenderPass method to make a texture use different sampler state for the duration of the render pass (or until it's overwritten in that render pass).
Command buffers might not need to be exposed as an external API concept. My thought is that the love.graphics functionality outside of render passes (such as executing a compute shader or copying vertex data) is "immediate" and the implementation uses command buffers as appropriate under the hood. Maybe a single vulkan command buffer can be used for multiple love.graphics.execute calls if they're small, or something. That said, I haven't thought about async compute using a separate compute queue. That's probably overkill for now, anyway.
Since running a compute shader would happen outside a render pass, it wouldn't have a way to set one-off short-lived uniform data (without using Shader:send or whatever), but I think it should. I'll have to think about that.
For making use of a low level graphics API's internal prerecording capabilities, maybe a render pass could either have a method to "compile" it for reuse, or a flag on creation or something.
I'm slowly starting to come around to the "render pass" object idea. I was originally uneasy because it doesn't map directly onto one of the GPU concepts, but I realized that it's a really approachable/convenient way to structure a game.
The thing that helped it click for me was thinking about them as "layers". Like how in GDC postmortems or rendering breakdowns, they always present each layer (pass) of the frame individually and layer them on top of each other to get the final result. So for a simple game you might have your static terrain/tile layer, a layer for characters/enemies, and a layer for the UI. Usually each of these are separate render passes with their own state/objects, and so if the LÖV API presented something that let people express that, it would be a pretty big win. Even if it doesn't map directly onto a pipeline/renderpass, it still makes it way easier for the underlying LÖV implementation to do so.
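For example, the layer structure could look roughly like this (all of the pass names and calls are placeholders just to visualize the layer idea, and the scene objects are assumed to exist):

```lua
-- Sketch: one pass per "layer" of a simple 2D game.
function lov.draw()
  local terrain = lov.graphics.newRenderPass(screen)
  terrain:draw(tilemapBatch)

  local actors = lov.graphics.newRenderPass(screen)
  actors:setShader(spriteShader)
  for _, enemy in ipairs(enemies) do
    actors:draw(enemy.sprite, enemy.x, enemy.y)
  end

  local ui = lov.graphics.newRenderPass(screen)
  ui:setBlendMode('alpha')
  ui:print(score, 10, 10)

  -- Execute the layers in order so they composite on top of each other.
  lov.graphics.execute(terrain)
  lov.graphics.execute(actors)
  lov.graphics.execute(ui)
end
```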
I'm not sure but it seems like your RenderPass is going to store a list of commands in memory, and then serialize them to the command buffer/encoder at the time of execute. I'm going to try to do something different and record the commands to the API directly, so that I don't need to store additional memory and reduce overhead a bit. It seems like there are several reasons why this won't work, but I'm still going to try.
Somewhat related -- it could be confusing to have object state only used when passes are executed. The (current) alternative is to sprinkle flushes all over the place, which makes the API nicer but the implementation more annoying. I'm curious if that's still possible in the render pass setup or if it would be prohibitively expensive/complicated. Hmm.
After more research I understand why prerecorded command buffers might not be as necessary as I thought -- modern APIs are way faster at enqueueing draw calls than OpenGL, so it isn't a big deal to do that over and over again. There are still 2 reasons I might be interested in it: A) reducing the Lua-C overhead of enqueueing large numbers of unchanging draws, and B) optimizations that can be done when the set of draws is known (culling, sorting; more relevant for 3D, but I kinda lean towards pushing this to Lua anyway since it's so app-specific).
1 sampler per texture seems like a good approach. It can't really be worse than whatever is going on in OpenGL today. It looks like Vulkan drivers still do caching of samplers anyway. Maybe the lov.graphics default filter can be a global "cached" sampler.
I'm trying to get other work out of the way so I can focus more on implementing this stuff!
Finally started laying the groundwork for this on a branch if you're interested in lurking:
https://github.com/bjornbytes/lovr/compare/gpu
Really just Vulkan boilerplate at this point.
Finally worked up the masochism to start working on this stuff again.
Implemented this API for Texture views
lov.graphics.newTexture(texture, TextureType, firstLayer, layerCount, firstMip, mipCount)
The layer/mipmap stuff is optional. Could also make it a newTextureView function instead of further complicating newTexture.
I haven't tried using it yet but it may end up feeling nicer than passing around { texture, layer, mipmap } tables for texture attachments. It matches the modern APIs better and allows for more powerful stuff (texture type reinterpretation, maybe depth/stencil view stuff or swizzling in the future?).
EDIT: Also added Texture:newView(type, layer, count, level, count).
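A usage sketch of those two view constructors, e.g. to render into a single layer of an array texture; the array-texture creation and renderTo plumbing here are guesses, only the view signatures come from above:

```lua
-- Sketch: texture views instead of { texture, layer, mipmap } tables.
local atlas = lov.graphics.newTexture(512, 512, { type = 'array', layers = 6 })

-- View of layer 3 only, mip 1 only, reinterpreted as a plain 2D texture.
local layerView = lov.graphics.newTexture(atlas, '2d', 3, 1, 1, 1)

-- Same thing using the method form from the EDIT.
local sameView = atlas:newView('2d', 3, 1, 1, 1)

-- The view can then be used directly as a render target.
lov.graphics.renderTo(layerView, function()
  lov.graphics.clear(0, 0, 0, 1)
end)
```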
The pass / command buffer API I'm going to try out is a lov.graphics.render function with two variants. The first one is:

lov.graphics.render(target, function() end)

target is the usual setCanvas table describing the attachments, load/store ops, etc. This one is like Canvas:renderTo. It begins a (cached) render pass, calls the callback containing regular lov.graphics draw calls, and finishes the pass. Any graphics state/bindings set in the callback are temporary to the callback.
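A sketch of the first variant in use; the keys in the target table are made up to stand in for the usual setCanvas-style options, and skyShader/skyMesh are assumed to exist:

```lua
-- Sketch of the callback variant of lov.graphics.render.
local canvas = lov.graphics.newCanvas(1280, 720)

local target = {
  canvas,                 -- color attachment
  depth = true,           -- guessed key: request a depth buffer
  clear = { 0, 0, 0, 1 }, -- guessed key: clear to black on load
}

lov.graphics.render(target, function()
  lov.graphics.setShader(skyShader)
  lov.graphics.draw(skyMesh)
  -- any state set here is temporary to the callback
end)
```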
The second variant is for multithreading:
lov.graphics.render(target, ...batchnames)
You pass in names of prerecorded batches you want to replay. Batches are (secondary) command buffers that can be recorded concurrently. There is a lov.graphics.record function for this:
lov.graphics.record(target, 'nickname', function() end)
You pass in the target you're going to replay on (sadly this is needed for vulkan/webgpu), a name to use for later replays, and a callback similar to the first variant. The batches are temporary and can only be submitted in the same frame they're recorded. The names are used instead of regular userdata to make it easier to use them between threads, avoid GC, and because there's just not a lot of benefit to retaining them since they're temporary.
(I want to explore more persistent batch objects later, but those are wayyy more challenging. They'd at least need to refcount all resources they use and potentially keep around copies of all the temporary matrices/uniforms).
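A sketch of how the record/replay split could be used across threads (the Channel coordination and target table are assumptions; only render and record come from above):

```lua
-- Sketch: record named batches on worker threads, replay them on the main thread.
local channel = lov.thread.getChannel('batches')
local target = { canvas, clear = { 0, 0, 0, 1 } } -- guessed target table

-- Each worker thread would run something like:
--   lov.graphics.record(target, 'terrain', function() drawTerrain() end)
--   lov.thread.getChannel('batches'):push('terrain')

-- Main thread: wait for the batch names, then replay them in one pass.
local names = {}
for i = 1, 2 do
  names[#names + 1] = channel:demand()
end
lov.graphics.render(target, unpack(names))
```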
One thing I like about this is that there are fewer breaking changes to the graphics module. A lot of code that is just setting state and drawing primitives/Drawables in lov.draw will continue to work. That wasn't the case when I was considering Batch/Pass objects.
I decided against the in-memory representation for the command buffers, at least for now. It has some benefits (you can sort/cull/reorder the draws, inspect/serialize the commands), but I really like the low-level approach where your graphics functions in Lua immediately hit GPU command buffers.
One kind of cool thing is that boot.lua can do lov.graphics.render(windowTarget, lov.draw). It might end up being more complicated than that if people want to do other passes in the draw callback or submit batches instead. Maybe just a conf.lua flag though.
I'll report back on how it goes, I have to reorganize a bunch of command buffer/pass/framebuffer stuff first, may run into issues.
EDIT: Mostly dropped this due to design flaws. It almost worked, but in the end wrapping it in a Canvas / Pass object is preferable because you can be recording multiple passes at once and it avoids some clashes with global state. It's also just more lovely. So I am fully on board with Pass objects even though I was somewhat against them at first. I still have a function lovr.graphics.renderTo(textures|canvastable, function(canvas) end) for doing temporary render passes.
Added depth bias and depth clamp states. Not really anything special.
Considering making blend modes and color masks per-target instead of global.
Current idea is for setBlendMode and setColorMask to take an optional target index, and if it's missing it applies to all targets (backwards compatible):
lov.graphics.setBlendMode('add') -- applies to all targets
lov.graphics.setBlendMode(1, 'add') -- only applies to first target
I'm not sure how the getters should work. They could either take an optional target index that defaults to 1, or they could return everything if the target is missing. It might be weird to have getColorMask() return 16 booleans...
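For comparison, the two getter options in sketch form (both hypothetical, just to see the tradeoff):

```lua
-- Option A: the getter takes an optional target index that defaults to 1.
local r, g, b, a = lov.graphics.getColorMask()      -- mask of target 1
local r2, g2, b2, a2 = lov.graphics.getColorMask(2) -- mask of target 2

-- Option B: with no index, the getter returns the state for every target,
-- which for 4 targets means getColorMask() returns 16 booleans.
local results = { lov.graphics.getColorMask() } -- #results == 4 * targetCount
```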
Here are the 3 types of buffers now (somehow they ended up matching opengl's roughly). The dynamic/transient ones are double buffered, and can not have storage usage. I can't imagine metal needs to worry about any of this...
EDIT: Actually dropped the 3 buffer types thing. Instead I'm using usage flags to detect what type of buffer memory to use (a write flag says whether you want to write to it from CPU, a transient (TBD) flag says whether it's okay to discard contents at the beginning of a frame).
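A sketch of what creation with usage flags might look like; newBuffer, the flag spellings, and vertexData are placeholders standing in for the flags described above:

```lua
-- Sketch: buffer memory type inferred from usage flags instead of explicit buffer types.
-- Static vertex data: uploaded once, never written from the CPU afterwards.
local vertexBuffer = lov.graphics.newBuffer(vertexData, { usage = { 'vertex' } })

-- Per-frame uniforms: CPU-writable and discardable each frame, so the
-- implementation can place them in double-buffered transient memory.
local frameUniforms = lov.graphics.newBuffer(256, {
  usage = { 'uniform', 'write', 'transient' }
})
```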
Considering only having 3 draw modes for the 'raw' drawing functionality like Mesh: points, lines, triangles. This seems to be more in line with how D3D12/Metal do things. There will still be a primitive called line that will draw a line strip, but internally it will just use the lines draw mode plus an index buffer (unsure of performance caveats here).
Mm I guess love doesn't need to worry about lines as much since they're already polylines.
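As a concrete illustration of the strip-to-lines expansion mentioned above (plain Lua, independent of any LÖVE/LÖVR API):

```lua
-- Expand a line strip over n vertices into an index buffer for the 'lines' mode.
-- A strip through vertices 1,2,3,4 becomes the segments 1-2, 2-3, 3-4.
local function lineStripIndices(n)
  local indices = {}
  for i = 1, n - 1 do
    indices[#indices + 1] = i     -- segment start
    indices[#indices + 1] = i + 1 -- segment end
  end
  return indices
end

print(table.concat(lineStripIndices(4), ', ')) --> 1, 2, 2, 3, 3, 4
```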
Here is my API for queries:

- lovr.graphics.newTally(type, count)
  - time counts elapsed nanoseconds
  - pixel counts visible pixels (occlusion query)
  - shader counts pipeline statistics (vertex count, vertex/fragment shader invocations, (un)clipped primitives)
- Pass:tick(tally, index) and Pass:tock(tally, index) begin and end a query in the tally (might rename)
- Pass:read(tally, index, count) returns a Readback with the query results
- Pass:copy(tally, buffer, srcindex, dstoffset, count) copies tally results to a buffer
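A usage sketch that just follows the signatures above, with a made-up frame-timing scenario (pass, scene, and statsBuffer are assumed to exist):

```lua
-- Sketch: timing a block of GPU work with a 'time' tally.
local timer = lovr.graphics.newTally('time', 2)

pass:tick(timer, 1) -- begin query 1
pass:draw(scene)
pass:tock(timer, 1) -- end query 1

-- Read the elapsed nanoseconds back on the CPU...
local readback = pass:read(timer, 1, 1)
-- ...or copy the result into a Buffer for GPU-side use.
pass:copy(timer, statsBuffer, 1, 0, 1)
```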
I just merged my Vulkan branch, so I won't be in brainstorming/API design mode as much. This issue was fun to have as a diary.
High level notes/findings on what a modern LÖVEly graphics API could look like.