Support Multiple render target (MRT)

Zyko0 commented 6 months ago

Operating System

[ ] Windows
[ ] macOS
[ ] Linux
[ ] FreeBSD
[ ] OpenBSD
[ ] Android
[ ] iOS
[ ] Nintendo Switch
[ ] PlayStation 5
[ ] Xbox
[ ] Web Browsers

What feature would you like to be added?

I believe it could be a great addition if ebitengine could support MRT.

At the moment we can write to a single dst image, and pass multiple src image (to a DrawTriangles/DrawTrianglesShader) function, writing to multiple dst images with the same function call (and with the same internal draw call) would be nice for some specific usecases.

On software side, something like:

DrawTrianglesShadersMRT(dst []*ebiten.Image, vertices []ebiten.Vertex, indices []uint16, opts *DrawTrianglesShaderOptions)

On kage side:

func Fragment(dst vec4, src vec2, color vec4) (vec4, vec4, vec4) {
    // Heavy calculations on common (maths, geometry, etc..)
    common := HeavyCalculations(dst, src, someUniforms)
    // 3 destinations textures
    mask0 := Mask0(common)
    colorOut := ColorOut(common)
    dataOut := GetData(common)

    return mask0, colorOut, dataOut
}

Why is this needed?

I've had many usecases where 80% of the heavy computations made within a shader invocation are needed for multiple destination images. Having the possibility to re-use the same vertices, the same shader draw call, the same 80% initial work that is common to all destination images could give new possibilities.

So far, in order to do so, we need:

To write multiple shaders with repeated code, or a single one with uniforms branches
To make distinct calls for each of the destination images (and each of these shaders/new uniforms => no batching), even though we use the same vertices and most of the same shader code (and results of the same calculations) => this obviously forces the re-invocation of a fragment shader for the same pixels 2-3 times, and the computations that come with those

In terms of usecases:

One of the most common use nowadays, is for 3D pipelines where you process some geometry or any kind of maths for meshes / triangles, and you want to write to multiple offscreens at once (https://learnopengl.com/Advanced-Lighting/Deferred-Shading):
- Diffuse (albedo), Normal, Depth, Specular, UVs, etc..
Any scenario that would normally require multiple passes over the same geometry!

If this can be supported, it would certainly unlock new rendering potential for Ebitengine, even for 2D/2.5D workflows I believe. Some existing game rendering pipelines could be optimized on the user side, or improved with new effects (for free almost?) and in general would give a new (advanced) way of designing an (richer) ebitengine application.

Potential hints:

Metal: multiple texture slices, multiple viewports
DirectX doc1, doc2

Proof of concept PR: https://github.com/hajimehoshi/ebiten/pull/2953

hajimehoshi commented 6 months ago

Wouldn't we need a depth buffer or a stencil buffer first, perhaps?

Zyko0 commented 6 months ago

Yes, maybe! Also, there's a case with depth buffer + MRT where you might want to override a custom depth value (https://registry.khronos.org/OpenGL-Refpages/gl4/html/gl_FragDepth.xhtml)

I'm thinking if we want to write to some destination textures, and sometimes discard() writing to some others, based on some runtime conditions (e.g: one want to overwrite the pixel in the depth buffer only if some conditions are met) => this would complicate the MRT feature a bit

Depth buffering would be great indeed, but I have no idea how we would like to support it (especially since the usual depth buffer is a floating-point texture?)

edit:

Wouldn't we need a depth buffer or a stencil buffer first, perhaps?

But either feature (MRT or new buffer types) doesn't require the other, and can still add value individually

hajimehoshi commented 6 months ago

Depth buffering would be great indeed, but I have no idea how we would like to support it (especially since the usual depth buffer is a floating-point texture?)

As Ebitengine is a 2D game engine, supporting a depth buffer sounds a little odd. I'm not familiar with it so this might be perhaps useful even for 2D game engine, but I am not sure.

Zyko0 commented 6 months ago

So it's technically easy to support at the graphic driver level: https://github.com/hajimehoshi/ebiten/pull/2953 is just a minimal working example (for MRT at least) for OpenGL and directx11 (both tested on windows only),

And I think the state of this issue is a API design issue (probably more internal than public), that still needs investigation and discussion on whether it's something we'd like to support (and if so, how), since:

It makes sense to render to multiple targets (and it's only doable) when the different destinations are separate textures (unmanaged) => A fragment is bound to a destination location, so it can write at the same location on multiple textures, but not at different locations from a single texture
Based on the previous point, it is possible that this can't be generalized and therefore involves having a different "draw path" for triangles using MRT => Then it could add a maintenance cost for a not-so-required feature for a 2D engine

hajimehoshi commented 6 months ago

It is possible to make destination images separate from atlases dynamically (and actually Ebitengine does so when necessary), but this would degrade performance, right?

Zyko0 commented 6 months ago

It is possible to make destination images separate from atlases dynamically (and actually Ebitengine does so when necessary)

~~I think this would defeat the purpose a little yes~~, but I wasn't aware of that actually!

Unless, it is stated somehow that: Images passed to this method will be made unmanaged if they are not already, which might prevent them to be batched with different commands

I didn't consider it, but it's true, that in this case it shouldn't even matter to the user (the fact that an image is made unmanaged), and be accepted since the usage of this function would be a bit special by nature. The risk, of being concerned (as a user) by losing the batching-capability of an image used as part of an MRT pipeline should be quite low.

but this would degrade performance, right?

I mentioned "made unmanaged" in order to cover for the performances part, assuming that: once it is made unmanaged by ebitengine, it will never be moved again to an atlas or merged with other atlases. In that case, the cost would only happen once, so it should be okay!

However, if you meant that they can be moved for the sole purpose of ensuring that a draw call can be performed, but that they can be moved back to atlases, then it's not good (we would like this operation to happen once at most).

edit: This would solve the primary issue (and most important one), but then it should also be stated (+panic()) that ebiten.SubImages are not accepted => which is probably okay too!

hajimehoshi commented 6 months ago

I didn't consider it, but it's true, that in this case it shouldn't even matter to the user (the fact that an image is made unmanaged), and be accepted since the usage of this function would be a bit special by nature. The risk, of being concerned (as a user) by losing the batching-capability of an image used as part of an MRT pipeline should be quite low.

I'm not sure I understand what you mean. I assume the destination textures for MRT are used as multiple source textures for one shader draw call, then even if the textures are separate, this should be efficient. Is this correct?

I mentioned "made unmanaged" in order to cover for the performances part, assuming that: once it is made unmanaged by ebitengine, it will never be moved again to an atlas or merged with other atlases. However, if you meant that they can be moved for the sole purpose of ensuring that a draw call can be performed, but that they can be moved back to atlases, then it's not good (we would like this operation to happen once at most).

If an image is unmanaged (NewImageOptions.Unmanaged), right, the image never goes to an atlas. If an image is managed, the image might go to atlas again in some conditions (e.g. the image is used as a source for a while, and the image is not used as destination)

Zyko0 commented 6 months ago

I'm not sure I understand what you mean. I assume the destination textures for MRT are used as multiple source textures for one shader draw call, then even if the textures are separate, this should be efficient. Is this correct?

Yes! (faster than batched triangles multiplied by N regions on a single texture, since it would be a single region here and just N writes from the same shader call)

If an image is unmanaged (NewImageOptions.Unmanaged), right, the image never goes to an atlas. If an image is managed, the image might go to atlas again in some conditions (e.g. the image is used as a source for a while, and the image is not used as destination)

Okay yeah then it's acceptable I think, I understand what you mean. Setting it as unmanaged for more control over the performances should be a user tweak then!

How to handle passing subimages as destinations? I suggest we reject those 👀

hajimehoshi commented 3 months ago

I think we have already discussed in Discord, but what we have reached an agreement is that

A fragment function in Kage will return multiple values
The function for MRT will be a global function rather than a Draw* style method
The function takes multiple destinations, which must be unmanaged images
- A tricky thing in MRT is that all the destination positions are shared among the destinations, so atlases are not available.
The destination images must not be a sub image
- For the same reason above.

Is that correct?

hajimehoshi commented 3 months ago

As we discussed in Discord:

All the destination images must have the same bounds
All the parent images of the destination images must have the same bounds
- A parent image is the image itself if the image is not a sub-image, or the original image if the image is a sub-image
- A parent image might be able to have different size for MRT, but for simplicity, let's have such restriction and revisit this later if we really need.
All the destination images must be unmanaged
- Ebitengine might be able to convert a managed image to unmanaged automatically, but for simplicity, let's have such restriction and revisit this later if we really need.

tinne26 commented 3 months ago

By the way, slightly out of topic, but I have found a use-case for this feature in a 2D game, so I'll share it here:

Say you have this game: https://tinne26.github.io/mipix-examples/gametest/.
If you look closely, some objects can look very slightly disconnected from the ground (ymmv depending on display size, resolution and so on, but even if you don't see it just trust me). This is because rendering is done in a logical canvas for the world (back), then a second pass at high res for the smoothly moving character, and then a final pass with the logical rendering of elements in front of the player. Once everything is scaled, sometimes two contiguous pixels in logical space are not fully opaque in high res due to projection filters, so they are slightly translucid and you see a hole between them that should not really be there.
There are some low-level solutions to this like extending your graphical assets to cover the ground with one extra pixel and so on, but that's kinda annoying. And here's where MRT can be useful: besides drawing the game elements to the main logical canvas, you also keep a "connectivity" logical canvas (so, the second rendering target). Having this "connectivity canvas" allows you to detect which elements are fully adjacent in logical space and use this information when projecting to high resolution without leaving gaps.

Sounds a bit convoluted, but it's a nice and purely 2D use-case. There are decent alternative ways around it in this case, though.

Zyko0 commented 3 months ago

@tinne26 Very cool!! And as a bonus you also render it once, even though the gain might not be massive!

hajimehoshi commented 3 months ago

@tinne26 Hmm? I still don't understand how MRT resolves the gap issue

tinne26 commented 3 months ago

Those disconnected graphics are on the back and front layer respectively, drawn on a logical canvas of 256, 144. The reason they appear disconnected is that I have a separate high resolution draw in the middle, so I need to project the logical canvas first before the high res draw and then do the same for the front layer after the high res draw. One idea to solve this is to use MRT to make the logical draws to 2 canvases, both of size 256x144. One will be used for the regular graphics, and the other will be used to keep track of the connectivity of the elements drawn at logical size. So, on the third draw pass, during the front layer logical draw, I have a clean canvas with the front layer and another that also includes the previous data (what I'm calling the "connectivity canvas"). I can use this connectivity canvas on the {logical => high res} projection to "correct" these gaps (theoretically). There are many different strategies though, both with MRT and without MRT, but MRT seems to make life easier in this case.

In any case, I'm not particularly arguing in favor of MRT or anything, you all know I'm more interested in depth buffers than MRT, but it's still an interesting example of how MRT might have some uses even in 2D. In fact, more uses would come for MRT if we actually had depth buffers too, as isometric games can absolutely use depth information for many things, and if you can draw that at the same time as the main tiles, that's great.

hajimehoshi commented 3 months ago

So if we should do:

Draw the backend layer to the high-res canvas
Draw the frontend layer to the high-res canvas

you meant we can change them with MRT into like these:

Draw the backend/frontend layer to the two low-res canvases at the same time
Draw the two low-res canvases to the high-res canvas

?

If backend and frontend are different very much, would the MRT shader be efficient? Why not using one low-res canvas?

Maybe I don't understand this sentence:

One will be used for the regular graphics, and the other will be used to keep track of the connectivity of the elements drawn at logical size.

hajimehoshi commented 3 months ago

OK so I missed the middle layer, but I still don't understand what and how MRT resolves. Please list draw calls before and after MRT, thanks!

hajimehoshi commented 3 months ago

@Zyko0 By the way, how much would the performance be imporved by your experimental PR?

Zyko0 commented 3 months ago

I paused my side project actually to focus on this, also not knowing if this feature would get accepted or not originally. So I haven't tested yet, but I'm excited to, it's just mean quite a big refactor, so I didn't try yet, but I can try later if you want!

I also paused it because replicating draw calls and the same costly operations wasn't sustainable for new effects I wanted to add. So I decided to stop at an arbitrary number of features.

These features require tracing/image information (that could come for free with MRT), but just impossible to add to the current load, so I've not implemented those yet.

Zyko0 commented 3 months ago

By the way, how much would the performance be imporved by your experimental PR?

@hajimehoshi I made 2 frame captures using RenderDoc to see the differences between the current implementation and the mrt one:

The difference is that the EID=88 (0.6ms) call from first screenshot is not necessary anymore in the new version.

Current - no MRT (2.46ms?): qrenderdoc_KOevHcNHAk

With MRT (1.95ms?) (single tracing, multiple outputs + a deferred rendering merging pass): qrenderdoc_tMKuH7Vwdb

hajimehoshi commented 2 months ago

I'm happy that there seems improvement in MRT!

By the way, I was wondering if there are other potential users or other use cases than @Zyko0 . As this would be a pretty big change and a big burden of maintenance, I'd like to know those.

hajimehoshi / ebiten