hajimehoshi / ebiten

Ebitengine - A dead simple 2D game engine for Go
https://ebitengine.org
Apache License 2.0

Support multiple render targets (MRT) #2930

Open Zyko0 opened 6 months ago

Zyko0 commented 6 months ago


What feature would you like to be added?

I believe it could be a great addition if ebitengine could support MRT.

At the moment we can write to a single dst image and pass multiple src images to a DrawTriangles/DrawTrianglesShader call; being able to write to multiple dst images with the same function call (and with the same internal draw call) would be nice for some specific use cases.

On the API side, something like:

DrawTrianglesShadersMRT(dst []*ebiten.Image, vertices []ebiten.Vertex, indices []uint16, opts *DrawTrianglesShaderOptions)

On the Kage side:

func Fragment(dst vec4, src vec2, color vec4) (vec4, vec4, vec4) {
    // Heavy calculations common to all outputs (math, geometry, etc.)
    common := HeavyCalculations(dst, src, someUniforms)
    // 3 destination textures
    mask0 := Mask0(common)
    colorOut := ColorOut(common)
    dataOut := GetData(common)

    return mask0, colorOut, dataOut
}
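To make the intent concrete, here is a rough sketch of what a call site might look like under this proposal. DrawTrianglesShadersMRT is the hypothetical function from above (note that the signature as proposed omits the *ebiten.Shader argument that DrawTrianglesShader takes; it would presumably be needed too), and src, vertices and indices are assumed to be prepared elsewhere:

```go
// Hypothetical usage sketch, not existing Ebitengine API.
// Three same-sized destination images receive the three return values
// of the Fragment function above.
mask0 := ebiten.NewImage(w, h)
colorOut := ebiten.NewImage(w, h)
dataOut := ebiten.NewImage(w, h)

opts := &ebiten.DrawTrianglesShaderOptions{}
opts.Images[0] = src // regular source texture, as with DrawTrianglesShader

// The heavy common computation runs once per fragment, and its results are
// written to all three destinations in a single internal draw call.
// (A *ebiten.Shader argument would presumably also be part of the call.)
ebiten.DrawTrianglesShadersMRT(
	[]*ebiten.Image{mask0, colorOut, dataOut},
	vertices, indices, opts,
)
```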

Why is this needed?

I've had many use cases where 80% of the heavy computation done within a shader invocation is needed for multiple destination images. Being able to re-use the same vertices, the same shader draw call, and the same ~80% of initial work that is common to all destination images would open up new possibilities.

So far, in order to do so, we need to repeat the same draw call (same vertices, same shader, same heavy computation) once per destination image, roughly as in the sketch below.
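As a point of comparison, a minimal sketch of today's approach with the existing API (maskShader, colorShader and dataShader are placeholder per-destination shader variants that all repeat the same common computation):

```go
// Today: one DrawTrianglesShader call per destination image, each repeating
// the same heavy per-fragment work on the same vertices and indices.
opts := &ebiten.DrawTrianglesShaderOptions{}
opts.Images[0] = src

mask0.DrawTrianglesShader(vertices, indices, maskShader, opts)
colorOut.DrawTrianglesShader(vertices, indices, colorShader, opts)
dataOut.DrawTrianglesShader(vertices, indices, dataShader, opts)
```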

In terms of use cases:

If this can be supported, it would certainly unlock new rendering potential for Ebitengine, even for 2D/2.5D workflows I believe. Some existing game rendering pipelines could be optimized on the user side, or improved with new effects (almost for free?), and in general it would give a new (advanced) way of designing a (richer) Ebitengine application.

Potential hints:

Proof of concept PR: https://github.com/hajimehoshi/ebiten/pull/2953

hajimehoshi commented 6 months ago

Wouldn't we need a depth buffer or a stencil buffer first, perhaps?

Zyko0 commented 6 months ago

Yes, maybe! Also, there's a case with depth buffer + MRT where you might want to override the depth value with a custom one (https://registry.khronos.org/OpenGL-Refpages/gl4/html/gl_FragDepth.xhtml).

I'm thinking of the case where we want to write to some destination textures but sometimes discard() the write to some others, based on runtime conditions (e.g. one wants to overwrite the pixel in the depth buffer only if some conditions are met) => this would complicate the MRT feature a bit.

Depth buffering would be great indeed, but I have no idea how we would like to support it (especially since the usual depth buffer is a floating-point texture?)

edit:

Wouldn't we need a depth buffer or a stencil buffer first, perhaps?

But either feature (MRT or new buffer types) doesn't require the other, and each can still add value individually.

hajimehoshi commented 6 months ago

Depth buffering would be great indeed, but I have no idea how we would like to support it (especially since the usual depth buffer is a floating-point texture?)

As Ebitengine is a 2D game engine, supporting a depth buffer sounds a little odd. I'm not familiar with it, so perhaps it could be useful even for a 2D game engine, but I am not sure.

Zyko0 commented 6 months ago

So it's technically easy to support at the graphics driver level: https://github.com/hajimehoshi/ebiten/pull/2953 is just a minimal working example (for MRT at least) for OpenGL and DirectX 11 (both tested on Windows only).

And I think what remains of this issue is an API design question (probably more internal than public) that still needs investigation and discussion on whether it's something we'd like to support (and if so, how), since:

hajimehoshi commented 6 months ago

It is possible to make destination images separate from atlases dynamically (and actually Ebitengine does so when necessary), but this would degrade performance, right?

Zyko0 commented 6 months ago

It is possible to make destination images separate from atlases dynamically (and actually Ebitengine does so when necessary)

I think this would defeat the purpose a little, yes, but I actually wasn't aware of that!

Unless it is stated somewhere that images passed to this method will be made unmanaged if they are not already, which might prevent them from being batched with other commands.

I hadn't considered it, but it's true that in this case it shouldn't even matter to the user (the fact that an image is made unmanaged), and it could be accepted since the usage of this function would be a bit special by nature. The risk of a user being affected by losing the batching capability of an image used as part of an MRT pipeline should be quite low.

but this would degrade performance, right?

I mentioned "made unmanaged" in order to cover for the performances part, assuming that: once it is made unmanaged by ebitengine, it will never be moved again to an atlas or merged with other atlases. In that case, the cost would only happen once, so it should be okay!

However, if you meant that they can be moved for the sole purpose of ensuring that a draw call can be performed, but that they can be moved back to atlases afterwards, then it's not good (we would like this operation to happen at most once).

edit: This would solve the primary (and most important) issue, but then it should also be stated (and enforced with a panic()) that sub-images (from ebiten.Image.SubImage) are not accepted => which is probably okay too!

hajimehoshi commented 6 months ago

I hadn't considered it, but it's true that in this case it shouldn't even matter to the user (the fact that an image is made unmanaged), and it could be accepted since the usage of this function would be a bit special by nature. The risk of a user being affected by losing the batching capability of an image used as part of an MRT pipeline should be quite low.

I'm not sure I understand what you mean. I assume the destination textures for MRT are used as multiple source textures for one shader draw call, then even if the textures are separate, this should be efficient. Is this correct?

I mentioned "made unmanaged" in order to cover for the performances part, assuming that: once it is made unmanaged by ebitengine, it will never be moved again to an atlas or merged with other atlases. However, if you meant that they can be moved for the sole purpose of ensuring that a draw call can be performed, but that they can be moved back to atlases, then it's not good (we would like this operation to happen once at most).

If an image is unmanaged (NewImageOptions.Unmanaged), right, the image never goes to an atlas. If an image is managed, the image might go to an atlas again under some conditions (e.g. the image is used as a source for a while, and the image is not used as a destination).

Zyko0 commented 6 months ago

I'm not sure I understand what you mean. I assume the destination textures for MRT are used as multiple source textures for one shader draw call, then even if the textures are separate, this should be efficient. Is this correct?

Yes! (faster than batched triangles multiplied by N regions on a single texture, since it would be a single region here and just N writes from the same shader call)
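For reference, the later step where the MRT outputs become sources again can already be expressed with the existing API, roughly like this (a deferred-style composition pass; mergeShader and the image names are placeholders):

```go
// The separate (non-atlased) MRT outputs are bound as multiple sources of a
// single shader draw, so keeping them on distinct textures stays efficient.
opts := &ebiten.DrawRectShaderOptions{}
opts.Images[0] = mask0
opts.Images[1] = colorOut
opts.Images[2] = dataOut

// All source images must share the same size for DrawRectShader.
w, h := mask0.Bounds().Dx(), mask0.Bounds().Dy()
screen.DrawRectShader(w, h, mergeShader, opts)
```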

If an image is unmanaged (NewImageOptions.Unmanaged), right, the image never goes to an atlas. If an image is managed, the image might go to an atlas again under some conditions (e.g. the image is used as a source for a while, and the image is not used as a destination).

Okay, yeah, then it's acceptable I think, I understand what you mean. Setting it as unmanaged for more control over performance should be a user tweak then!
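That user tweak would just amount to creating the MRT destinations as unmanaged up front; a minimal sketch (newMRTTarget is only an illustrative helper name, while NewImageWithOptions and NewImageOptions.Unmanaged are existing API):

```go
package main

import (
	"image"

	"github.com/hajimehoshi/ebiten/v2"
)

// newMRTTarget creates a destination image that is never placed on an
// internal atlas (NewImageOptions.Unmanaged), so the one-time cost of
// moving it out of an atlas for an MRT draw never occurs.
func newMRTTarget(w, h int) *ebiten.Image {
	return ebiten.NewImageWithOptions(
		image.Rect(0, 0, w, h),
		&ebiten.NewImageOptions{Unmanaged: true},
	)
}
```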

hajimehoshi commented 3 months ago

I think we have already discussed this in Discord, but what we have reached an agreement on is that:

Is that correct?

hajimehoshi commented 3 months ago

As we discussed in Discord:

tinne26 commented 3 months ago

By the way, this is slightly off topic, but I have found a use case for this feature in a 2D game, so I'll share it here:

It sounds a bit convoluted, but it's a nice, purely 2D use case. There are decent alternative ways around it in this case, though.

Zyko0 commented 3 months ago

@tinne26 Very cool!! And as a bonus you also render it once, even though the gain might not be massive!

hajimehoshi commented 3 months ago

@tinne26 Hmm? I still don't understand how MRT resolves the gap issue: [screenshot]

tinne26 commented 3 months ago

Those disconnected graphics are on the back and front layers respectively, drawn on a logical canvas of 256x144. The reason they appear disconnected is that I have a separate high-resolution draw in the middle, so I need to project the logical canvas first, before the high-res draw, and then do the same for the front layer after the high-res draw. One idea to solve this is to use MRT to make the logical draws to 2 canvases, both of size 256x144. One will be used for the regular graphics, and the other will be used to keep track of the connectivity of the elements drawn at logical size. So, on the third draw pass, during the front layer logical draw, I have a clean canvas with the front layer and another that also includes the previous data (what I'm calling the "connectivity canvas"). I can use this connectivity canvas on the {logical => high res} projection to "correct" these gaps (theoretically). There are many different strategies though, both with MRT and without MRT, but MRT seems to make life easier in this case.
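If it helps, a very rough outline of the passes (the MRT draws are hypothetical and every helper name here is just a placeholder, not real API):

```go
// Pass 1 (logical, 256x144): back layer to both the regular logical canvas
// and the "connectivity" canvas in one hypothetical MRT draw.
drawBackLayerMRT([]*ebiten.Image{logicalCanvas, connectivityCanvas})

// Project the logical canvas to high resolution, then the high-res draw.
projectToHighRes(screen, logicalCanvas)
drawHighResMiddle(screen)

// Pass 3 (logical, 256x144): front layer to a clean logical canvas plus the
// connectivity canvas, which still holds the back-layer data.
logicalCanvas.Clear()
drawFrontLayerMRT([]*ebiten.Image{logicalCanvas, connectivityCanvas})

// Final projection: use the connectivity canvas to correct the gaps at the
// logical -> high-res boundary.
projectWithConnectivity(screen, logicalCanvas, connectivityCanvas)
```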

In any case, I'm not particularly arguing in favor of MRT or anything, you all know I'm more interested in depth buffers than MRT, but it's still an interesting example of how MRT might have some uses even in 2D. In fact, more uses would come for MRT if we actually had depth buffers too, as isometric games can absolutely use depth information for many things, and if you can draw that at the same time as the main tiles, that's great.

hajimehoshi commented 3 months ago

So if we should do:

you mean that with MRT we can change them into something like this:

?

If the back and front layers are very different, would the MRT shader be efficient? Why not use one low-res canvas?

Maybe I don't understand this sentence:

One will be used for the regular graphics, and the other will be used to keep track of the connectivity of the elements drawn at logical size.

hajimehoshi commented 3 months ago

[screenshot]

OK, so I missed the middle layer, but I still don't understand what MRT resolves and how. Please list the draw calls before and after MRT, thanks!

hajimehoshi commented 3 months ago

@Zyko0 By the way, how much would the performance be improved by your experimental PR?

Zyko0 commented 3 months ago

I actually paused my side project to focus on this, also not knowing originally whether this feature would get accepted or not. So I haven't tested yet, but I'm excited to; it would just mean quite a big refactor, so I haven't tried yet, but I can try later if you want!

I also paused it because replicating draw calls and the same costly operations wasn't sustainable for the new effects I wanted to add. So I decided to stop at an arbitrary number of features.

These features require tracing/image information (which could come for free with MRT), but it's just impossible to add them to the current load, so I haven't implemented those yet.

Zyko0 commented 3 months ago

By the way, how much would the performance be improved by your experimental PR?

@hajimehoshi I made 2 frame captures using RenderDoc to see the differences between the current implementation and the MRT one:

The difference is that the EID=88 (0.6ms) call from the first screenshot is no longer necessary in the new version.

Current - no MRT (2.46ms?): [RenderDoc capture screenshot]

With MRT (1.95ms?) (single tracing pass, multiple outputs + a deferred rendering merging pass): [RenderDoc capture screenshot]

hajimehoshi commented 2 months ago

I'm happy that there seems to be an improvement with MRT!

By the way, I was wondering if there are other potential users or other use cases besides @Zyko0's. As this would be a pretty big change and a big maintenance burden, I'd like to know about those.