minetoblend opened 1 month ago
OP is missing a discussion of the performance of this, which I'd expect to see given the reliance on buffered containers, which we have found to be rather expensive (especially in terms of VRAM usage).
Sure thing. I'll need a bit of time to do some proper benchmarks, but for now I can go into the performance considerations and testing I've done so far. Is there anything specific you'd like to see in the benchmarks?
In the bit of testing I've done so far with lazer (frame limiter on "basically unlimited", NVIDIA 980 Ti, 2560x1440), I saw a drop from around 950 fps to 700-800 fps in slider-heavy sections at 50% blur resolution, and down to 600-700 fps at full resolution. Lowering the resolution below that had diminishing returns, so I'm assuming the framebuffer overhead and/or the blur shader is a key contributor here (though I can't say anything for sure until I get around to actually measuring things).
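Cost does not scale linearly with fps, so it can help to convert these numbers into frame time. Here is a quick sketch using the figures above. Taking the midpoint of each reported range (750 and 650 fps) is my own assumption.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def added_cost_ms(baseline_fps: float, new_fps: float) -> float:
    """Extra time per frame implied by a drop from baseline_fps to new_fps."""
    return frame_time_ms(new_fps) - frame_time_ms(baseline_fps)


baseline = 950.0
# Midpoints of the reported ranges (assumption, not measured values).
for label, fps in [("50% blur resolution", 750.0), ("full blur resolution", 650.0)]:
    print(f"{label}: +{added_cost_ms(baseline, fps):.2f} ms per frame")
# 50% blur resolution: +0.28 ms per frame
# full blur resolution: +0.49 ms per frame
```

Expressed this way, the effect costs roughly a quarter to half a millisecond per frame on that machine. That number is easier to compare against other hardware and frame budgets than a raw fps delta.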
The whole thing uses a single BufferedContainer at the root, acting as the backbuffer. Whenever a BackdropBlurContainer renders, it works similarly to a BufferedContainer, but uses part of the backbuffer as the input for the blur shader instead of the MainBuffer.
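When only part of the backbuffer is used as blur input, the blur needs that region in normalized texture coordinates. Blur taps near the region's edge also have to be kept from sampling outside it. The sketch below shows one common approach: map the pixel rectangle to UVs and clamp samples to a half-texel inset. This is not the PR's code, and `Rect`, `backbuffer_uv_rect`, `clamp_bounds` and `clamp_uv` are hypothetical names. I'm also not claiming it explains the texture clamping issue mentioned for EffectBufferScale > 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float


def backbuffer_uv_rect(region: Rect, buffer_w: int, buffer_h: int) -> Rect:
    """Map a region given in backbuffer pixels to normalized [0, 1] UVs."""
    return Rect(region.x / buffer_w, region.y / buffer_h,
                region.width / buffer_w, region.height / buffer_h)


def clamp_bounds(uv: Rect, buffer_w: int, buffer_h: int) -> tuple[float, float, float, float]:
    """UV limits inset by half a texel, so bilinear taps at the edge of the
    region never blend in texels from outside it (hypothetical helper)."""
    half_u = 0.5 / buffer_w
    half_v = 0.5 / buffer_h
    return (uv.x + half_u, uv.y + half_v,
            uv.x + uv.width - half_u, uv.y + uv.height - half_v)


def clamp_uv(u: float, v: float,
             bounds: tuple[float, float, float, float]) -> tuple[float, float]:
    """Clamp a sample coordinate into the inset region (what the shader would do per tap)."""
    min_u, min_v, max_u, max_v = bounds
    return (min(max(u, min_u), max_u), min(max(v, min_v), max_v))


uv = backbuffer_uv_rect(Rect(640, 360, 1280, 720), 2560, 1440)
bounds = clamp_bounds(uv, 2560, 1440)
print(uv)  # Rect(x=0.25, y=0.25, width=0.5, height=0.5)
print(clamp_uv(0.1, 0.5, bounds))  # u is pulled in to the region's left edge plus half a texel
```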
I added support for blurring at a different resolution than the MainBuffer (which would usually stay at 100%). When testing, I found that even at 25% resolution (so 6.25% of the texture memory of a full-size buffer, if we ignore overhead and other GPU memory weirdness) the results still look acceptable.
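The 6.25% figure follows from memory scaling with the square of the resolution scale (0.25² = 0.0625). Here is a rough sketch for the 2560x1440 setup mentioned above. It assumes an RGBA8 colour attachment (4 bytes per pixel) and ignores padding, depth/stencil and driver overhead. The helper name is hypothetical.

```python
def framebuffer_bytes(width: int, height: int, scale: float, bytes_per_pixel: int = 4) -> int:
    """Approximate size of a colour attachment rendered at `scale` of (width, height).
    Assumes RGBA8 by default; ignores padding, depth/stencil and driver overhead."""
    w = max(1, round(width * scale))
    h = max(1, round(height * scale))
    return w * h * bytes_per_pixel


full = framebuffer_bytes(2560, 1440, 1.0)
quarter = framebuffer_bytes(2560, 1440, 0.25)
print(full / 2**20, quarter / 2**20, quarter / full)  # 14.0625 0.87890625 0.0625
```

So a 25% blur buffer is under 1 MiB at this resolution, versus about 14 MiB at full size.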
The primary memory overhead is the MainBuffer, which in the case of sliders adds +100% on top of the framebuffer used for drawing the slider path.
The most effective way to reduce VRAM overhead would probably be some form of texture/framebuffer pooling: request a temporary framebuffer of a minimum required size from the pool on each draw and return it right after. I contributed the same effect to pixijs a couple of months ago; pixijs makes extensive use of texture pooling for filters/effects. (Example)
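A minimal sketch of the pooling idea: request a buffer of at least a given size, use it for one draw, and hand it back. Rounding sizes up to powers of two is an assumption on my part. It is a common way to let similarly sized requests share buffers, at the cost of some wasted memory. `Framebuffer` and `FramebufferPool` are hypothetical stand-ins, not osu-framework or pixijs types.

```python
from collections import defaultdict


def next_pow2(n: int) -> int:
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


class Framebuffer:
    """Stand-in for a GPU framebuffer; only tracks its size."""

    def __init__(self, width: int, height: int):
        self.width = width
        self.height = height


class FramebufferPool:
    """Hands out temporary framebuffers at least as large as requested."""

    def __init__(self):
        self._free = defaultdict(list)  # (width, height) -> idle framebuffers
        self.allocations = 0            # number of real allocations performed

    def acquire(self, min_width: int, min_height: int) -> Framebuffer:
        key = (next_pow2(min_width), next_pow2(min_height))
        bucket = self._free[key]
        if bucket:
            return bucket.pop()         # reuse an idle buffer of the same bucket
        self.allocations += 1
        return Framebuffer(*key)

    def release(self, fb: Framebuffer) -> None:
        self._free[(fb.width, fb.height)].append(fb)


pool = FramebufferPool()
for _ in range(100):                    # e.g. 100 sliders drawn one after another
    fb = pool.acquire(600, 350)
    # ... blur into fb, composite, then return it right away ...
    pool.release(fb)
print(pool.allocations)  # 1
```

Because each slider returns its buffer before the next one draws, all of them share a single allocation here. Memory then scales with the number of buffers in use at once rather than the number of sliders on screen.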
Some other ideas for performance improvements are lowering the kernel size in the blur shader, unrolling the blur loop and/or using precomputed values for the Gaussian distribution. I don't know whether those would improve performance in a meaningful way. They're just things I know pixijs does to speed up its blur shader, which uses a shader pool that generates shader code on the fly for any requested kernel size. I can't say anything conclusive without testing and measuring first.
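To illustrate precomputed weights and an unrolled loop together, here is a sketch that bakes normalized Gaussian weights for a given sigma and radius into straight-line GLSL for one separable pass. This is not pixijs's generator or the osu-framework shader. The GLSL names `tex`, `uv` and `direction` are hypothetical, and `direction` is assumed to be the per-texel step, e.g. `vec2(1.0 / width, 0.0)` for the horizontal pass. Choosing the radius (here 2 * sigma; roughly 3 * sigma is also common) is the knob that controls the kernel size.

```python
import math


def gaussian_weights(sigma: float, radius: int) -> list[float]:
    """Normalized 1D Gaussian weights for offsets 0..radius (kernel is symmetric)."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(radius + 1)]
    total = raw[0] + 2.0 * sum(raw[1:])  # offsets +-1..+-radius are each used twice
    return [w / total for w in raw]


def unrolled_blur_glsl(sigma: float, radius: int) -> str:
    """Emit the body of one separable blur pass with the loop unrolled
    and the weights baked in as constants."""
    w = gaussian_weights(sigma, radius)
    lines = [f"vec4 sum = texture(tex, uv) * {w[0]:.8f};"]
    for i in range(1, radius + 1):
        lines.append(
            f"sum += (texture(tex, uv + direction * {float(i):.1f})"
            f" + texture(tex, uv - direction * {float(i):.1f})) * {w[i]:.8f};"
        )
    return "\n".join(lines)


print(unrolled_blur_glsl(sigma=2.0, radius=4))
```

A shader pool along those lines would cache the compiled program per (sigma, radius) pair, so the generation cost is only paid once per kernel size.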
I managed to get rid of most of the framebuffers. I made a BackdropBlurPath drawable which uses only one extra framebuffer per path/slider (previously it was 3 extra). If that buffer has a resolution of 25%, VRAM usage per slider drops from 2.125x to 1.0625x, so it should barely make a dent in terms of VRAM now.
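The per-slider VRAM multipliers can be reproduced with simple arithmetic. The sketch makes three assumptions:

- Master uses one full-resolution framebuffer per slider path.
- The earlier version's 3 extra buffers were one full-resolution MainBuffer plus two blur buffers at 25%. This is inferred from the numbers, not stated.
- Memory scales with the square of the resolution scale.

The helper name is hypothetical.

```python
def vram_multiplier(extra_full_res: int, extra_scaled: int, scale: float) -> float:
    """VRAM per slider relative to master (one full-resolution path framebuffer).
    Extra buffers are either full resolution or rendered at `scale` of it."""
    return 1.0 + extra_full_res + extra_scaled * scale * scale


print(vram_multiplier(1, 2, 0.25))  # earlier: MainBuffer + two 25% blur buffers -> 2.125
print(vram_multiplier(0, 1, 0.25))  # BackdropBlurPath with one 25% buffer       -> 1.0625
print(vram_multiplier(0, 2, 0.25))  # two 25% buffers                            -> 1.125
```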
I also tried using 2 buffers so that both blur passes can run at a lower resolution, and I think that was slightly faster. My guess is that this is because the blur shader seems to have quite a bit of overhead on its own, but I'll need to do more measuring to say for sure.
The drawing logic is now:
The blending formula in the final pass is still a bit incorrect. It shows up as a slightly dark tint when alpha gets low, as well as slight artifacts on anti-aliased edges. There is some kind of premultiplied-alpha weirdness going on that I can't fully wrap my head around.
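One common way to end up with exactly this kind of darkening at low alpha is multiplying by alpha twice. This happens when a colour that is already premultiplied gets fed into a straight-alpha "over" formula. The sketch below compares the two correct variants with that mistake on a single pixel composited onto an opaque destination. It illustrates the pitfall only; it is not a diagnosis of the shader in this PR, and the function names are hypothetical.

```python
def over_straight(src_rgb, src_a, dst_rgb):
    """'Over' with a straight (non-premultiplied) source onto an opaque destination."""
    return tuple(s * src_a + d * (1.0 - src_a) for s, d in zip(src_rgb, dst_rgb))


def over_premultiplied(src_rgb_pm, src_a, dst_rgb):
    """'Over' with a premultiplied source: colour is already scaled by alpha."""
    return tuple(s + d * (1.0 - src_a) for s, d in zip(src_rgb_pm, dst_rgb))


white, alpha, grey = (1.0, 1.0, 1.0), 0.25, (0.5, 0.5, 0.5)
premultiplied_white = tuple(c * alpha for c in white)

print(over_straight(white, alpha, grey))                     # (0.625, 0.625, 0.625)
print(over_premultiplied(premultiplied_white, alpha, grey))  # (0.625, 0.625, 0.625)
# Mistake: premultiplied colour through the straight formula is scaled by alpha twice.
print(over_straight(premultiplied_white, alpha, grey))       # (0.4375, 0.4375, 0.4375)
```

With the double multiply, the source contributes a² instead of a. The relative loss is (1 - a), so it is worst exactly when alpha is low. Partially covered anti-aliased edge pixels would be affected the same way.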
Also a touch concerned about the performance of this, given that we've already seen the blur algorithm we're using perform pretty badly on some hardware.
If the blur shader in particular is a concern, I think it's probably best if I use 2 low-res buffers and only do the blending in the final pass, so the second blur pass can happen at a lower resolution too. Assuming 25% resolution, that would increase the VRAM used per slider to 1.125x compared to master.
With the second blur pass now also at a lower resolution, and assuming 25% resolution and a 16px blur sigma, it should be doing 5 texture samples per pixel for each blur pass.
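Rendering a blur pass at 25% shrinks the work in two ways. There are 1/16 as many pixels, and the same on-screen sigma (16px) becomes only 4px in buffer space, so fewer taps are needed per pixel. The sketch below estimates the combined effect relative to a full-resolution pass. It assumes the number of taps grows linearly with sigma measured in buffer pixels. It also ignores any fixed per-pass overhead, which may well dominate at this size. The function name is hypothetical.

```python
def relative_pass_cost(scale: float) -> float:
    """Texture fetches for one separable blur pass at `scale`, relative to full resolution.
    Assumes taps per pixel grow linearly with sigma in buffer pixels."""
    pixel_ratio = scale * scale  # fewer pixels to shade
    taps_ratio = scale           # sigma shrinks with the buffer, and so does the kernel
    return pixel_ratio * taps_ratio


sigma_screen = 16.0
scale = 0.25
print(sigma_screen * scale)       # 4.0 -> effective sigma in the low-res buffer
print(relative_pass_cost(scale))  # 0.015625, i.e. 1/64 of the full-resolution fetches
```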
I also noticed that performance was degraded when BlurSigma was zero, so I made some changes to only activate the backbuffer when a container is actually doing some blurring. With the new changes, performance with BlurSigma=0 is now pretty much identical to master.
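Activating the backbuffer only when something is actually blurring can be expressed as reference counting. Each blurring container holds a reference only while its sigma is above zero, and the backbuffer is active while the count is non-zero. This is a minimal sketch of that idea in the spirit of the RefCountedBackbufferProvider described in the PR. `RefCountedBackbuffer` and `BlurConsumer` are hypothetical names, not the PR's implementation.

```python
class RefCountedBackbuffer:
    """Backbuffer that is only active while at least one consumer holds a reference."""

    def __init__(self):
        self._refs = 0

    @property
    def active(self) -> bool:
        return self._refs > 0

    def acquire(self) -> None:
        self._refs += 1

    def release(self) -> None:
        if self._refs == 0:
            raise RuntimeError("release() without a matching acquire()")
        self._refs -= 1


class BlurConsumer:
    """Holds a reference on the provider only while blur_sigma > 0."""

    def __init__(self, provider: RefCountedBackbuffer):
        self._provider = provider
        self._holding = False
        self._blur_sigma = 0.0

    @property
    def blur_sigma(self) -> float:
        return self._blur_sigma

    @blur_sigma.setter
    def blur_sigma(self, value: float) -> None:
        self._blur_sigma = value
        wants = value > 0
        if wants and not self._holding:
            self._provider.acquire()
        elif not wants and self._holding:
            self._provider.release()
        self._holding = wants


provider = RefCountedBackbuffer()
a, b = BlurConsumer(provider), BlurConsumer(provider)
print(provider.active)  # False: nothing is blurring, no backbuffer cost
a.blur_sigma = 16.0
b.blur_sigma = 8.0
a.blur_sigma = 0.0
print(provider.active)  # True: b still needs it
b.blur_sigma = 0.0
print(provider.active)  # False again
```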
Companion to ppy/osu#30347
Adds a BackdropBlurContainer that blurs the background behind its children. It requires a parent of type IBackbufferProvider (a BufferedContainer or a RefCountedBackbufferProvider), whose framebuffer it blurs and then draws back onto the framebuffer with a masking shader based on the BackdropBlurContainer's children.
Also adds a RefCountedBackbufferProvider, which can be used to e.g. wrap the entire game in an on-demand buffered container that automatically gets enabled when any of its children need a backbuffer.
There's still a bit of weirdness with texture clamping going on when EffectBufferScale goes above 1, but I couldn't quite figure that out on my own.
https://github.com/user-attachments/assets/06b5eb69-de6a-4d75-9983-39fd0167700a
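The description says the blurred backbuffer is drawn back onto the framebuffer through a masking shader based on the children. Here is one plausible per-pixel reading of that step: the children's alpha is the mask, and the output interpolates between the original backbuffer and its blurred copy. Colours are assumed to be straight (non-premultiplied). Lists of RGB tuples stand in for textures, and the function name is hypothetical; the PR's actual shader may differ.

```python
def composite_backdrop(backbuffer, blurred, mask_alpha):
    """Per pixel: mix the blurred backdrop over the original backbuffer, weighted by
    the mask (children's alpha). A mask of 0 leaves the pixel untouched."""
    out = []
    for orig, blur, m in zip(backbuffer, blurred, mask_alpha):
        out.append(tuple(b * m + o * (1.0 - m) for o, b in zip(orig, blur)))
    return out


backbuffer = [(1.0, 0.0, 0.0)] * 3  # sharp red background
blurred = [(0.5, 0.5, 0.5)] * 3     # its (pretend) blurred version
mask = [0.0, 0.5, 1.0]              # outside, on the AA edge of, and fully inside a child
print(composite_backdrop(backbuffer, blurred, mask))
# [(1.0, 0.0, 0.0), (0.75, 0.25, 0.25), (0.5, 0.5, 0.5)]
```

The children themselves would then be drawn on top of the composited result as usual.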