libretro / glsl-shaders

This repo is for glsl shaders converted by hand from libretro's common-shaders repo, since some don't play nicely with the cg2glsl script.

Unoptimized gamma correction shader math in crt-pi #35

Open battaglia01 opened 6 years ago

battaglia01 commented 6 years ago

There are a few unoptimized lines of code in the gamma correction part of crt-pi.glsl, linked here for reference: https://github.com/libretro/glsl-shaders/blob/master/crt/shaders/crt-pi.glsl

Gamma correction has been noted to be a potential source of slowdown in the code, and also in this thread here. However, all of the math here is really unoptimized, which is likely what is causing the slowdown.

Gamma correction is done on lines 190-208, reproduced here for reference:

#if defined(SCANLINES)
#if defined(GAMMA)
#if defined(FAKE_GAMMA)
        colour = colour * colour;
#else
        colour = pow(colour, vec3(INPUT_GAMMA));
#endif
#endif
        scanLineWeight *= BLOOM_FACTOR;
        colour *= scanLineWeight;

#if defined(GAMMA)
#if defined(FAKE_GAMMA)
        colour = sqrt(colour);
#else
        colour = pow(colour, vec3(1.0/OUTPUT_GAMMA));
#endif
#endif
#endif

If we assume SCANLINES, GAMMA and FAKE_GAMMA are all defined, the above reduces to the following:

        colour = colour * colour;
        scanLineWeight *= BLOOM_FACTOR;
        colour *= scanLineWeight;
        colour = sqrt(colour);

Is there a reason it's being done like this? All of that is equivalent to

        colour *= sqrt(scanLineWeight * BLOOM_FACTOR);

This saves one multiplication and three assignments per pixel! We avoid the unnecessary squaring and subsequent square rooting of colour, and we also don't need to update scanLineWeight, as it's never used again in this scope. I don't know how much the assignments matter or if they're optimized out anyway, but fighting with the emulator over memory accesses has been noted as one of the major causes of slowdown, so it's worth bringing up...
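For reference, here is a rough sketch of how the reduction above could be folded back into the original preprocessor structure. It relies on sqrt(colour * colour * w) being equal to colour * sqrt(w), which only holds when colour and w are non-negative; that is assumed here, not checked against the rest of crt-pi.glsl. The regrouping of the #if branches is illustrative only, and the other two branches are left doing what the original code does.

```glsl
// Sketch only: drop-in for the gamma/scanline block quoted above.
// Assumes colour >= 0.0 and scanLineWeight * BLOOM_FACTOR >= 0.0.
#if defined(SCANLINES)
        scanLineWeight *= BLOOM_FACTOR;
#if defined(GAMMA) && defined(FAKE_GAMMA)
        // sqrt(colour * colour * w) == colour * sqrt(w) for colour >= 0
        colour *= sqrt(scanLineWeight);
#elif defined(GAMMA)
        // True gamma path, unchanged from the original.
        colour = pow(colour, vec3(INPUT_GAMMA));
        colour *= scanLineWeight;
        colour = pow(colour, vec3(1.0/OUTPUT_GAMMA));
#else
        // No gamma correction: just apply the scanline weight.
        colour *= scanLineWeight;
#endif
#endif
```

In the FAKE_GAMMA branch the square root is now taken on a single float instead of on all three components of colour, so the saving may be slightly larger than the one multiplication counted above, depending on what the driver's compiler already does.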

There's a similar (but slightly trickier) thing you can do with the true gamma correction, not just FAKE_GAMMA, but I'll start here for now to see if I'm on the right wavelength...
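One guess at what the true gamma version might look like (this is not spelled out in the thread, so treat it as an assumption): since pow(pow(c, a) * w, b) equals pow(c, a * b) * pow(w, b) for non-negative c and w, the two vec3 pow calls could in principle collapse into one vec3 pow plus one scalar pow. The sketch assumes INPUT_GAMMA and OUTPUT_GAMMA are positive float constants, as the quoted code suggests.

```glsl
// Hypothetical rewrite of the non-FAKE_GAMMA branch; not from the thread.
// Assumes colour >= 0.0, scanLineWeight * BLOOM_FACTOR >= 0.0, and that
// INPUT_GAMMA and OUTPUT_GAMMA are positive float constants.
        scanLineWeight *= BLOOM_FACTOR;
        // The exponent ratio is a constant expression, so it can be
        // folded at compile time.
        colour = pow(colour, vec3((INPUT_GAMMA) / (OUTPUT_GAMMA)))
               * pow(scanLineWeight, 1.0 / (OUTPUT_GAMMA));
```

Whether this is actually faster would still need measuring, since pow is typically far more expensive than a multiply and the result can differ from the original by small rounding errors.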

hizzlekizzle commented 6 years ago

Yeah, probably just done that way for code clarity. It'd be worth looking at the assembly to see how much of a difference it makes.

battaglia01 commented 6 years ago

I'd be really surprised if any compiler knew to optimize a squaring and subsequent square root into one operation. The assignments, probably.

How can I compile this to assembly and check the output? Does OpenGL have an app for that, or do I just do something with GCC? Not used to GL shaders.

hizzlekizzle commented 6 years ago

That's a good question. I've used fxc.exe for HLSL shaders, but there doesn't seem to be anything as universally easy to use for GLSL, which probably shouldn't surprise me...

However, it seems this Radeon GPU Analyzer from AMD may be able to do it: https://github.com/GPUOpen-Tools/RGA/releases

metallic77 commented 1 year ago

It gains about 15-20 fps this way in my test: 668 fps after versus 650 before, which is a 2-3% difference.