sonicether / SEGI

A fully-dynamic voxel-based global illumination system for Unity
http://www.sonicether.com/segi/
MIT License
1.5k stars 170 forks

List of bugs and potential performance improvements #29

Open george-monumental opened 2 years ago

george-monumental commented 2 years ago

Hi Cody,

first of all, I'd like to thank you for giving away such an awesome piece of software for free. We are using it in the latest version of Crowfall. It works much much better than I anticipated.

However, while I was integrating it, I found a few things that could be improved, especially performance-wise:

You don't need flip-flopping volume textures because you are writing to integerVolume but you are reading from volumeTexture[]. So you can get rid of the volumeTextureB render texture as well as the voxelFlipFlop and activeVolume variables.

Also you can skip creating secondaryIrradianceVolume when infiniteBounces is disabled. These two changes alone will halve the VRAM requirement from about 512 MB to about 256 MB.
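
For illustration, the guard could look something like this (the exact texture setup here is an assumption on my part, mirroring how the other 3D volume textures seem to be created):

// Sketch only: skip allocating the secondary irradiance volume unless infinite bounces are enabled
if (infiniteBounces && secondaryIrradianceVolume == null)
{
    secondaryIrradianceVolume = new RenderTexture((int)voxelResolution, (int)voxelResolution, 0, RenderTextureFormat.ARGBHalf, RenderTextureReadWrite.Linear);
    secondaryIrradianceVolume.dimension = UnityEngine.Rendering.TextureDimension.Tex3D;
    secondaryIrradianceVolume.volumeDepth = (int)voxelResolution;
    secondaryIrradianceVolume.enableRandomWrite = true;
    secondaryIrradianceVolume.Create();
}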

The previousActiveVolume at SEGI.cs:154 is passed to the transferIntsCompute shader, but the shader doesn't use it. So you can get rid of it.

This line in SEGI.cs doesn't do anything, since DepthTextureMode.None == 0. You probably meant to write "=" instead of "|=":

shadowCam.depthTextureMode |= DepthTextureMode.None;

This line in SEGI.cs will throw a NullRef if sun hasn't been assigned. You are checking for null in other places but not here:

Vector3 shadowCamPosition = voxelSpaceOrigin + Vector3.Normalize(-sun.transform.forward) * shadowSpaceSize * 0.5f * shadowSpaceDepthRatio;

Your VRAM calculation for the sun depth texture isn't correct, since you are creating both a color and depth buffer:

v += sunDepthTextures[i].width * sunDepthTextures[i].height * 16;       // should have been 16 bits for the rhalf color and 16 bits for the depth

You could create a depth-buffer-only texture like this to save some VRAM

new RenderTexture(sunShadowResolution, sunShadowResolution, 16, RenderTextureFormat.Depth, RenderTextureReadWrite.Linear);

(use RenderTextureFormat.Depth instead of RHalf) but it would require some additional changes (see below).

CleanupTexture is missing a texture.Release(); A null check would make things easier as well. You are also still destroying textures manually in many places instead of calling this function.
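
For reference, the helper could look something like this with the missing Release() and a null check (the ref signature is an assumption based on how the function seems to be used):

void CleanupTexture(ref RenderTexture texture)
{
    if (texture == null)
        return;

    texture.Release();          // frees the GPU memory backing the render texture
    DestroyImmediate(texture);  // destroys the object itself
    texture = null;
}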

This line at SEGI.cs:827 is not used anywhere. I think it also should have been voxelSpaceSize / voxelResolution:

float voxelTexel = (1.0f * voxelSpaceSize) / (int)voxelResolution * 0.5f;           //Calculate the size of a voxel texel in world-space units

Not a bug but this line (SEGI.cs:380)

int resolution = (int)voxelResolution / Mathf.RoundToInt(Mathf.Pow((float)2, (float)i));

can be written much more simply as

int resolution = (int)voxelResolution >> i;

Similarly, line 975 could be simplified to

int destinationRes = (int)voxelResolution >> (i + 1);

notReadyToRender is never set to true anywhere.

I didn't understand initChecker = new object() (SEGI.cs:620). Couldn't you have simply used a bool and avoided the memory allocation?
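
A bool would do the same job without the heap allocation; a rough sketch (the surrounding init logic is assumed):

bool initialized;

void InitCheck()
{
    if (!initialized)
    {
        Init();
        initialized = true;
    }
}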

Cache all shader parameter IDs with Shader.PropertyToID to avoid a string-to-ID conversion on every call:

// Shader.SetGlobalTexture("SEGIVolumeTexture1", secondaryIrradianceVolume);
static readonly int _SEGIVolumeTexture1Id = Shader.PropertyToID("SEGIVolumeTexture1");
Shader.SetGlobalTexture(_SEGIVolumeTexture1Id, secondaryIrradianceVolume);

Also avoid run-time memory allocations due to string concatenation, like here:

// Shader.SetGlobalTexture("SEGIVolumeLevel" + i.ToString(), clipmaps[i].volumeTexture0);
Shader.SetGlobalTexture(IDs.SEGIVolumeLevel[i], clipmaps[i].volumeTexture0);

When you are creating the voxelCamera in line 556, you should also set its aspect ratio (Camera.aspect) to 1. The reason is that you are only specifying the vertical orthographic size; the horizontal orthographic size is calculated from the aspect ratio, which is either the screen aspect ratio or the target texture's aspect ratio. You don't assign a target texture until line 949, so the projection matrix in line 883 is actually wrong for the first frame. Alternatively, you could assign the target texture when creating the camera.
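
A minimal sketch of the camera setup with the aspect pinned to 1 (assuming the orthographic size is set from voxelSpaceSize as elsewhere in SEGI.cs):

voxelCamera.orthographic = true;
voxelCamera.orthographicSize = voxelSpaceSize * 0.5f;
voxelCamera.aspect = 1.0f;   // otherwise the screen or target texture aspect ratio leaks into the projection matrix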

At SEGI.cs line 293

Shader.SetGlobalColor(IDs.GISunColor, new Color(Mathf.Pow(sunColor.r, 2.2f), Mathf.Pow(sunColor.g, 2.2f), Mathf.Pow(sunColor.b, 2.2f), Mathf.Pow(sunIntensity, 2.2f)));
Shader.SetGlobalColor(IDs.SEGISkyColor, new Color(Mathf.Pow(skyColor.r * skyIntensity * 0.5f, 2.2f), Mathf.Pow(skyColor.g * skyIntensity * 0.5f, 2.2f), Mathf.Pow(skyColor.b * skyIntensity * 0.5f, 2.2f), Mathf.Pow(skyColor.a, 2.2f)));

As you know, you do have to linearize colors that you pass to shaders when using the linear color space (see the Color Space option in the Player settings). MaterialPropertyBlock.SetColor and Material.SetColor do this automatically for you, but Shader.SetGlobalColor doesn't, so it is correct that you are doing it manually here. However, when linearizing you usually don't want to modify the alpha value, which is exactly what Color.linear does. I also don't think sunIntensity should be linearized, so you can simplify this code by calling Color.linear. Linearizing the sun intensity makes the GI very faint when the intensity is less than 1, because a value less than 1 raised to the power of 2.2 becomes very small.

Shader.SetGlobalColor(IDs.GISunColor, new Color(sunColor.r, sunColor.g, sunColor.b, sunIntensity).linear);
Shader.SetGlobalColor(IDs.SEGISkyColor, new Color(skyColor.r * skyIntensity, skyColor.g * skyIntensity, skyColor.b * skyIntensity, skyColor.a).linear);

voxelToGIProjection at SEGI.cs line 888 is calculated before the shadowCam position and orientation are set:

Matrix4x4 voxelToGIProjection = (shadowCam.projectionMatrix) * (shadowCam.worldToCameraMatrix) * (voxelCamera.cameraToWorldMatrix);
Shader.SetGlobalMatrix("SEGIVoxelToGIProjection", voxelToGIProjection);

...

//Render the depth texture from the sun's perspective in order to inject sunlight with shadows during voxelization
if (sun != null)
{
    shadowCam.cullingMask = giCullingMask;

    Vector3 shadowCamPosition = voxelSpaceOrigin + Vector3.Normalize(-sun.transform.forward) * shadowSpaceSize * 0.5f * shadowSpaceDepthRatio;

    shadowCamTransform.position = shadowCamPosition;
    shadowCamTransform.LookAt(voxelSpaceOrigin, Vector3.up);
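
The projection should only be built after these two lines; a sketch of the reordering, using the same names as above:

// Set up the shadow camera first...
shadowCamTransform.position = shadowCamPosition;
shadowCamTransform.LookAt(voxelSpaceOrigin, Vector3.up);

// ...then build the voxel-to-shadow-map projection from its updated matrices
Matrix4x4 voxelToGIProjection = shadowCam.projectionMatrix * shadowCam.worldToCameraMatrix * voxelCamera.cameraToWorldMatrix;
Shader.SetGlobalMatrix("SEGIVoxelToGIProjection", voxelToGIProjection);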

For line 287 in SEGIVoxelizeScene

float4 shadowPos = mul(SEGIVoxelProjectionInverse, float4(fcoord * 2.0 - 1.0, 0.0));

w should be 1, I think: you are transforming a point, not a vector. That's probably also why you need such a high bias here:

float sunVisibility = saturate((sunDepth - shadowPos.z + 0.2525) * 1000.0);  // you can use 0.001 instead of 0.2525 if you change w to 1

Using 1 for w also allows you to combine the two matrices and save an expensive matrix multiplication in the pixel shader.

At SEGI.shader line 114, you are effectively calculating sin(asin(x)) and cos(asin(x)), since latitude is itself the result of an asin. You can use trigonometric identities and the sincos function to reduce the number of expensive transcendentals. Instead of

float longitude = gAngle * fi;
float latitude = asin(fiN * 2.0 - 1.0);

float3 kernel;
kernel.x = cos(latitude) * cos(longitude);
kernel.z = cos(latitude) * sin(longitude);
kernel.y = sin(latitude);

you can write

float longitude = gAngle * fi;

float sinLatitude = fiN * 2.0 - 1.0;
float cosLatitude = sqrt(1.0 - sinLatitude * sinLatitude);
float sinLongitude, cosLongitude;
sincos(longitude, sinLongitude, cosLongitude);

float3 kernel;
kernel.x = cosLatitude * cosLongitude;
kernel.z = cosLatitude * sinLongitude;
kernel.y = sinLatitude;

I hope the compiler optimizes these, but I wouldn't rely on it:

SEGI.shader line 81: float3 viewVector = normalize(viewSpacePosition.xyz); // unused
SEGI.shader line 121: kernel = normalize(kernel + worldNormal.xyz * 1.0); // times 1
SEGI.shader line 213: int HalfResolution; // unused
SEGI.shader line 223: float3 albedo = albedoTex.rgb; // unused
SEGI.shader line 465: float3 albedo = albedoTex.rgb; // unused
SEGI.cginc lines 27-31:

sampler3D SEGIVolumeLevel6;  // unused
sampler3D SEGIVolumeLevel7;  // unused
sampler3D VolumeTexture1;  // unused
sampler3D VolumeTexture2;  // unused
sampler3D VolumeTexture3;  // unused

SEGI.cginc line 117: coneDistance -= 0.00;
SEGI.cginc line 143: float falloffFix = pow(fi, 1.0) * 4.0 + NearLightGain; // pow(fi, 1) = fi
SEGITraceScene.shader lines 34-35:

sampler3D SEGIVolumeLevel6; // unused
sampler3D SEGIVolumeLevel7; // unused

SEGITraceScene.shader line 156: float4x4 SEGIVoxelToGIProjection; // unused
SEGITransferInts.compute line 5: Texture3D<float4> PrevResult; // unused
SEGITransferInts.compute line 14: float4 VoxelOriginDelta; // unused
SEGIVoxelizeScene.shader line 8: _Cutoff ("Alpha Cutoff", Range(0,1)) = 0.333 // unused
SEGIVoxelizeScene.shader line 28: int LayerToVisualize; // unused
SEGIVoxelizeScene.shader line 70: float3 absNormal = abs(o.normal); // unused
SEGIVoxelizeScene.shader line 320: const float sqrt2 = sqrt(2.0) * 1.0; // times 1 is superfluous. Also, I'd just hardcode the constant; I'm not sure the compiler is smart enough to precalculate the value.

In SEGI.cginc VisualConeTrace:

float4 sample = tex3Dlod(SEGIVolumeLevel0, float4(voxelCheckCoord.xyz, coneSize));

SEGIVolumeLevel0 doesn't have mips, so it's pointless to calculate coneSize. Also, I would simply stop at the first opaque voxel instead of using voxel cone tracing. It also helps to add a small constant brightness so that voxels that don't get any light are not completely black. I'm aware that this is just used for debugging, though.

I've tried converting the if-cascade at line 125 to a switch in the hope that it could be optimized into a jump table, but it turned out that UNITY_BRANCH switch(mipLevel) becomes an if-cascade anyway.

In SEGIMipFilter.compute:

for (int i = 0; i < destinationRes; i++)
{
    float3 fcoord = float3((float)id.x / destinationRes, (float)id.y / destinationRes, (float)i / destinationRes);
    float texel = 1.0 / destinationRes;
    ...
}

Calculate the reciprocal of destinationRes outside the loop (or, even better, pass a uniform containing it), and move the texel calculation outside the loop as well.

In CSMainGaussian:

int c = 0;

for (int j = 0; j < 8; j++)
{
    float3 offset = float3(0, 0, 0);

    offset = offsets[j];

    tex += Source.SampleLevel(_LinearClamp, fcoord + texel * 0.5 + offset * texel * 0.5, 0.0);
    c++;
}

tex /= c;

c will always be 8, so simply multiply tex by 0.125.

At SEGITraceScene.shader line 186:

for (int i = 0; i < numSteps; i++)
{
    float fi = ((float)i) / numSteps;       

Calculate the reciprocal of numSteps outside the loop and multiply by it instead; multiplications are much cheaper than divisions.

At SEGITraceScene.shader line 195:

if (mipLevel == 0)
    sample = tex3Dlod(SEGIVolumeLevel1, float4(voxelCheckCoord.xyz, coneSize));

Mip level 0 samples from SEGIVolumeLevel1 instead of SEGIVolumeLevel0. This could be intentional, but I thought I'd point it out in case it isn't.

In SEGIVoxelizeScene.shader line 302, skip the texture lookup when the emission color is black:

float4 emissionTex = tex2D(_EmissionMap, input.uv.xy);

Texture lookups are among the most expensive operations in a shader, so you can use dynamic branching to skip this one, since very few voxels are actually emissive:

float4 emissionTex = 0.0;
UNITY_BRANCH
if (dot(_EmissionColor.rgb, _EmissionColor.rgb) > 0.0)
    emissionTex = tex2D(_EmissionMap, input.uv.xy);

Same for sampling the volume texture when infiniteBounces is off (line 314):

float4 prevBounce = tex3D(SEGIVolumeTexture1, fcoord + SEGIVoxelSpaceOriginDelta.xyz);
col.rgb += prevBounce.rgb * 1.6 * SEGISecondaryBounceGain * tex.rgb * color.rgb;

becomes

UNITY_BRANCH
if (SEGISecondaryBounceGain > 0)
{
    float4 prevBounce = tex3D(SEGIVolumeTexture1, fcoord + SEGIVoxelSpaceOriginDelta.xyz);
    col.rgb += prevBounce.rgb * 1.6 * SEGISecondaryBounceGain * tex.rgb * color.rgb;
}

You could also premultiply GISunColor.rgb by GISunColor.a and _EmissionColor.rgb * 0.9 (line 312) on the CPU, but that will probably not make a noticeable difference.

SEGIVoxelizeScene.shader line 260:

#define VoxelResolution (SEGIVoxelResolution * (1 + SEGIVoxelAA))

Not sure why this is a define. Just make it a variable so that it's only calculated once.

SEGIVoxelizeScene.shader line 89:

void geom(triangle v2g input[3], inout TriangleStream<g2f> triStream)
{
    v2g p[3];
    for (int i = 0; i < 3; i++)
    {
        p[i] = input[i];
        p[i].pos = mul(unity_ObjectToWorld, p[i].pos);                      
    }

You can move the matrix multiplication to the vertex shader. No need to do it more than once per vertex.

There are a few problems with the shadow map rendering, but they aren't easy to fix, unfortunately:

shadowCam.RenderWithShader(sunDepthShader, "");

First of all, this manually renders depth into a color target AND a depth buffer. You only need the depth buffer, but that is not easy to achieve. I think you could at least use a temporary render target for the depth, instead of creating sunDepthTexture with both a color and a depth buffer. Another problem is that shader overrides like this don't work with shaders that modify the vertex in the vertex shader or have a geometry shader. The correct way would be to render the shadow map with the shadow pass of the original shaders, but you have to make individual draw calls to do that. The advantage is that you can also get rid of the sunDepthShader. The current approach also doesn't take the ShadowCastingMode of the mesh renderers into account.

However, as I'm sure you are aware, the biggest performance problem is rendering the shadow cam and the voxel cam every frame. This will triple the draw calls. You can update them less often, but that's not a good solution:

We used time-sliced rendering for both the voxel space camera and the shadow camera. All the renderers register themselves into a grid, and only one grid cell is rendered at a time. We also separated shadow rendering from voxel rendering by double-buffering the shadow map. This means it takes about a second until all the grid cells around the camera have been updated, but it's much more performant. This might not work for every game, because fast-moving dynamic objects (like characters) have to be excluded from contributing GI, but you can still have a dynamic time of day and slow-moving objects (dynamic destruction, doors, falling trees, etc.).

We also used half the vertical voxel space resolution so that we could increase the horizontal resolution. But this only makes sense for exteriors where you have a long view distance.

Finally, we also added support for point lights, which was not very hard since we already had a grid that could give us a list of lights for a given cell.

It really surprised me how well voxel-based GI can work in an open-world game. The voxels are huge, but it doesn't really matter. You can get a few light leaks if you make them too big, but it's barely worth mentioning. The reflections look amazing, and temporal stability is great as well.

Anyways, keep up the good work and thank you again for making SEGI available to everyone!

Georg
Principal Engineer at Monumental

Nolram12345 commented 2 years ago

Wow, this has to be the longest GitHub improvements issue I have ever seen. A lot of the changes you talked about in the last bit of it seem quite interesting, especially the support for point lights and such. Is there any chance of seeing those go public as well in the future?

george-monumental commented 2 years ago

Unfortunately, we can't publish the source code of the additions we've made, for legal reasons, but I at least wanted to give some feedback as a token of appreciation for making SEGI available under an MIT license.

Adding support for point lights is not that hard. Just pass a list of point lights to the voxelization shader and iterate over them to add the light contribution just like for the directional light. If you have a lot of point lights, you'll want to add some code on the CPU to find out which lights affect a draw call.
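
As a very rough illustration of the CPU side (all names here are hypothetical, just to show the shape of it):

// Gather scene point lights each frame and hand them to the voxelization shader as global arrays
const int maxPointLights = 64;
Vector4[] pointLightPositions = new Vector4[maxPointLights];   // xyz = position, w = range
Vector4[] pointLightColors = new Vector4[maxPointLights];      // rgb = color * intensity

int count = 0;
foreach (Light light in FindObjectsOfType<Light>())
{
    if (light.type != LightType.Point || count >= maxPointLights)
        continue;

    Vector4 position = light.transform.position;
    position.w = light.range;
    pointLightPositions[count] = position;
    pointLightColors[count] = light.color.linear * light.intensity;
    count++;
}

Shader.SetGlobalInt("SEGIPointLightCount", count);
Shader.SetGlobalVectorArray("SEGIPointLightPositions", pointLightPositions);
Shader.SetGlobalVectorArray("SEGIPointLightColors", pointLightColors);

The voxelization shader would then loop over the first SEGIPointLightCount entries and accumulate their contribution per voxel, just like for the directional light.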

ninlilizi commented 2 years ago

I should probably roll the changes into a fork myself.

My old rubbish fork is just a mess as is, being my 'learn to shader' experiment. Wondering if I should remake it now that I know what I'm doing and can easily ace the job.

Nolram12345 commented 2 years ago

Well, it would certainly be great to have a more... mature(?) version of SEGI that would actually be usable beyond the provided demos, as there is currently a lack of a freely available global illumination method like this for Unity. Support for point lights and time slicing would certainly be very appreciated features!