xemu-project / xemu

Original Xbox Emulator for Windows, macOS, and Linux (Active Development)
https://xemu.app

Splinter Cell Double Agent: Dynamic lights not being rendered on surfaces #807

Closed Fabxx closed 2 years ago

Fabxx commented 2 years ago

Title

https://xemu.app/titles/5553005e/#Tom-Clancy-s-Splinter-Cell-Double-Agent

Bug Description

Happens in the Iceland level, first part, but also in other levels using the same lights.

https://user-images.githubusercontent.com/30447649/159269038-ddbc1c9e-501c-49a9-b767-57a6d753bcbf.mp4

image

Expected Behavior

image

https://www.youtube.com/watch?v=QDKhIzr6Slg

xemu Version

xemu_version: 0.6.2-88-g6e1969001e

System Information

Manjaro Linux, i7-10700 @ 4.80 GHz, GTX 970 STRIX (4 GB VRAM), NVIDIA driver 510.54, CUDA 11.6

Additional Context

No response

abaire commented 2 years ago

First light in the first level:

xemu: xemu_first_light

HW: xbox_first_light

pgraph: scda_first_light.txt

abaire commented 2 years ago

Note that lighting is turned off for the entire frame, so this is probably not a dynamic lighting issue and is more likely compositing, geometry, or texture related.

abaire commented 2 years ago

Looking at the hardware trace, the light appears to be applied fairly early in the rendering (maybe a third of the way through all the draw calls). It looks like it's done in a stage that takes four textures: a circular light map, a bump map, a sunset-colored texture, and a white-to-black gradient.

On hardware the bump map seems to consistently be a 256x256 texture; on xemu, for the frame I examined, it was 8x8 and always solid purple. I'm not sure yet whether this is relevant, but presumably the output will be at least somewhat incorrect based on this delta.

I believe the relevant pixel shader is:

pT0.xy = texScale0 * pT0.xy;
vec4 t0 = textureProj(texSamp0, pT0.xyw);
pT1.xy = texScale1 * pT1.xy;
vec4 t1 = textureProj(texSamp1, pT1.xyw);
vec4 t2 = texture(texSamp2, pT2.xyz / pT2.w);
pT3.xy = texScale3 * pT3.xy;
vec4 t3 = textureProj(texSamp3, pT3.xyw);
vec4 r0;
r0.a = t0.a;
// Stage 0
ab.rgb = clamp(vec3(dot((2.0 * max(t1.rgb, 0.0) - 1.0), (2.0 * max(t2.rgb, 0.0) - 1.0))), -1.0, 1.0);
r0.rgb = ab.rgb;
r0.a = ab.b;
// Stage 1
ab.rgb = clamp(vec3((max(r0.rgb, 0.0) * t0.rgb)), -1.0, 1.0);
r0.rgb = ab.rgb;
ab.a = clamp(((max(r0.a, 0.0) * t0.a)), -1.0, 1.0);
r0.a = ab.a;
// Stage 2
ab.rgb = clamp(vec3((r0.rgb * t3.rgb)), -1.0, 1.0);
r0.rgb = ab.rgb;
ab.a = clamp(((r0.a * t3.a)), -1.0, 1.0);
r0.a = ab.a;
// Stage 3
mux_sum.rgb = clamp(vec3((((r0.rgb * v0.rgb) + ((1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0)) * c0_3.rgb)) * 2.0)), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
ab.a = clamp(((c0_3.a * (1.0 - clamp(vec4(0.0).a, 0.0, 1.0)))), -1.0, 1.0);
r0.a = ab.a;
// Final Combiner
fragColor.rgb = max(clamp(vec4(v1.rgb + r0.rgb, 0.0), 0.0, 1.0).rgb, 0.0) + mix(vec3(max(vec4(0.0).rgb, 0.0)), vec3(max(vec4(0.0).rgb, 0.0)), vec3(max(vec4(0.0).rgb, 0.0)));
fragColor.a = max(r0.a, 0.0);
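Stage 0 above is the usual dot3 setup: both texture values are expanded from [0, 1] to [-1, 1] and then dotted, which is why the bump map contents matter for this pass. A minimal sketch of that expand-and-dot step in C (illustration only, names are mine):

/* Expand-and-dot as performed by Stage 0. A "flat" normal-map texel of
 * (0.5, 0.5, 1.0) expands to (0, 0, 1), which would make the result depend
 * only on the blue channel of t2. */
#include <math.h>

typedef struct { float r, g, b; } vec3f;

static float expand(float c) { return 2.0f * fmaxf(c, 0.0f) - 1.0f; }

static float stage0_dot(vec3f t1, vec3f t2) {
    return expand(t1.r) * expand(t2.r)
         + expand(t1.g) * expand(t2.g)
         + expand(t1.b) * expand(t2.b);
}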

The light map texture is NV097_SET_TEXTURE_FORMAT_COLOR_L_DXT1_A1R5G5B5 which is potentially interesting. When I look at the texture in renderdoc it's entirely opaque.

abaire commented 2 years ago

Very interestingly, in the pass that I think should be rendering the light, renderdoc reports that blending is enabled with a color src of dst alpha, a color destination of zero and color op of add.

Target 0: enabled (True), color blend: Dst Alpha / Zero / Add, alpha blend: Dst Alpha / Zero / Add, write mask: RGB_

This seems likely to be incorrect.

From the pgraph though it seems intentional:

nv2a_pgraph_method 0: NV20_KELVIN_PRIMITIVE<0x97> -> NV097_SET_COLOR_MASK<0x358> (0x10101 {Red:W, Green:W, Blue:W, Alpha:RO})
nv2a_pgraph_method 0: NV20_KELVIN_PRIMITIVE<0x97> -> NV097_SET_BLEND_FUNC_DFACTOR<0x348> (NV097_SET_BLEND_FUNC_DFACTOR_V_ZERO<0x0>)
nv2a_pgraph_method 0: NV20_KELVIN_PRIMITIVE<0x97> -> NV097_SET_BLEND_FUNC_SFACTOR<0x344> (NV097_SET_BLEND_FUNC_SFACTOR_V_DST_ALPHA<0x304>)

The fact that the alpha channel is masked off makes this especially interesting.
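For reference, a rough desktop-GL equivalent of the state renderdoc reports for this pass (an illustration only, not code from xemu) makes it clear why a zero destination alpha would erase the light's contribution:

/* Equivalent blend/mask setup (sketch). With these factors the blended
 * result is out.rgb = src.rgb * dst.a + dst.rgb * 0, so a destination alpha
 * of 0 multiplies the light pass away entirely, regardless of what the
 * combiner outputs. */
glEnable(GL_BLEND);
glBlendEquation(GL_FUNC_ADD);
glBlendFunc(GL_DST_ALPHA, GL_ZERO);
/* Alpha writes are masked off, so this pass cannot fix up dst alpha itself. */
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_FALSE);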

UPDATE: I'm wrong: the hardware and xemu handle this the same way (the blend factor just uses whatever alpha is already in the buffer). It is very possible that the alpha in the buffer is wrong, however, since modifying the combiner to produce a solid bright color has no visible effect at all.

UPDATE: The alpha in the buffer is 0 on the pass that uses the correct mesh. I've looked at every draw pass afterwards that uses the lighting texture and none of them apply to that mesh again (unsurprisingly). I'm guessing the target buffer is in a different state (alpha-wise) as compared with hardware when the light is applied and the problem is upstream of that draw.

Comparing to the hardware trace, the buffer state before the light is applied is definitely different. On hardware the alpha is non-zero in the target area, whereas in renderdoc it is completely transparent. This, along with the blend settings, explains why the texture is not being applied. Interestingly, on hardware there are a large number of draw actions done with the alpha in this (correct) state. The light is applied in draw 193 in my test frame, and the interesting region of the backbuffer looks to be correct from draw 89 or so (assuming it's the same target buffer; at the time of the trace I had not yet updated nv2a-trace to track the surface address).

UPDATE: I updated nv2a-trace to keep track of the surface address, and it looks like in the HW trace draw 58 starts compositing the alpha in the framebuffer. I will need to look at the pgraph log around this to figure out what's going on. The output almost looks like it was created with a stencil: very hard edges, and alpha is either 100% (interestingly, on the surface where the light shines, which also happens to be far away from the camera) or 0% (everywhere else).

Screenshot_20220410_214758

abaire commented 2 years ago

I found something interesting in the pgraph/renderdoc comparison. Around the time that I expect the alpha values to be rendered, it appears that xemu is somehow failing to bind a greyscale texture that I'm guessing contains depth information or something along those lines.

In the pgraph log I see:

nv2a_pgraph_method 0: NV20_KELVIN_PRIMITIVE<0x97> -> NV097_SET_TEXTURE_OFFSET[1]<0x1B40> (0x28E8000)
nv2a_pgraph_method 0: NV20_KELVIN_PRIMITIVE<0x97> -> NV097_SET_TEXTURE_FORMAT[1]<0x1B44> (0x13029 {DMA_A, BORDER_SOURCE_COLOR, LU_IMAGE_DEPTH_Y16_FIXED, MipmapLevels:1, 2D, BaseSizeU:1, BaseSizeV:1, BaseSizeP:1})
nv2a_pgraph_method 0: NV20_KELVIN_PRIMITIVE<0x97> -> NV097_SET_TEXTURE_CONTROL1[1]<0x1B50> (0x4000000 {Pitch: 1024})
nv2a_pgraph_method 0: NV20_KELVIN_PRIMITIVE<0x97> -> NV097_SET_TEXTURE_IMAGE_RECT[1]<0x1B5C> (0x2000200 W:512 H:512)
nv2a_pgraph_method 0: NV20_KELVIN_PRIMITIVE<0x97> -> NV097_SET_TEXTURE_CONTROL0[1]<0x1B4C> (0x4003FFC0 {MaxAniso:1, MaxLOD:4095, MinLOD:0})

which indicates that there should be a Y16_FIXED linear texture of size 512x512.
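As a quick sanity check (a throwaway sketch, not part of xemu), the reported pitch and image rect are self-consistent for a 512x512 Y16 linear surface:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Values from the pgraph log above; width/height follow the decode
     * already printed on the IMAGE_RECT line (W:512 H:512). */
    uint32_t width = 512;
    uint32_t height = 512;
    uint32_t bytes_per_texel = 2;  /* Y16_FIXED */
    printf("expected pitch: %u bytes (log reports 1024)\n", width * bytes_per_texel);
    printf("expected surface size: %u bytes\n", width * height * bytes_per_texel);
    return 0;
}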

If I look at the same draw in renderdoc, there's a texture sampler for texture 1 that is unbound: Screenshot_20220411_092515

Forcibly setting an arbitrary value for the texture causes the alpha channel to be updated in a way consistent with what I see in hardware, so I'm relatively confident the bug lies here.

UPDATE: Forcibly setting the missing texture value also causes the light texture to be rendered (in renderdoc) later on in the frame, so I'm highly confident this is the cause.

Also, NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_DEPTH_Y16_FIXED happens to be one of the formats that I do not currently test in the pgraph tester, so it may simply be an issue with how that format is handled by xemu generally.

UPDATE: The hardware behavior when attempting to render DEPTH_Y16_FIXED and Y16 differs significantly from xemu, so a couple fixes will be needed.

UPDATE: There's something more to the SC:DA failure, as the depth format texture I just added to the pgraph test "works" (it produces incorrect color output, but the input texture shows up as a greyscale ramp as expected).

UPDATE: Looks like the texture is set as a zeta target in the draw just prior to where things go wrong. In the failed draw I see a GPU->RAM copy for the surface:

nv2a: Target: [COLOR @ 2b14000] (ln) aa:0 clip:x=0,w=640,y=0,h=480
nv2a:  Match: [COLOR @ 2b14000 (640x480)] (ln) aa:0, clip:x=0,w=640,y=0,h=480
nv2a:    Hit: [COLOR @ 2b14000 (640x480)] (ln) aa:0, clip:x=0,w=640,y=0,h=480
nv2a: Target: [ ZETA @ 29e8000] (ln) aa:0 clip:x=0,w=640,y=0,h=480
nv2a:  Match: [ ZETA @ 29e8000 (640x480)] (ln) aa:0, clip:x=0,w=640,y=0,h=480
nv2a:    Hit: [ ZETA @ 29e8000 (640x480)] (ln) aa:0, clip:x=0,w=640,y=0,h=480
nv2a: [GPU->RAM] ZETA (lin) surface @ 28e8000 (w=512,h=512,p=1024,bpp=2)

Stepping through texture binding, I see it ends up hitting the "Saved an upload! Reuse existing texture in graphics memory." optimization.
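To illustrate the hazard that log line hints at (a minimal sketch with hypothetical names, not xemu's actual cache code): if the reuse check does not account for the GPU->RAM surface copy that just rewrote the memory backing the texture, the lookup hands back stale contents instead of re-uploading the fresh depth data.

#include <stdint.h>
#include <GL/gl.h>

/* Hypothetical cache entry; xemu's real structures differ. */
typedef struct {
    uint32_t addr;
    uint32_t format;
    uint64_t generation;  /* bumped whenever guest memory at addr is rewritten */
    GLuint gl_texture;
} CachedTexture;

/* Returns the cached GL texture only if the backing memory is unchanged;
 * otherwise the caller must re-upload from guest RAM. Without a check like
 * the generation comparison, "Saved an upload!" would return stale contents
 * after the GPU->RAM zeta copy seen above. */
static GLuint lookup_texture(const CachedTexture *entry, uint32_t addr,
                             uint32_t format, uint64_t memory_generation) {
    if (entry->addr == addr && entry->format == format &&
        entry->generation == memory_generation) {
        return entry->gl_texture;
    }
    return 0;
}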

abaire commented 2 years ago

I'm increasingly of the belief that the problem is the mishandling of the D_Y16_FIXED texture as a texture instead of a shadow map. I see a suspicious-looking unhandled 0x1E6C method with parameter 0x6 in the same block that uses the D_Y16_FIXED texture. The same method is then set to 0 after the draw calls that make use of the depth texture. This command is also next to the depth function and mask setters, the shader stage program is 3D, and the geometry produces a third component that is likely the reference value. The observed output also aligns with this theory: it is either 0 or 1, just as a shadow map comparison would be, and the only unhandled param in the group is 0x1E6C, which makes it pretty likely to be the selection of the comparator.

I'm going to try to scrape together a test case guessing that 6 will be something equivalent to one of the GL_TEXTURE_COMPARE_FUNC values, then see if I can get it to render something.

UPDATE: Confirmed that 0x1E6C is related to shadow mapping. Using the info from #712 (specifically NV20_3D_TEX_RCOMP_NEVER and NV20_3D_TEX_RCOMP_ALWAYS) I can vary the output in a trivial test. I'll write up a more complicated version to validate the other modes, then implement in xemu.
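For context, the desktop-GL analogue of what 0x1E6C appears to select looks roughly like this (an assumption based on the #712 notes, not confirmed NV2A behavior); with compare mode enabled, every fetch from the depth texture resolves to 0.0 or 1.0, which matches the hard 0%/100% alpha edges seen in the trace:

#include <GL/gl.h>  /* assumes GL 3.0+ headers for GL_COMPARE_REF_TO_TEXTURE */

/* Sketch only: GL_LEQUAL is a guess at the comparator; NEVER/ALWAYS would
 * correspond to the NV20_3D_TEX_RCOMP_NEVER / _ALWAYS values from #712. */
static void enable_shadow_compare(GLuint depth_tex) {
    glBindTexture(GL_TEXTURE_2D, depth_tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_MODE, GL_COMPARE_REF_TO_TEXTURE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_FUNC, GL_LEQUAL);
}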

abaire commented 2 years ago

Test Results from HW