xemu-project / xemu

Original Xbox Emulator for Windows, macOS, and Linux (Active Development)
https://xemu.app
Other

Some Halo:CE fog effects broken #51

Closed: mborgerson closed this issue 2 years ago

mborgerson commented 4 years ago

Fog on certain levels (Halo, Silent Cartographer) does not appear.

Xbox:

xemu:

Screenshots courtesy nrgst183.

Title: 4d530004

Foxlum commented 3 years ago

@mborgerson Looks like an improper blending mode, and possibly a depth buffer issue.

nrgst183 commented 2 years ago

Found a new wrinkle with this bug (level is 343 Guilty Spark):

https://user-images.githubusercontent.com/74626919/149675543-4f6dc6c4-bccc-4607-af2b-94d8b32abc54.mp4

The fog appears to largely work as expected until this point in the level (although it almost looks too thick in some places).

NV2A Log capture for snapshot above: 343 good fog.log

NV2A Log capture for snapshot above: 343 bad fog.log

abaire commented 2 years ago

One thing I immediately notice is that the fog coordinate is being passed in as part of the geometry (in the other games where I've looked at fog it's been calculated in shaders).

E.g. (in a capture I made of The Silent Cartographer):

nv2a: pgraph method (0): 0x97 -> 0x1734 NV097_SET_VERTEX_DATA_ARRAY_OFFSET+0x14 (0x87e73c)
nv2a: pgraph method (0): 0x97 -> 0x1774 NV097_SET_VERTEX_DATA_ARRAY_FORMAT+0x14 (0x2024)
nv2a: NV097_SET_VERTEX_DATA_ARRAY_FORMAT[5] 0x2024
nv2a: vertex data array format=4, count=2, stride=32
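To make these log lines easier to read, here is a small C sketch that splits the 0x2024 format parameter into the fields xemu prints. The bit layout (type in bits 0-3, component count in bits 4-7, stride in bits 8-31) is an assumption, chosen because it reproduces the "format=4, count=2, stride=32" line. The base addresses 0x1720 and 0x1760 are simply the logged methods minus their +0x14 suffix.

```c
#include <stdint.h>

/* Assumed field layout (reproduces the xemu log line):
 * bits 0-3 = type, bits 4-7 = component count, bits 8-31 = stride in bytes. */
typedef struct {
    unsigned type;
    unsigned count;
    unsigned stride;
} VertexArrayFormat;

static VertexArrayFormat decode_vertex_array_format(uint32_t param)
{
    VertexArrayFormat f;
    f.type = param & 0xFu;
    f.count = (param >> 4) & 0xFu;
    f.stride = param >> 8;
    return f;
}

/* Each attribute slot's register is 4 bytes after the previous one, so a
 * "+0x14" suffix selects slot 0x14 / 4 = 5, i.e. input v5. */
static unsigned attribute_slot(uint32_t method, uint32_t array_base)
{
    return (method - array_base) / 4u;
}
```

Both the OFFSET and FORMAT writes land on slot 5, which is why the log shows FORMAT[5] and why the fog data arrives as v5 in the vertex shader.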

At a glance, the way the shaders use it is interesting: I see v5 (the vertex fog coordinate) used in shaders where oFog is not set at all, and other shaders where oFog seems to be set without regard to v5.

abaire commented 2 years ago

I also see that the fog param is being set to float2 vectors, so I'd assume Halo is not using the standard semantics for this input and has its own.

Furthermore, disabling fog entirely has no effect, as fog is not referenced by the color combiners at all:

// Stage 0
r0.rgb = clamp(vec3(dot(max(t2.rgb, 0.0), max(c1_0.rgb, 0.0))), -1.0, 1.0);
r1.rgb = clamp(vec3(dot(max(t2.rgb, 0.0), max(c0_0.rgb, 0.0))), -1.0, 1.0);
r0.a = clamp(((max(t2.b, 0.0) * (1.0 - clamp(vec4(0.0).b, 0.0, 1.0)))), -1.0, 1.0);
// Stage 1
v0.rgb = clamp(vec3(((max(v0.rgb, 0.0) * (1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0))) + (max(r0.rgb, 0.0) * max(c0_1.rgb, 0.0)))), -1.0, 1.0);
t1.a = clamp(((max(r0.a, 0.0) * (1.0 - clamp(vec4(0.0).b, 0.0, 1.0)))), -1.0, 1.0);
// Stage 2
r0.rgb = clamp(vec3((((1.0 - clamp(r0.aaa, 0.0, 1.0)) * (1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0))) + (max(r0.aaa, 0.0) * max(c0_2.rgb, 0.0)))), -1.0, 1.0);
r0.a = clamp(((max(r0.b, 0.0) * (1.0 - clamp(vec4(0.0).b, 0.0, 1.0)))), -1.0, 1.0);
r1.a = clamp(((max(r1.b, 0.0) * max(v1.a, 0.0))), -1.0, 1.0);
// Stage 3
t1.rgb = clamp(vec3(((max(vec4(0.0).rgb, 0.0) * (-max(vec4(0.0).rgb, 0.0) + 0.5)) + ((1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0)) * max(t1.rgb, 0.0)))), -1.0, 1.0);
// Stage 4
t3.rgb = clamp(vec3((max(t3.rgb, 0.0) * max(v1.rgb, 0.0))), -1.0, 1.0);
v0.rgb = clamp(vec3((max(v0.rgb, 0.0) * max(r0.rgb, 0.0))), -1.0, 1.0);
// Stage 5
r1.rgb = clamp(vec3(((max(c1_5.rgb, 0.0) * (1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0))) + (max(v0.aaa, 0.0) * -c0_5.rgb))), -1.0, 1.0);
// Stage 6
t0.rgb = clamp(vec3(((max(t0.rgb, 0.0) * max(t1.rgb, 0.0)) + (max(t0.rgb, 0.0) * max(t1.rgb, 0.0)))), -1.0, 1.0);
// Stage 7
r0.rgb = clamp(vec3(((max(t0.rgb, 0.0) * max(v0.rgb, 0.0)) + (max(t3.rgb, 0.0) * max(r1.aaa, 0.0)))), -1.0, 1.0);
// Final Combiner
fragColor.rgb = max(r1.rgb, 0.0) + mix(vec3(max(c0_8.rgb, 0.0)), vec3(max(vec4(max(r0.rgb, 0.0) * max(c0_8.aaa, 0.0), 0.0).rgb, 0.0)), vec3((1.0 - clamp(v0.aaa, 0.0, 1.0))));
fragColor.a = max(t0.a, 0.0);
if (!(fragColor.a > alphaRef)) discard;
}
abaire commented 2 years ago

c0_8 appears to control the fog color and v0 (the diffuse output from the vsh) controls the fog blend factor; a higher v0.a means more contribution from c0_8.
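The final combiner computes A*B + (1 - A)*C + D per channel, which the generated GLSL writes as D + mix(C, B, A). Reading A..D off the shader excerpt gives A = 1 - v0.a, B = r0 * c0_8.a, C = c0_8 and D = r1. The C sketch below checks the claim that a larger v0.a pulls the output toward c0_8. The helper `halo_fog_channel` is hypothetical, and the max/clamp wrappers from the GLSL are omitted on the assumption that inputs already lie in [0, 1].

```c
/* Per-channel final combiner: out = A*B + (1 - A)*C + D.
 * GLSL's mix(x, y, a) = x*(1 - a) + y*a, so D + mix(C, B, A) is the same thing. */
static float final_combiner(float a, float b, float c, float d)
{
    return a * b + (1.0f - a) * c + d;
}

/* Hypothetical mapping of the Halo inputs, read from the shader excerpt:
 * A = 1 - v0.a, B = r0 * c0_8.a, C = c0_8 (fog color), D = r1. */
static float halo_fog_channel(float v0_a, float r0, float c0_8,
                              float c0_8_a, float r1)
{
    return final_combiner(1.0f - v0_a, r0 * c0_8_a, c0_8, r1);
}
```

With v0.a = 1 the result is just c0_8 + r1 (full fog color). With v0.a = 0 it is r0 * c0_8.a + r1 (no fog color contribution).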

abaire commented 2 years ago

It looks like Halo does a special pass that mixes in gradient textures to simulate fog, and this is where things diverge from HW. I suspect the trees might actually be correct (or closer to correct than the background).

The relevant draw for the geometry at the very start of The Silent Cartographer (before moving in any way) is a glDrawElements call with a count of 2043.

Shader:

#version 330

struct VertexData {
  float inv_w;
  vec4 D0;
  vec4 D1;
  vec4 B0;
  vec4 B1;
  float Fog;
  vec4 T0;
  vec4 T1;
  vec4 T2;
  vec4 T3;
};
noperspective in VertexData g_vtx;
#define vtx g_vtx

out vec4 fragColor;

uniform vec4 fogColor;
float sign1(float x) {
    x *= 255.0;
    return (x-128.0)/127.0;
}
float sign2(float x) {
    x *= 255.0;
    if (x >= 128.0) return (x-255.5)/127.5;
               else return (x+0.5)/127.5;
}
float sign3(float x) {
    x *= 255.0;
    if (x >= 128.0) return (x-256.0)/127.0;
               else return (x)/127.0;
}
float sign3_to_0_to_1(float x) {
    if (x >= 0) return x/2;
           else return 1+x/2;
}
vec3 dotmap_zero_to_one(vec3 col) {
    return col;
}
vec3 dotmap_minus1_to_1_d3d(vec3 col) {
    return vec3(sign1(col.r),sign1(col.g),sign1(col.b));
}
vec3 dotmap_minus1_to_1_gl(vec3 col) {
    return vec3(sign2(col.r),sign2(col.g),sign2(col.b));
}
vec3 dotmap_minus1_to_1(vec3 col) {
    return vec3(sign3(col.r),sign3(col.g),sign3(col.b));
}
vec3 dotmap_hilo_1(vec3 col) {
    return col;
}
vec3 dotmap_hilo_hemisphere_d3d(vec3 col) {
    return col;
}
vec3 dotmap_hilo_hemisphere_gl(vec3 col) {
    return col;
}
vec3 dotmap_hilo_hemisphere(vec3 col) {
    return col;
}
const float[9] gaussian3x3 = float[9](
    1.0/16.0, 2.0/16.0, 1.0/16.0,
    2.0/16.0, 4.0/16.0, 2.0/16.0,
    1.0/16.0, 2.0/16.0, 1.0/16.0);
const vec2[9] convolution3x3 = vec2[9](
    vec2(-1.0,-1.0),vec2(0.0,-1.0),vec2(1.0,-1.0),
    vec2(-1.0, 0.0),vec2(0.0, 0.0),vec2(1.0, 0.0),
    vec2(-1.0, 1.0),vec2(0.0, 1.0),vec2(1.0, 1.0));
vec4 gaussianFilter2DRectProj(sampler2DRect sampler, vec3 texCoord) {
    vec4 sum = vec4(0.0);
    for (int i = 0; i < 9; i++) {
        sum += gaussian3x3[i]*textureProj(sampler,
                   texCoord + vec3(convolution3x3[i], 0.0));
    }
    return sum;
}
uniform ivec4 clipRegion[8];
uniform float texScale0;
uniform sampler2D texSamp0;
uniform float texScale1;
uniform sampler2D texSamp1;
uniform float texScale2;
uniform float texScale3;
uniform float alphaRef;
uniform vec4 c0_0;
uniform vec4 c1_0;
uniform vec4 c0_1;
uniform vec4 c1_1;
void main() {
/*  Window-clip (Inclusive) */
bool clipContained = false;
for (int i = 0; i < 8; i++) {
  bvec4 clipTest = bvec4(lessThan(gl_FragCoord.xy-0.5, clipRegion[i].xy),
                         greaterThan(gl_FragCoord.xy-0.5, clipRegion[i].zw));
  if (!any(clipTest)) {
    clipContained = true;
    break;
  }
}
if (!clipContained) {
  discard;
}
vec4 pD0 = vtx.D0 / vtx.inv_w;
vec4 pD1 = vtx.D1 / vtx.inv_w;
vec4 pB0 = vtx.B0 / vtx.inv_w;
vec4 pB1 = vtx.B1 / vtx.inv_w;
vec4 pFog = vec4(fogColor.rgb, clamp(vtx.Fog / vtx.inv_w, 0.0, 1.0));
vec4 pT0 = vtx.T0 / vtx.inv_w;
vec4 pT1 = vtx.T1 / vtx.inv_w;
vec4 pT2 = vtx.T2 / vtx.inv_w;
vec4 pT3 = vtx.T3 / vtx.inv_w;

vec4 v0 = pD0;
vec4 v1 = pD1;
pT0.xyw = texScale0 * pT0.xyw;
vec4 t0 = textureProj(texSamp0, pT0.xyw);
pT1.xyw = texScale1 * pT1.xyw;
vec4 t1 = textureProj(texSamp1, pT1.xyw);
vec4 t2 = vec4(0.0); /* PS_TEXTUREMODES_NONE */
vec4 t3 = vec4(0.0); /* PS_TEXTUREMODES_NONE */
vec4 r0;
r0.a = t0.a;
vec4 r1;
// Stage 0
v0.rgb = clamp(vec3((max(c0_0.aaa, 0.0) * max(t0.aaa, 0.0))), -1.0, 1.0);
t0.rgb = clamp(vec3((max(c0_0.rgb, 0.0) * max(t0.aaa, 0.0))), -1.0, 1.0);
r0.a = clamp((((max(c1_0.b, 0.0) * max(t1.a, 0.0)) + (max(c1_0.a, 0.0) * max(t1.b, 0.0)))), -1.0, 1.0);
// Stage 1
r0.rgb = clamp(vec3((max(c0_1.rgb, 0.0) * max(t0.rgb, 0.0))), -1.0, 1.0);
r1.rgb = clamp(vec3((max(c1_1.rgb, 0.0) * max(r0.aaa, 0.0))), -1.0, 1.0);
r0.a = clamp((((1.0 - clamp(t0.b, 0.0, 1.0)) * (1.0 - clamp(r0.a, 0.0, 1.0)))), -1.0, 1.0);
r1.a = clamp((((1.0 - clamp(c0_1.a, 0.0, 1.0)) * max(r0.a, 0.0))), -1.0, 1.0);
// Final Combiner
fragColor.rgb = max(vec4(max(r1.rgb, 0.0) * (1.0 - clamp(v0.rgb, 0.0, 1.0)), 0.0).rgb, 0.0) + mix(vec3(max(vec4(0.0).rgb, 0.0)), vec3((1.0 - clamp(r1.aaa, 0.0, 1.0))), vec3(max(r0.rgb, 0.0)));
fragColor.a = (1.0 - clamp(r0.a, 0.0, 1.0));
if (!(fragColor.a > alphaRef)) discard;
}
abaire commented 2 years ago

Multiplying t0.rgb by ~2 in the stage 0 combiner makes the output look much closer to correct.

The rgb components of the c0 constants are roughly 0.5, which is at least an interesting coincidence. Perhaps a gamma adjustment is being applied incorrectly? c0_0 = (0.5019608, 0.5019608, 0.5019608, 0.0078431) (float4)
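The logged values are exactly what 8-bit channels produce: 128/255 ≈ 0.5019608 and 2/255 ≈ 0.0078431. That explains why c0_0 sits slightly above 0.5. The C sketch below assumes the constant originates as 8 bits per channel (the raw register write isn't shown in the thread). It also shows that doubling 128/255 overshoots 1.0 only slightly, so a x2 experiment effectively turns the stage 0 multiplier into 1.0 after saturation.

```c
#include <stdint.h>

/* Convert an 8-bit unsigned-normalized channel to the float the shader sees. */
static float unorm8_to_float(uint8_t v)
{
    return (float)v / 255.0f;
}

/* Saturate to [0, 1], as applied to unsigned combiner inputs. */
static float saturate(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}
```

For example, saturate(2.0f * unorm8_to_float(128)) evaluates to 1.0f, because 2 * 128/255 ≈ 1.0039.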

abaire commented 2 years ago

I checked the behavior of the AY8 and A8Y8 textures being used; they seem correct.

It looks like T1.y is always < 0 in the frame I'm looking at, and the border mode is set to clamp, but I don't see a border color being set for texture[1] in the frame. Update: it looks like the border color is set to 0 very early on in the level:

nv2a: pgraph method (0): 0x97 -> 0x1b24 NV097_SET_TEXTURE_BORDER_COLOR (0x0)
nv2a: pgraph method (0): 0x97 -> 0x1b64 NV097_SET_TEXTURE_BORDER_COLOR+0x40 (0x0)
nv2a: pgraph method (0): 0x97 -> 0x1ba4 NV097_SET_TEXTURE_BORDER_COLOR+0x80 (0x0)
nv2a: pgraph method (0): 0x97 -> 0x1be4 NV097_SET_TEXTURE_BORDER_COLOR+0xc0 (0x0)
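The four writes differ only by multiples of 0x40, so they cover texture stages 0-3. The +0x40 write is therefore the one that zeroes texture[1]'s border color. The C sketch below maps a method address to a stage index. It assumes per-stage texture registers repeat every 0x40 bytes, as the log offsets suggest, and takes 0x1b24 (the first logged method) as stage 0.

```c
#include <stdint.h>

/* Assumed: stage 0's border color register, taken from the first log line,
 * and a 0x40-byte stride between texture stages (matches +0x40/+0x80/+0xc0). */
#define TEXTURE_BORDER_COLOR_BASE 0x1b24u
#define TEXTURE_STAGE_STRIDE      0x40u

static unsigned texture_stage(uint32_t method)
{
    return (method - TEXTURE_BORDER_COLOR_BASE) / TEXTURE_STAGE_STRIDE;
}
```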

Also, T0.y is never set by the vsh and is always 0.0 (which may be fine, since T0 is identical on every row, unlike T1, whose alpha varies with Y).

abaire commented 2 years ago

Continuing to look at the actual data:

The T0 texture coordinates coming out of the vertex shader seem reasonable. Vertices that are far away from the camera approach x=1.0 (y is never set, but the texture is clamped so this should not matter). Vertices that are very close to the camera get small or even negative x values.

In the pixel shader, multiplying the T0 and T1 texcoords by 2 produces fog that looks substantially more correct, but presumably saturates values incorrectly.

Multiplying c0 and c1 by 2 produces fog that is far too dense, likely because c0_0 is already slightly higher than 0.5 and c1_0 is already 1 (so doubling overdrives the alpha component used in later blending).

I've built a test case for texture border behavior and, as far as I can see, there are no obvious issues there. Without digging into the game code itself, I'm not sure whether c0_0 is based on some value that differs between HW and xemu.

This is a screenshot of debugger output showing the texture config (ignore stages 2 and 3; they are probably disabled and only populated due to a bug in my debugger code).

texture_settings

abaire commented 2 years ago

And looking at the intro cutscene, where the geometry is much farther from the camera, T0.x is already > 1.0, so maximum fogging should already be applied. As with the other test, T1.y always seems to be < 0, and in this case T1.x is always far larger than 1.0 (~1100-1500 at a glance).
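The reasoning above depends on the addressing mode. Under clamp-to-edge, any T0.x > 1.0 reads the last texel of the gradient (maximum fog). Under clamp-to-border with a border color of 0, any T1.y < 0 reads 0 regardless of T1.x. The simplified 1D nearest-filter sketch below contrasts the two modes. The thread only says the textures are clamped, so it assumes clamp-to-edge for texture 0 and clamp-to-border for texture[1] (whose border color is the one checked). The function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical 1D nearest lookup with clamp-to-edge: out-of-range
 * coordinates are pinned to the first or last texel. */
static float sample_clamp_to_edge(const float *texels, size_t n, float u)
{
    if (u < 0.0f) u = 0.0f;
    if (u > 1.0f) u = 1.0f;
    size_t i = (size_t)(u * (float)n);  /* nearest texel index */
    if (i >= n) i = n - 1;              /* u == 1.0 maps to the last texel */
    return texels[i];
}

/* Clamp-to-border: out-of-range coordinates return the border color. */
static float sample_clamp_to_border(const float *texels, size_t n, float u,
                                    float border)
{
    if (u < 0.0f || u > 1.0f) return border;
    return sample_clamp_to_edge(texels, n, u);
}
```

With a gradient such as {0.0, 0.25, 0.5, 1.0}, clamp-to-edge at u = 1.5 returns 1.0, while clamp-to-border with border 0 returns 0.0 at u = -0.1.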

abaire commented 2 years ago

Taking a screenshot from the HW, I notice that all of the items that seem over-fogged are actually transparent.

Looking at those transparent objects, the colors seem pretty close (the frames are probably not a perfect match, which may explain the discrepancies).

HW: SilentCartographerIntroHW

xemu: SilentCartographerIntroxemu

abaire commented 2 years ago

Using nv2a trace I was also able to confirm that the combiner constants seem to be the same across xemu and HW.

abaire commented 2 years ago

Random find: Halo sets the fog color to 0x9ea2cd, which is interesting since it never uses fog in any combiner. On hardware, it appears to be set explicitly at the start of each frame and explicitly reset to 0 before the two final draw calls at the end of the frame.

abaire commented 2 years ago

Reimplementing what I think is the fogging pass with the nxdk does produce different results on HW and xemu. I still need to dig into exactly why (the fog color doesn't seem to have an impact).

In my test case, HW produces pixels (48,67,62,255) ~= (0.1882, 0.2627, 0.2431) and xemu produces (126,146,163,255) ~= (0.6353, 0.5725, 0.6392).

Screenshots of the difference follow. Interestingly, the components at each stage seem to match; only the final output differs.


Hardware: hw


xemu: xemu


Final combiner:

nv2a: pgraph method (0): NV20_KELVIN_PRIMITIVE<0x97> -> NV097_SET_COMBINER_SPECULAR_FOG_CW0<0x288> (0xC3D000F {[A: R0Temp], [B: R1Temp Alpha Invert], [C: Zero], [D: EF_Prod]})
nv2a: pgraph method (0): NV20_KELVIN_PRIMITIVE<0x97> -> NV097_SET_COMBINER_SPECULAR_FOG_CW1<0x28C> (0xD243C00 {[E: R1Temp], [F: V0_Diffuse Invert], [G: R0Temp Alpha Invert]})
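The decoded annotations in these two log lines can be reproduced from the raw words, which makes it easier to check other final-combiner setups by hand. The C sketch below assumes each input is one byte, packed most-significant first (A..D in CW0, E..G in the top three bytes of CW1). Within a byte, it assumes bits 0-3 select the source register, bit 4 selects alpha and bit 5 inverts. The register numbers in the comment are inferred from these two words, not from a full register table.

```c
#include <stdint.h>

/* Assumed layout of one 8-bit combiner input field:
 * bits 0-3 = source register, bit 4 = use alpha, bit 5 = invert (1 - x). */
typedef struct {
    unsigned reg;  /* as inferred here: 0 Zero, 4 V0_Diffuse, 12 R0Temp, 13 R1Temp, 15 EF_Prod */
    int alpha;
    int invert;
} CombinerInput;

static CombinerInput decode_input(uint8_t field)
{
    CombinerInput in;
    in.reg = field & 0x0Fu;
    in.alpha = (field >> 4) & 1;
    in.invert = (field >> 5) & 1;
    return in;
}

/* index 0 = most significant byte (A in CW0, E in CW1). */
static uint8_t field_at(uint32_t word, unsigned index)
{
    return (uint8_t)(word >> (24u - 8u * index));
}
```

Decoded this way, the stage computes rgb = r0.rgb * (1 - r1.a) + r1.rgb * (1 - v0.rgb) and alpha = 1 - r0.a. That matches the final combiner in the generated game shader quoted earlier in the thread.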

abaire commented 2 years ago

It looks like the issue is with the calculation of r1.a (the last box in these results)

HW: hw

xemu: xemu

abaire commented 2 years ago

And it looks like the issue is that HW evaluates the outputs of a combiner stage in parallel, which xemu fails to replicate.

In this particular case, r1.a is calculated in the same combiner as r0.a:

r0.a = (1 - tex0.b) * (1 - r0.a)
r1.a = (1 - c0_1.a) * (r0.a)

In xemu these are executed sequentially, so the assignment to r0.a affects the calculation of r1.a. In hardware they happen in parallel, so both are calculated independently from the values the registers held on entry to the stage.
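A minimal C model of the two evaluation orders for this stage makes the difference concrete. The names are illustrative, not xemu's code. The parallel variant reads every input from the stage's entry values before any output is written. That is one possible way to get the hardware behavior; this sketch assumes it rather than describing the actual fix.

```c
typedef struct {
    float r0_a;
    float r1_a;
} AlphaOut;

/* "Invert" input mapping used by the combiner: 1 - saturate(x). */
static float inv(float x)
{
    if (x < 0.0f) x = 0.0f;
    if (x > 1.0f) x = 1.0f;
    return 1.0f - x;
}

/* Unsigned identity mapping: max(x, 0). */
static float pos(float x)
{
    return x > 0.0f ? x : 0.0f;
}

/* Sequential (what the generated GLSL does): r1.a reads the r0.a just written. */
static AlphaOut stage_sequential(float t0_b, float r0_a, float c0_1_a)
{
    AlphaOut out;
    out.r0_a = inv(t0_b) * inv(r0_a);
    out.r1_a = inv(c0_1_a) * pos(out.r0_a);
    return out;
}

/* Parallel (what hardware appears to do): both outputs read the stage inputs. */
static AlphaOut stage_parallel(float t0_b, float r0_a, float c0_1_a)
{
    AlphaOut out;
    out.r0_a = inv(t0_b) * inv(r0_a);
    out.r1_a = inv(c0_1_a) * pos(r0_a);
    return out;
}
```

With t0.b = 0.25, r0.a = 0.5 and c0_1.a = 0.5, both variants give r0.a = 0.375. The sequential one gives r1.a = 0.1875, while the parallel one gives r1.a = 0.25.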