tfetch LOD selection needs cleanup

Triang3l commented 4 years ago

The DXBC shader translator and the D3D12 texture cache need to be slightly reworked (likely as a part of the whole ENCODE_D3D10_SB → DxbcOp rewrite).

Reference (archived MSDN page): http://web.archive.org/web/20090514012026/http://msdn.microsoft.com/en-us/library/bb313957.aspx. Two important statements from it to take into account:

To use the tfetch2D (xvs_3_0, xps_3_0) instruction in a vertex shader, UseComputedLOD must be false unless you manually set the gradients and set UseRegisterGradients to true.
The total LOD for a sample is additive and is based on what is enabled. The total LOD is determined by the LOD computed in the texture pipeline (if UseComputedLOD is true), the LOD set by setTexLOD (if UseRegisterLOD is true), and the LODBias value.

"Unless" in the first apparently implies that for UseRegisterGradients to work, UseComputedLOD is required too — the LOD is computed from gradients, but they can be either implicit or explicit. If UseComputedLOD is false, we should assume that the computed value is 0.

The second statement means that the parameters are not mutually exclusive — we should sum all the biases and also the computed LOD.
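As a quick illustration of the additive rule, here is a minimal sketch. The function and parameter names are made up for this example and are not taken from the Xenia code; the only behavior it encodes is what the two quoted statements say, plus the assumption above that a disabled computed LOD counts as 0.

```python
def total_lod(computed_lod, register_lod, lod_bias,
              use_computed_lod, use_register_lod):
    """Sum every enabled LOD source (hypothetical helper, not Xenia's API)."""
    # The constant LODBias always contributes.
    total = lod_bias
    # The LOD from implicit or explicit gradients counts only when
    # UseComputedLOD is true; otherwise it is treated as 0.
    if use_computed_lod:
        total += computed_lod
    # The value set by setTexLOD counts only when UseRegisterLOD is true.
    if use_register_lod:
        total += register_lod
    return total
```

For example, a computed LOD of 2.0, a register LOD of 1.5 and a bias of -0.5 give 3.0 with both flags set, and 1.0 when UseComputedLOD is false.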

There are several ways to pass the bias in D3D12 (a good reference for LOD calculation and for how DXBC sample instructions work internally is https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm):

Sampler state:
- Pro: Automatically applied to all kinds of fetches — Sample, SampleBias, SampleGrad, SampleLevel, CalculateLevelOfDetail.
- Con: Only constant values are supported.
- Con: Samplers are a limited resource — on older Nvidia GPUs, you can have only up to 16 samplers bound to a shader stage, while a shader can have an arbitrary number of tfetch instructions with different constant biases. While for filtering modes we have no other options, it's better not to allow the possibility of such a combinatorial explosion.
- Con, but very minor (and there is also no information about the clamping behavior on a real console): the bias from UseRegisterLOD that is passed to SampleBias needs to be clamped to -16.0…15.99 (the D3D12 LOD bias range) independently of the bias in the sampler, so 17 in the register LOD and -4 in the sampler give 11.99 (17 is clamped to 15.99 first), not 13.
Adding bias directly via SampleBias or to the argument of SampleLevel:
- Pro: Arbitrary values can be passed from the shader — suitable for register LOD, constant bias from the instruction and constant bias from the fetch constant.
- Con: Doesn't work with SampleGrad.
Scaling gradients by exp2(bias):
- Pro: Arbitrary values can be passed from the shader as well.
- Pro: Acceptable for cases when computed LOD is used, and the only way to bias by a variable in SampleGrad.
- Con: Not exactly sure how accurate that is, but at least for whole bias values it should probably be fine (doubling the gradients doubles the texel density per pixel, which needs the next mip).
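Two of the points above can be checked with a bit of arithmetic. The sketch below is only an illustration: all names are hypothetical, the clamping model is the one described in this comment (each bias clamped on its own, then added), and the LOD formula is the simplified isotropic one (log2 of the longer texel-space gradient), not what any particular GPU actually does.

```python
import math

# LOD bias range quoted above for D3D12.
LOD_BIAS_MIN = -16.0
LOD_BIAS_MAX = 15.99


def clamp_bias(bias):
    """Clamp one bias value to the LOD bias range."""
    return max(LOD_BIAS_MIN, min(LOD_BIAS_MAX, bias))


def sampler_plus_shader_bias(sampler_bias, shader_bias):
    """Assumed model: both biases are clamped independently, then summed."""
    return clamp_bias(sampler_bias) + clamp_bias(shader_bias)


def isotropic_lod(ddx, ddy):
    """Simplified LOD: log2 of the longer gradient, in texels per pixel."""
    return math.log2(max(math.hypot(*ddx), math.hypot(*ddy)))


def scale_gradients(ddx, ddy, bias):
    """Emulate a LOD bias in SampleGrad by scaling both gradients by 2^bias."""
    scale = 2.0 ** bias
    return (tuple(c * scale for c in ddx), tuple(c * scale for c in ddy))
```

With this model, a register bias of 17 and a sampler bias of -4 sum to about 11.99 rather than 13, and scaling the gradients by exp2(bias) shifts the simplified LOD by exactly the bias. Whether fixed-function hardware matches that for fractional biases or anisotropic filtering is the open question noted in the list.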

It looks like there are two ways for us to cover the whole set of options:

Fetch constant bias in sampler state: free, as we already have at least one sampler per fetch constant. SampleGrad will take scaled gradients only for the instruction and register biases; other cases will be handled the way PC games usually handle them. We won't need to extract the bias from the fetch constant (though that's just 3 instructions, so not that important), and we will be able to use the full set of sample instructions (though that's not needed that much).
Fetch constant bias applied in shaders: a few additional instructions, but all three sources of LOD bias are handled the same way. Only SampleBias, SampleGrad and SampleLevel are used, no Sample. May be slightly simpler to handle (fewer cases to cover).
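To make the second option more concrete, here is a rough sketch of how the host instruction could be picked once all three biases are summed in the shader. This is one possible reading of the plan, not the actual translator logic; the function name and the returned dictionary layout are invented for the example.

```python
def select_sample_op(use_computed_lod, use_register_gradients,
                     bias_sum, gradients=None):
    """Pick a host sample instruction for one tfetch (hypothetical helper).

    bias_sum is the sum of the register LOD (if enabled), the instruction
    bias and the fetch constant bias, all applied in the shader.
    """
    if not use_computed_lod:
        # The computed LOD is assumed to be 0, so the biases are the LOD.
        return ("SampleLevel", {"lod": bias_sum})
    if use_register_gradients:
        # SampleGrad takes no bias, so fold it into the gradients.
        scale = 2.0 ** bias_sum
        ddx, ddy = gradients
        return ("SampleGrad", {
            "ddx": tuple(c * scale for c in ddx),
            "ddy": tuple(c * scale for c in ddy),
        })
    # Implicit gradients: the host computes the LOD, we only add the bias.
    return ("SampleBias", {"bias": bias_sum})
```

With the first option, the fetch constant bias would simply be left out of bias_sum and live in the sampler state instead.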

Triang3l commented 4 years ago

Anisotropic filtering handling needs some work too. We need to check whether it should work at all when the filtering mode in the guest fetch constants is not linear/linear/linear (whether anisotropic filtering overrides those modes or requires them), and it should be disabled for UseComputedLOD=false. Viva Piñata has broken vertices in one of its draw calls (looks like a terrain patch; it also has something in r0.yz, but that's a different story), and it involves an R8G8 texture with components apparently representing the low (high-frequency) and the high (low-frequency) parts of a single packed number. It's a vertex texture, without gradients (so no anisotropic filtering), and the tfetch explicitly has point filtering, but for some reason the fetch constant has anisotropic filtering enabled, resulting in something totally broken.

Triang3l commented 4 years ago

Call of Duty 4 alpha uses anisotropic filtering with nearest-neighbor mip filtering (but linear magnification/minification filters); the way it works in general seems really weird. Possibly it should take precedence (assume linear/linear/linear, though I'm not sure how it should behave for basemap mip filtering) when gradients are provided, but be totally ignored when they are not?
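Written out as code, the hypothesis from the last two comments would look roughly like this. It is purely a guess at the behavior, not something confirmed on hardware, and the function and its string filter names are made up for illustration.

```python
def effective_filter(aniso_requested, gradients_available,
                     mag_filter, min_filter, mip_filter):
    """Hypothesis only: anisotropic filtering wins when gradients exist,
    and is ignored entirely when they don't (UseComputedLOD=false)."""
    if aniso_requested and gradients_available:
        # Assume linear/linear/linear regardless of the guest filters.
        return {"aniso": True, "mag": "linear",
                "min": "linear", "mip": "linear"}
    # No gradients (or no anisotropy requested): keep the guest filters.
    return {"aniso": False, "mag": mag_filter,
            "min": min_filter, "mip": mip_filter}
```

Under this guess, the Viva Piñata vertex fetch (point filters, no gradients) stays point-filtered, while the Call of Duty 4 case (gradients available, nearest-neighbor mip filter) gets full anisotropic linear filtering.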

Triang3l commented 4 years ago

Mostly resolved in https://github.com/xenia-project/xenia/commit/8a64861ec08f0762fc17230efdda4a0c2bd59b9b.

However, there are still many things missing from our texture fetch implementation; they would overcomplicate it while being unlikely to ever be purposely used in games. Mainly these are things involving the LOD value:

getWeights: W of the return value should contain the LOD lerp factor. XYZ should contain the interpolation factors actually used at the needed LOD (they don't make sense at non-zero LODs otherwise). We don't know which LOD specifically they should be calculated on, but I'd expect it to be the higher-resolution one. However, games mostly use getWeights for things like shadow map percentage-closer filtering, where there are no mips. All LOD biases, and register gradients if requested, need to be taken into account.
tfetch offsets need to be applied at the LOD the texture is sampled from. It's unclear whether the value is scaled for the higher-resolution LOD or for both LODs independently. The Direct3D 10 Sample documentation says "use an offset only at an integer miplevel; otherwise, you may get results that do not translate well to hardware", but in any case, using offsets calculated for LOD 0 at non-zero LODs doesn't make much sense. Offsets are mostly used without mips anyway (for shadow maps, blur).
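To show why the LOD matters for both items, here is a small sketch. The helpers are hypothetical, assume power-of-two mip chains where each level halves the extent, and assume the usual texel-center-at-0.5 convention for the linear weight; the real guest behavior may differ.

```python
import math


def mip_extent(base_extent, lod):
    """Width or height of an integer mip level, never below 1 texel."""
    return max(1, base_extent >> lod)


def offset_in_normalized_coords(offset_texels, base_extent, lod):
    """Convert a texel offset into a coordinate offset at a given mip."""
    return offset_texels / mip_extent(base_extent, lod)


def linear_weight(coord, base_extent, lod):
    """Fraction used to blend two neighboring texels at a given mip
    (assumes texel centers at half-integer positions)."""
    texel = coord * mip_extent(base_extent, lod) - 0.5
    return texel - math.floor(texel)
```

For a 256-texel-wide texture, a one-texel offset is 1/256 of the coordinate range at LOD 0 but 1/64 at LOD 2, so an offset computed for LOD 0 moves only a quarter of a texel there. The interpolation weight changes with the LOD in the same way.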

That would require calculating the LOD somehow. CalculateLevelOfDetail may partially fulfill our needs, but it's available only in pixel shaders and works only with implicit gradients (though explicit gradients can possibly be used by exploiting the parity of SV_Position). ALU LOD calculation is also possible, though it would probably give different values than fixed-function texture sampling on the host, and would also require a lot of code, especially for anisotropic filtering. Another option (memory- and bandwidth-consuming, though maybe tiled resource aliasing can be exploited) would be to use textures of the needed size filled with solid colors containing the LOD values, though texture filtering precision needs to be taken into account here. Something similar would also be needed for getBCF, but forcing zero would be a bit easier, as everything could be mapped to a zero tile.
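For reference, the simplest form of the ALU calculation (isotropic only, no anisotropy) could look like the sketch below. The function is hypothetical, uses the textbook formula rather than whatever the host or guest hardware really does, and returns the higher-resolution mip plus the lerp factor that getWeights would need in W.

```python
import math


def alu_lod(ddx_uv, ddy_uv, width, height, mip_count, bias=0.0):
    """Return (higher-resolution mip index, lerp factor to the next mip)."""
    # Convert normalized-coordinate gradients to texels per pixel.
    ddx = (ddx_uv[0] * width, ddx_uv[1] * height)
    ddy = (ddy_uv[0] * width, ddy_uv[1] * height)
    rho = max(math.hypot(*ddx), math.hypot(*ddy))
    # Zero gradients mean magnification; treat them as the base level.
    lod = math.log2(rho) + bias if rho > 0.0 else 0.0
    # Clamp to the mips that exist in the texture.
    lod = max(0.0, min(float(mip_count - 1), lod))
    mip = int(math.floor(lod))
    return mip, lod - mip
```

Even this version is several instructions per fetch, and its result can disagree with the fixed-function unit near mip transitions, which is the precision concern mentioned above.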

But for now, let's consider this solved until we find games requiring a more accurate implementation.

tfetch LOD selection needs cleanup #1563