xenia-project / xenia

Xbox 360 Emulator Research Project
https://xenia.jp
Other
8.17k stars 1.12k forks source link

DXBC shader translator uses uninitialized registers #1511

Open Triang3l opened 4 years ago

Triang3l commented 4 years ago

The DXBC shader translator currently doesn't care about dependency chains of D3D10_SB_OPERAND_TYPE_TEMP registers (individual scalars of them, as float4 is just syntactic sugar in DXBC on modern GPU architectures), which causes CreateGraphicsPipelineState crashes on AMD drivers due to uninitialized registers and creates false dependencies potentially adding unneeded constraints to the driver's optimization passes. This is especially harmful when registers are initialized/renamed on one branch but not another.

Crashes caused by uninitialized registers constantly (but not exclusively) happen in the ROV output code, which is pretty complicated as it has many branches for different render target configurations, subroutine calls and is split into pre-guest-shader (early depth/stencil discard) and post-guest-shader parts.

For all components masked out in the output operand, the FXC shader compiler replaces the swizzle in the input component with the swizzle of the first used component — for YZ output mask, for example, YYZY should be used rather than XYZW. In our own code where the output mask is explicit, the correct swizzle may be hardcoded, though care should be taken where the output mask is a variable — maybe an inline function cleaning up the swizzle according to the write mask. It may be worth looking into a cleaner abstraction layer over DXBC (like Xbyak) that would handle swizzle cleanup internally. Currently emitting a single DXBC instruction also requires writing a dozen of lines of code, I still don't know, however, what syntax we should use to make it shorter overall. This is, however, possibly not as important to the compiler as unused components may be eliminated early — what is a lot more important is to make sure registers are not initialized conditionally.

Guest registers must be zero-initialized in the beginning of the shader to handle cases where a guest branch doesn't initialize a register. This is what Direct3D 9on12 does — it won't affect performance in cases without branching because code that nothing depends on gets optimized out by the driver's shader compiler.

Triang3l commented 4 years ago

All code emitting non-dcl instructions has been moved to the new DxbcOp*/DxbcDest/DxbcSrc layer that filters out swizzle of unused components. However, on AMD, the emulator still crashes with ROV. Likely should move away from subroutine calls, since they are probably very complex for both DXILConv and the compiler to analyze, to everything inlined.