spacehamster / DXDecompiler

Decompiler Intermediate Representation Design #8

Open spacehamster opened 3 years ago

spacehamster commented 3 years ago

There are some design questions about the decompiler that need to be answered before work can proceed. https://github.com/spacehamster/DXDecompiler/wiki/Decompiler-Intermediate-Representation-Design

sweetgiorni commented 3 years ago

Nice write-up. Have you considered using LLVM for the IR? It seems to be commonly used for decompiler backends. It's also used by DXIL (in the opposite direction, of course). LLVMSharp provides a wrapper around LLVM headers. We could make use of any of the built-in passes LLVM provides and write our own passes for the rest of the instruction lifting.

spacehamster commented 3 years ago

Yes, I have considered LLVM for the IR. I referred to DXIL as SM6; they are essentially the same thing. The problem with DXIL is that all instructions are scalarized (vector operations are converted into multiple scalar operations, and all swizzle information is lost) and all structured control flow (loops and if statements) is converted into unstructured control flow (jump statements), which I think is problematic.
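To make the scalarization point concrete, here is a rough Python sketch. The tuple-based mini-IR and the `scalarize` helper are made up for illustration and are not part of DXDecompiler, DXIL, or LLVM. It assumes each source swizzle has one entry per written destination component, which is a simplification of how DXBC actually encodes swizzles.

```python
# Hypothetical mini-IR, not a DXDecompiler or DXIL structure:
# a vector op is described by (opcode, dest register, write mask, [(source register, swizzle), ...]).
def scalarize(opcode, dest, mask, sources):
    """Expand one masked, swizzled vector op into per-component scalar ops."""
    scalar_ops = []
    for i, component in enumerate(mask):
        # Assumption: swizzle[i] is the source component feeding the i-th written component.
        operands = [f"{reg}.{swizzle[i]}" for reg, swizzle in sources]
        scalar_ops.append((opcode, f"{dest}.{component}", operands))
    return scalar_ops


# One vector instruction: mul r0.xyz, v1.zyx, v2.xxx
ops = scalarize("mul", "r0", "xyz", [("v1", "zyx"), ("v2", "xxx")])
for op in ops:
    print(op)
```

After this step there are three unrelated scalar multiplies, and nothing in the output records that they came from a single instruction with a `.zyx` swizzle. A decompiler working from that form has to guess the grouping back.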

LLVM is also a very heavy dependency. The LLVMSharp bindings wrap the LLVM C API, which is low level and difficult to work with. Llvm.NET has nicer bindings and is easier to use. Either way, because DXIL is based on LLVM 3.7, using newer versions of LLVM may give incorrect results when using the DataLayout class, which may require building bindings to LLVM 3.7 or to DirectXShaderCompiler. It may also be an option to implement a small subset of LLVM in C#, but that may be too much work to be feasible.

I don't know at this stage if those issues would rule it out or not, it is something I am still researching.

Along a similar vein is using SPIR-V as an IR. I only have a cursory understanding of it, but its format is similar to LLVM's and it does support vector operations.

As an example, if you take the shader

SamplerState MeshTextureSampler;
Texture2D g_MeshTexture;
bool bTexture;

struct VS_OUTPUT
{
    float4 Position : SV_POSITION;
    float4 Diffuse : COLOR0;
    float2 TextureUV : TEXCOORD0;
};

struct PS_OUTPUT
{
    float4 RGBColor : SV_Target;
};

PS_OUTPUT RenderScenePS(VS_OUTPUT In)
{
    PS_OUTPUT Output;

    if (bTexture)
        Output.RGBColor = g_MeshTexture.Sample(MeshTextureSampler, In.TextureUV) * In.Diffuse;
    else
        Output.RGBColor = In.Diffuse;

    return Output;
}

and convert it to DXIL using dxbc2dxil you get this.

;
; Input signature:
;
; Name                 Index   Mask Register SysValue  Format   Used
; -------------------- ----- ------ -------- -------- ------- ------
; SV_POSITION              0   xyzw        0      POS   float       
; COLOR                    0   xyzw        1     NONE   float   xyzw
; TEXCOORD                 0   xy          2     NONE   float   xy  
;
;
; Output signature:
;
; Name                 Index   Mask Register SysValue  Format   Used
; -------------------- ----- ------ -------- -------- ------- ------
; SV_Target                0   xyzw        0   TARGET   float   xyzw
;
;
; Pipeline Runtime Information: 
;
; Pixel Shader
; DepthOutput=0
; SampleFrequency=0
;
;
; Input signature:
;
; Name                 Index             InterpMode DynIdx
; -------------------- ----- ---------------------- ------
; SV_Position              0                              
; COLOR                    0                 linear       
; TEXCOORD                 0                 linear       
;
; Output signature:
;
; Name                 Index             InterpMode DynIdx
; -------------------- ----- ---------------------- ------
; SV_Target                0                              
;
; Buffer Definitions:
;
; cbuffer CB0
; {
;
;   [320 x i8](type annotation not present)
;
; }
;
;
; Resource Bindings:
;
; Name                                 Type  Format         Dim      ID      HLSL Bind  Count
; ------------------------------ ---------- ------- ----------- ------- -------------- ------
; CB0                               cbuffer      NA          NA     CB0            cb0     1
; S0                                sampler      NA          NA      S0             s0     1
; T0                                texture     f32          2d      T0             t0     1
;

%dx.types.Handle = type { i8* }
%dx.types.CBufRet.i32 = type { i32, i32, i32, i32 }
%dx.types.ResRet.f32 = type { float, float, float, float, i32 }
%dx.types.f32 = type { float }
%dx.types.i8x320 = type { [320 x i8] }
%dx.types.Sampler = type opaque

define void @main() {
entry:
  %0 = call %dx.types.Handle @dx.op.createHandle(i32 57, i8 0, i32 0, i32 0, i1 false)  ; CreateHandle(resourceClass,rangeId,index,nonUniformIndex)
  %1 = call %dx.types.Handle @dx.op.createHandle(i32 57, i8 2, i32 0, i32 0, i1 false)  ; CreateHandle(resourceClass,rangeId,index,nonUniformIndex)
  %2 = call %dx.types.Handle @dx.op.createHandle(i32 57, i8 3, i32 0, i32 0, i1 false)  ; CreateHandle(resourceClass,rangeId,index,nonUniformIndex)
  %3 = call %dx.types.CBufRet.i32 @dx.op.cbufferLoadLegacy.i32(i32 59, %dx.types.Handle %1, i32 19)  ; CBufferLoadLegacy(handle,regIndex)
  %4 = extractvalue %dx.types.CBufRet.i32 %3, 1
  %5 = icmp ne i32 %4, 0
  br i1 %5, label %if0.then, label %if0.else

if0.then:                                         ; preds = %entry
  %6 = call float @dx.op.loadInput.f32(i32 4, i32 2, i32 0, i8 0, i32 undef)  ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %7 = call float @dx.op.loadInput.f32(i32 4, i32 2, i32 0, i8 1, i32 undef)  ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %8 = call %dx.types.ResRet.f32 @dx.op.sample.f32(i32 60, %dx.types.Handle %0, %dx.types.Handle %2, float %6, float %7, float undef, float undef, i32 0, i32 0, i32 undef, float 0.000000e+00)  ; Sample(srv,sampler,coord0,coord1,coord2,coord3,offset0,offset1,offset2,clamp)
  %9 = extractvalue %dx.types.ResRet.f32 %8, 0
  %10 = extractvalue %dx.types.ResRet.f32 %8, 1
  %11 = extractvalue %dx.types.ResRet.f32 %8, 2
  %12 = extractvalue %dx.types.ResRet.f32 %8, 3
  %13 = call float @dx.op.loadInput.f32(i32 4, i32 1, i32 0, i8 0, i32 undef)  ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %14 = call float @dx.op.loadInput.f32(i32 4, i32 1, i32 0, i8 1, i32 undef)  ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %15 = call float @dx.op.loadInput.f32(i32 4, i32 1, i32 0, i8 2, i32 undef)  ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %16 = call float @dx.op.loadInput.f32(i32 4, i32 1, i32 0, i8 3, i32 undef)  ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %17 = fmul fast float %9, %13
  %18 = fmul fast float %10, %14
  %19 = fmul fast float %11, %15
  %20 = fmul fast float %12, %16
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 0, float %17)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 1, float %18)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 2, float %19)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 3, float %20)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  br label %if0.end

if0.else:                                         ; preds = %entry
  %21 = call float @dx.op.loadInput.f32(i32 4, i32 1, i32 0, i8 0, i32 undef)  ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %22 = call float @dx.op.loadInput.f32(i32 4, i32 1, i32 0, i8 1, i32 undef)  ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %23 = call float @dx.op.loadInput.f32(i32 4, i32 1, i32 0, i8 2, i32 undef)  ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %24 = call float @dx.op.loadInput.f32(i32 4, i32 1, i32 0, i8 3, i32 undef)  ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 0, float %21)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 1, float %22)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 2, float %23)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 3, float %24)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  br label %if0.end

if0.end:                                          ; preds = %if0.else, %if0.then
  ret void
}

; Function Attrs: nounwind readonly
declare %dx.types.Handle @dx.op.createHandle(i32, i8, i32, i32, i1) #0

; Function Attrs: nounwind readonly
declare %dx.types.CBufRet.i32 @dx.op.cbufferLoadLegacy.i32(i32, %dx.types.Handle, i32) #0

; Function Attrs: nounwind readnone
declare float @dx.op.loadInput.f32(i32, i32, i32, i8, i32) #1

; Function Attrs: nounwind readonly
declare %dx.types.ResRet.f32 @dx.op.sample.f32(i32, %dx.types.Handle, %dx.types.Handle, float, float, float, float, i32, i32, i32, float) #0

; Function Attrs: nounwind
declare void @dx.op.tempRegStore.f32(i32, i32, float) #2

; Function Attrs: nounwind readonly
declare float @dx.op.tempRegLoad.f32(i32, i32) #0

; Function Attrs: nounwind
declare void @dx.op.storeOutput.f32(i32, i32, i32, i8, float) #2

attributes #0 = { nounwind readonly }
attributes #1 = { nounwind readnone }
attributes #2 = { nounwind }

!dx.version = !{!0}
!dx.valver = !{!0}
!dx.shaderModel = !{!1}
!dx.resources = !{!2}
!dx.entryPoints = !{!10}
!llvm.ident = !{!20}

!0 = !{i32 1, i32 0}
!1 = !{!"ps", i32 6, i32 0}
!2 = !{!3, null, !6, !8}
!3 = !{!4}
!4 = !{i32 0, %dx.types.f32 addrspace(1)* undef, !"T0", i32 0, i32 0, i32 1, i32 2, i32 0, !5}
!5 = !{i32 0, i32 9}
!6 = !{!7}
!7 = !{i32 0, %dx.types.i8x320 addrspace(2)* undef, !"CB0", i32 0, i32 0, i32 1, i32 320, null}
!8 = !{!9}
!9 = !{i32 0, %dx.types.Sampler addrspace(1)* undef, !"S0", i32 0, i32 0, i32 1, i32 0, null}
!10 = !{void ()* @main, !"main", !11, !2, !19}
!11 = !{!12, !17, null}
!12 = !{!13, !15, !16}
!13 = !{i32 0, !"SV_Position", i8 9, i8 3, !14, i8 0, i32 1, i8 4, i32 0, i8 0, null}
!14 = !{i32 0}
!15 = !{i32 1, !"COLOR", i8 9, i8 0, !14, i8 2, i32 1, i8 4, i32 1, i8 0, null}
!16 = !{i32 2, !"TEXCOORD", i8 9, i8 0, !14, i8 2, i32 1, i8 2, i32 2, i8 0, null}
!17 = !{!18}
!18 = !{i32 0, !"SV_Target", i8 9, i8 16, !14, i8 0, i32 1, i8 4, i32 0, i8 0, null}
!19 = !{i32 0, i64 256}
!20 = !{!"dxbc2dxil 1.2"}

And if you feed that to llvm-cbe, you get this. (Note that feeding it to a better LLVM decompiler like RetDec gives much better results, but I think this is representative of the unprocessed format that would act as a starting point.)

/* Provide Declarations */
#include <stdarg.h>
#include <setjmp.h>
#include <limits.h>
#include <stdint.h>
#include <math.h>
#ifndef __cplusplus
typedef unsigned char bool;
#endif

#ifndef _MSC_VER
#define __forceinline __attribute__((always_inline)) inline
#endif

#if defined(__GNUC__)
#define  __ATTRIBUTELIST__(x) __attribute__(x)
#else
#define  __ATTRIBUTELIST__(x)  
#endif

#ifdef _MSC_VER  /* Can only support "linkonce" vars with GCC */
#define __attribute__(X)
#endif

/* Global Declarations */

/* Types Declarations */
struct l_struct_dx_OC_types_OC_Handle;
struct l_struct_dx_OC_types_OC_CBufRet_OC_i32;
struct l_struct_dx_OC_types_OC_ResRet_OC_f32;

/* Function definitions */

/* Types Definitions */
struct l_struct_dx_OC_types_OC_Handle {
  uint8_t* field0;
};
struct l_struct_dx_OC_types_OC_CBufRet_OC_i32 {
  uint32_t field0;
  uint32_t field1;
  uint32_t field2;
  uint32_t field3;
};
struct l_struct_dx_OC_types_OC_ResRet_OC_f32 {
  float field0;
  float field1;
  float field2;
  float field3;
  uint32_t field4;
};

/* Function Declarations */
void main(void);
struct l_struct_dx_OC_types_OC_Handle dx_OC_op_OC_createHandle(uint32_t, uint8_t, uint32_t, uint32_t, bool) __ATTRIBUTELIST__((nothrow, pure));
struct l_struct_dx_OC_types_OC_CBufRet_OC_i32 dx_OC_op_OC_cbufferLoadLegacy_OC_i32(uint32_t, struct l_struct_dx_OC_types_OC_Handle, uint32_t) __ATTRIBUTELIST__((nothrow, pure));
float dx_OC_op_OC_loadInput_OC_f32(uint32_t, uint32_t, uint32_t, uint8_t, uint32_t) __ATTRIBUTELIST__((nothrow, const));
struct l_struct_dx_OC_types_OC_ResRet_OC_f32 dx_OC_op_OC_sample_OC_f32(uint32_t, struct l_struct_dx_OC_types_OC_Handle, struct l_struct_dx_OC_types_OC_Handle, float, float, float, float, uint32_t, uint32_t, uint32_t, float) __ATTRIBUTELIST__((nothrow, pure));
void dx_OC_op_OC_tempRegStore_OC_f32(uint32_t, uint32_t, float) __ATTRIBUTELIST__((nothrow));
float dx_OC_op_OC_tempRegLoad_OC_f32(uint32_t, uint32_t) __ATTRIBUTELIST__((nothrow, pure));
void dx_OC_op_OC_storeOutput_OC_f32(uint32_t, uint32_t, uint32_t, uint8_t, float) __ATTRIBUTELIST__((nothrow));

/* LLVM Intrinsic Builtin Function Bodies */
static __forceinline float llvm_fmul_f32(float a, float b) {
  float r = a * b;
  return r;
}

/* Function Bodies */

void main(void) {
  struct l_struct_dx_OC_types_OC_Handle llvm_cbe_tmp__1;
  struct l_struct_dx_OC_types_OC_Handle llvm_cbe_tmp__2;
  struct l_struct_dx_OC_types_OC_Handle llvm_cbe_tmp__3;
  struct l_struct_dx_OC_types_OC_CBufRet_OC_i32 llvm_cbe_tmp__4;
  float llvm_cbe_tmp__5;
  float llvm_cbe_tmp__6;
  struct l_struct_dx_OC_types_OC_ResRet_OC_f32 llvm_cbe_tmp__7;
  float llvm_cbe_tmp__8;
  float llvm_cbe_tmp__9;
  float llvm_cbe_tmp__10;
  float llvm_cbe_tmp__11;
  float llvm_cbe_tmp__12;
  float llvm_cbe_tmp__13;
  float llvm_cbe_tmp__14;
  float llvm_cbe_tmp__15;

  llvm_cbe_tmp__1 = dx_OC_op_OC_createHandle(57, 0, 0, 0, 0);
  llvm_cbe_tmp__2 = dx_OC_op_OC_createHandle(57, 2, 0, 0, 0);
  llvm_cbe_tmp__3 = dx_OC_op_OC_createHandle(57, 3, 0, 0, 0);
  llvm_cbe_tmp__4 = dx_OC_op_OC_cbufferLoadLegacy_OC_i32(59, llvm_cbe_tmp__2, 19);
  if ((((((llvm_cbe_tmp__4.field1)) != 0u)&1))) {
    goto llvm_cbe_if0_2e_then;
  } else {
    goto llvm_cbe_if0_2e_else;
  }

llvm_cbe_if0_2e_then:
  llvm_cbe_tmp__5 = dx_OC_op_OC_loadInput_OC_f32(4, 2, 0, 0, /*UNDEF*/0);
  llvm_cbe_tmp__6 = dx_OC_op_OC_loadInput_OC_f32(4, 2, 0, 1, /*UNDEF*/0);
  llvm_cbe_tmp__7 = dx_OC_op_OC_sample_OC_f32(60, llvm_cbe_tmp__1, llvm_cbe_tmp__3, llvm_cbe_tmp__5, llvm_cbe_tmp__6, /*UNDEF*/0, /*UNDEF*/0, 0, 0, /*UNDEF*/0, 0);
  llvm_cbe_tmp__8 = dx_OC_op_OC_loadInput_OC_f32(4, 1, 0, 0, /*UNDEF*/0);
  llvm_cbe_tmp__9 = dx_OC_op_OC_loadInput_OC_f32(4, 1, 0, 1, /*UNDEF*/0);
  llvm_cbe_tmp__10 = dx_OC_op_OC_loadInput_OC_f32(4, 1, 0, 2, /*UNDEF*/0);
  llvm_cbe_tmp__11 = dx_OC_op_OC_loadInput_OC_f32(4, 1, 0, 3, /*UNDEF*/0);
  dx_OC_op_OC_storeOutput_OC_f32(5, 0, 0, 0, (llvm_fmul_f32(((llvm_cbe_tmp__7.field0)), llvm_cbe_tmp__8)));
  dx_OC_op_OC_storeOutput_OC_f32(5, 0, 0, 1, (llvm_fmul_f32(((llvm_cbe_tmp__7.field1)), llvm_cbe_tmp__9)));
  dx_OC_op_OC_storeOutput_OC_f32(5, 0, 0, 2, (llvm_fmul_f32(((llvm_cbe_tmp__7.field2)), llvm_cbe_tmp__10)));
  dx_OC_op_OC_storeOutput_OC_f32(5, 0, 0, 3, (llvm_fmul_f32(((llvm_cbe_tmp__7.field3)), llvm_cbe_tmp__11)));
  goto llvm_cbe_if0_2e_end;

llvm_cbe_if0_2e_else:
  llvm_cbe_tmp__12 = dx_OC_op_OC_loadInput_OC_f32(4, 1, 0, 0, /*UNDEF*/0);
  llvm_cbe_tmp__13 = dx_OC_op_OC_loadInput_OC_f32(4, 1, 0, 1, /*UNDEF*/0);
  llvm_cbe_tmp__14 = dx_OC_op_OC_loadInput_OC_f32(4, 1, 0, 2, /*UNDEF*/0);
  llvm_cbe_tmp__15 = dx_OC_op_OC_loadInput_OC_f32(4, 1, 0, 3, /*UNDEF*/0);
  dx_OC_op_OC_storeOutput_OC_f32(5, 0, 0, 0, llvm_cbe_tmp__12);
  dx_OC_op_OC_storeOutput_OC_f32(5, 0, 0, 1, llvm_cbe_tmp__13);
  dx_OC_op_OC_storeOutput_OC_f32(5, 0, 0, 2, llvm_cbe_tmp__14);
  dx_OC_op_OC_storeOutput_OC_f32(5, 0, 0, 3, llvm_cbe_tmp__15);
  goto llvm_cbe_if0_2e_end;

llvm_cbe_if0_2e_end:
  return;
}

Just for completeness, when the DXIL is passed into RetDec, this is what comes out.

//
// This file was generated by the Retargetable Decompiler
// Website: https://retdec.com
// Copyright (c) Retargetable Decompiler <info@retdec.com>
//

#include <stdbool.h>
#include <stdint.h>

// ----------------- Float Types Definitions ------------------

typedef float float32_t;

// ------------------------ Structures ------------------------

struct dx_types_CBufRet_i32 {
    int32_t e0;
    int32_t e1;
    int32_t e2;
    int32_t e3;
};

struct dx_types_Handle {
    char * e0;
};

struct dx_types_ResRet_f32 {
    float32_t e0;
    float32_t e1;
    float32_t e2;
    float32_t e3;
    int32_t e4;
};

// ------------------------ Functions -------------------------

int main() {
    struct dx_types_Handle v1;
    struct dx_types_Handle v2;
    struct dx_types_Handle v3;
    struct dx_types_ResRet_f32 v4;
    v1 = dx_op_createHandle(57, 0, 0, 0, false);
    v2 = dx_op_createHandle(57, 2, 0, 0, false);
    v3 = dx_op_createHandle(57, 3, 0, 0, false);
    if (dx_op_cbufferLoadLegacy_i32(59, v2, 19).e1 != 0) {
        float32_t v5 = dx_op_loadInput_f32(4, 2, 0, 0, 0);
        v4 = dx_op_sample_f32(60, v1, v3, v5, dx_op_loadInput_f32(4, 2, 0, 1, 0), 0.0f, 0.0f, 0, 0, 0, 0.0f);
        float32_t v6 = dx_op_loadInput_f32(4, 1, 0, 0, 0);
        float32_t v7 = dx_op_loadInput_f32(4, 1, 0, 1, 0);
        float32_t v8 = dx_op_loadInput_f32(4, 1, 0, 2, 0);
        float32_t v9 = dx_op_loadInput_f32(4, 1, 0, 3, 0);
        dx_op_storeOutput_f32(5, 0, 0, 0, v4.e0 * v6);
        dx_op_storeOutput_f32(5, 0, 0, 1, v4.e1 * v7);
        dx_op_storeOutput_f32(5, 0, 0, 2, v4.e2 * v8);
        dx_op_storeOutput_f32(5, 0, 0, 3, v4.e3 * v9);
    } else {
        float32_t v10 = dx_op_loadInput_f32(4, 1, 0, 0, 0);
        float32_t v11 = dx_op_loadInput_f32(4, 1, 0, 1, 0);
        float32_t v12 = dx_op_loadInput_f32(4, 1, 0, 2, 0);
        float32_t v13 = dx_op_loadInput_f32(4, 1, 0, 3, 0);
        dx_op_storeOutput_f32(5, 0, 0, 0, v10);
        dx_op_storeOutput_f32(5, 0, 0, 1, v11);
        dx_op_storeOutput_f32(5, 0, 0, 2, v12);
        dx_op_storeOutput_f32(5, 0, 0, 3, v13);
    }
}

// --------------------- Meta-Information ---------------------

// Detected functions: 1

sweetgiorni commented 3 years ago

Yes, DXIL itself would not make a good IR for the purposes of DXDecompiler, but I think LLVM IR is capable of handling the constructs you mention. From the DXIL docs:

Prior to being converted into the low-level DXIL IR, a higher level IR is generated by codegen which is then transformed into DXIL by the optimizer. This lowers high-level constructs, such as user-defined types, multi-dimensional arrays, matrices, and vectors into simpler abstractions more suitable for fast JIT-ing in the driver compilers. DXIL is derived from LLVM IR.

So if LLVM is appealing, a higher-level IR above DXIL might be a good starting point. Any SM6 decompilation functionality in DXDecompiler would be responsible for lifting those scalarized operations into vectors and matrices as necessary.
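As a rough idea of what that lifting step could look like, here is a Python sketch that merges adjacent scalar ops back into one vector op. The op encoding and the `revectorize` and `render` helpers are made up for illustration; a real pass would also have to deal with reordered instructions, constants and partial writes.

```python
# Hypothetical scalar op encoding: (opcode, "dest.component", ["src.component", ...]).
def can_merge(prev, opcode, dest_reg, dest_comp, srcs):
    already_written = {(prev["dest"], c) for c in prev["mask"]}
    return (
        prev["op"] == opcode
        and prev["dest"] == dest_reg
        and dest_comp not in prev["mask"]
        and [s["reg"] for s in prev["srcs"]] == [reg for reg, _ in srcs]
        # Do not merge if this op reads a component the group has already written.
        and not any((reg, comp) in already_written for reg, comp in srcs)
    )


def revectorize(scalar_ops):
    """Merge runs of adjacent scalar ops into vector ops with a write mask and swizzles."""
    merged = []
    for opcode, dest, operands in scalar_ops:
        dest_reg, dest_comp = dest.rsplit(".", 1)
        srcs = [operand.rsplit(".", 1) for operand in operands]
        prev = merged[-1] if merged else None
        if prev is not None and can_merge(prev, opcode, dest_reg, dest_comp, srcs):
            prev["mask"] += dest_comp
            for src, (_, comp) in zip(prev["srcs"], srcs):
                src["swizzle"] += comp
        else:
            merged.append({
                "op": opcode,
                "dest": dest_reg,
                "mask": dest_comp,
                "srcs": [{"reg": reg, "swizzle": comp} for reg, comp in srcs],
            })
    return merged


def render(vec_op):
    srcs = ", ".join(f'{s["reg"]}.{s["swizzle"]}' for s in vec_op["srcs"])
    return f'{vec_op["op"]} {vec_op["dest"]}.{vec_op["mask"]}, {srcs}'


# Four scalar multiplies, shaped like the four fmul/storeOutput pairs in the DXIL listing.
scalar = [("mul", f"o0.{c}", [f"r0.{c}", f"v1.{c}"]) for c in "xyzw"]
vector = revectorize(scalar)
```

In this example `render(vector[0])` gives `mul o0.xyzw, r0.xyzw, v1.xyzw`. The grouping is a heuristic, so the result may not match what the original shader author wrote.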

SPIR-V does look promising. One downside I see is there don't seem to be any native C# libraries implementing the SPIR-V spec or a parser, so you may end up writing a good chunk of code for that depending on the complexity of the spec.

spacehamster commented 3 years ago

Going straight to high-level LLVM IR would mean that we could not make use of the dxbc2dxil utility. DirectXShaderCompiler is able to output its high-level IR by using the -fcgl flag. Examining the high-level IR shows that instructions operate on vectors, so I think the idea is possible, but I'm unsure if it is the best way forward. I suspect that LLVM would require a lot of effort to work with, would still be too low level, and would not properly capture the semantics of HLSL.

The previous shader when compiled with dxc using -fcgl looks like this.

;
; Buffer Definitions:
;
; cbuffer $Globals
; {
;
;   struct $Globals
;   {
;
;       bool bTexture;                                ; Offset:    0
;   
;   } $Globals;                                       ; Offset:    0 Size:     4
;
; }
;
;
; Resource Bindings:
;
; Name                                 Type  Format         Dim      ID      HLSL Bind  Count
; ------------------------------ ---------- ------- ----------- ------- -------------- ------
; $Globals                          cbuffer      NA          NA     CB0   cb4294967295     1
; MeshTextureSampler                sampler      NA          NA      S0s4294967295,space4294967295     1
; g_MeshTexture                     texture     f32          2d      T0t4294967295,space4294967295     1
;
target datalayout = "e-m:e-p:32:32-i1:32-i8:32-i16:32-i32:32-i64:64-f16:32-f32:32-f64:64-n8:16:32:64"
target triple = "dxil-ms-dx"

%struct.SamplerState = type { i32 }
%"class.Texture2D<vector<float, 4> >" = type { <4 x float>, %"class.Texture2D<vector<float, 4> >::mips_type" }
%"class.Texture2D<vector<float, 4> >::mips_type" = type { i32 }
%"$Globals" = type { i32 }
%struct.PS_OUTPUT = type { <4 x float> }
%struct.VS_OUTPUT = type { <4 x float>, <4 x float>, <2 x float> }
%dx.types.Handle = type { i8* }
%dx.types.ResourceProperties = type { i32, i32 }

@"\01?MeshTextureSampler@@3USamplerState@@A" = external global %struct.SamplerState, align 4
@"\01?g_MeshTexture@@3V?$Texture2D@V?$vector@M$03@@@@A" = external global %"class.Texture2D<vector<float, 4> >", align 4
@"\01?bTexture@@3_NB" = external constant i32, align 4
@"$Globals" = external constant %"$Globals"

; Function Attrs: nounwind
define void @RenderScenePS(%struct.PS_OUTPUT* noalias sret %agg.result, %struct.VS_OUTPUT* %In) #0 {
  %1 = call %dx.types.Handle @"dx.hl.createhandle..%dx.types.Handle (i32, %\22$Globals\22*, i32)"(i32 0, %"$Globals"* @"$Globals", i32 0)
  %2 = call %dx.types.Handle @"dx.hl.annotatehandle..%dx.types.Handle (i32, %dx.types.Handle, %dx.types.ResourceProperties, %\22$Globals\22)"(i32 11, %dx.types.Handle %1, %dx.types.ResourceProperties { i32 13, i32 4 }, %"$Globals" undef)
  %3 = call %"$Globals"* @"dx.hl.subscript.cb.%\22$Globals\22* (i32, %dx.types.Handle, i32)"(i32 6, %dx.types.Handle %2, i32 0)
  %4 = getelementptr inbounds %"$Globals", %"$Globals"* %3, i32 0, i32 0
  %5 = load i32, i32* %4, align 4, !tbaa !36, !range !40
  %6 = icmp ne i32 %5, 0
  br i1 %6, label %7, label %21

; <label>:7                                       ; preds = %0
  %8 = getelementptr inbounds %struct.VS_OUTPUT, %struct.VS_OUTPUT* %In, i32 0, i32 2
  %9 = load <2 x float>, <2 x float>* %8, align 4, !tbaa !41
  %10 = load %"class.Texture2D<vector<float, 4> >", %"class.Texture2D<vector<float, 4> >"* @"\01?g_MeshTexture@@3V?$Texture2D@V?$vector@M$03@@@@A"
  %11 = call %dx.types.Handle @"dx.hl.createhandle..%dx.types.Handle (i32, %\22class.Texture2D<vector<float, 4> >\22)"(i32 0, %"class.Texture2D<vector<float, 4> >" %10)
  %12 = call %dx.types.Handle @"dx.hl.annotatehandle..%dx.types.Handle (i32, %dx.types.Handle, %dx.types.ResourceProperties, %\22class.Texture2D<vector<float, 4> >\22)"(i32 11, %dx.types.Handle %11, %dx.types.ResourceProperties { i32 2, i32 1033 }, %"class.Texture2D<vector<float, 4> >" undef)
  %13 = load %struct.SamplerState, %struct.SamplerState* @"\01?MeshTextureSampler@@3USamplerState@@A"
  %14 = call %dx.types.Handle @"dx.hl.createhandle..%dx.types.Handle (i32, %struct.SamplerState)"(i32 0, %struct.SamplerState %13)
  %15 = call %dx.types.Handle @"dx.hl.annotatehandle..%dx.types.Handle (i32, %dx.types.Handle, %dx.types.ResourceProperties, %struct.SamplerState)"(i32 11, %dx.types.Handle %14, %dx.types.ResourceProperties { i32 14, i32 0 }, %struct.SamplerState undef)
  %16 = call <4 x float> @"dx.hl.op..<4 x float> (i32, %dx.types.Handle, %dx.types.Handle, <2 x float>)"(i32 220, %dx.types.Handle %12, %dx.types.Handle %15, <2 x float> %9)
  %17 = getelementptr inbounds %struct.VS_OUTPUT, %struct.VS_OUTPUT* %In, i32 0, i32 1
  %18 = load <4 x float>, <4 x float>* %17, align 4, !tbaa !41
  %19 = fmul <4 x float> %16, %18
  %20 = getelementptr inbounds %struct.PS_OUTPUT, %struct.PS_OUTPUT* %agg.result, i32 0, i32 0
  store <4 x float> %19, <4 x float>* %20, align 4, !tbaa !41
  br label %25

; <label>:21                                      ; preds = %0
  %22 = getelementptr inbounds %struct.VS_OUTPUT, %struct.VS_OUTPUT* %In, i32 0, i32 1
  %23 = load <4 x float>, <4 x float>* %22, align 4, !tbaa !41
  %24 = getelementptr inbounds %struct.PS_OUTPUT, %struct.PS_OUTPUT* %agg.result, i32 0, i32 0
  store <4 x float> %23, <4 x float>* %24, align 4, !tbaa !41
  br label %25

; <label>:25                                      ; preds = %21, %7
  ret void
}

declare <4 x float> @"dx.hl.op..<4 x float> (i32, %dx.types.Handle, %dx.types.Handle, <2 x float>)"(i32, %dx.types.Handle, %dx.types.Handle, <2 x float>)

; Function Attrs: nounwind readnone
declare %dx.types.Handle @"dx.hl.createhandle..%dx.types.Handle (i32, %\22class.Texture2D<vector<float, 4> >\22)"(i32, %"class.Texture2D<vector<float, 4> >") #1

; Function Attrs: nounwind readnone
declare %dx.types.Handle @"dx.hl.annotatehandle..%dx.types.Handle (i32, %dx.types.Handle, %dx.types.ResourceProperties, %\22class.Texture2D<vector<float, 4> >\22)"(i32, %dx.types.Handle, %dx.types.ResourceProperties, %"class.Texture2D<vector<float, 4> >") #1

; Function Attrs: nounwind readnone
declare %dx.types.Handle @"dx.hl.createhandle..%dx.types.Handle (i32, %struct.SamplerState)"(i32, %struct.SamplerState) #1

; Function Attrs: nounwind readnone
declare %dx.types.Handle @"dx.hl.annotatehandle..%dx.types.Handle (i32, %dx.types.Handle, %dx.types.ResourceProperties, %struct.SamplerState)"(i32, %dx.types.Handle, %dx.types.ResourceProperties, %struct.SamplerState) #1

; Function Attrs: nounwind readnone
declare %"$Globals"* @"dx.hl.subscript.cb.%\22$Globals\22* (i32, %dx.types.Handle, i32)"(i32, %dx.types.Handle, i32) #1

; Function Attrs: nounwind readnone
declare %dx.types.Handle @"dx.hl.createhandle..%dx.types.Handle (i32, %\22$Globals\22*, i32)"(i32, %"$Globals"*, i32) #1

; Function Attrs: nounwind readnone
declare %dx.types.Handle @"dx.hl.annotatehandle..%dx.types.Handle (i32, %dx.types.Handle, %dx.types.ResourceProperties, %\22$Globals\22)"(i32, %dx.types.Handle, %dx.types.ResourceProperties, %"$Globals") #1

attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="0" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind readnone }

!pauseresume = !{!0}
!llvm.ident = !{!1}
!dx.version = !{!2}
!dx.valver = !{!3}
!dx.shaderModel = !{!4}
!dx.typeAnnotations = !{!5, !19}
!dx.entryPoints = !{!24}
!dx.fnprops = !{!33}
!dx.options = !{!34, !35}

!0 = !{!"hlsl-hlemit", !"hlsl-hlensure"}
!1 = !{!"clang version 3.7 (tags/RELEASE_370/final)"}
!2 = !{i32 1, i32 0}
!3 = !{i32 1, i32 5}
!4 = !{!"ps", i32 6, i32 0}
!5 = !{i32 0, %"class.Texture2D<vector<float, 4> >" undef, !6, %"class.Texture2D<vector<float, 4> >::mips_type" undef, !9, %struct.PS_OUTPUT undef, !11, %struct.VS_OUTPUT undef, !13, %"$Globals" undef, !17}
!6 = !{i32 20, !7, !8}
!7 = !{i32 6, !"h", i32 3, i32 0, i32 7, i32 9}
!8 = !{i32 6, !"mips", i32 3, i32 16}
!9 = !{i32 4, !10}
!10 = !{i32 6, !"handle", i32 3, i32 0, i32 7, i32 5}
!11 = !{i32 16, !12}
!12 = !{i32 6, !"RGBColor", i32 3, i32 0, i32 4, !"SV_Target", i32 7, i32 9}
!13 = !{i32 40, !14, !15, !16}
!14 = !{i32 6, !"Position", i32 3, i32 0, i32 4, !"SV_POSITION", i32 7, i32 9}
!15 = !{i32 6, !"Diffuse", i32 3, i32 16, i32 4, !"COLOR0", i32 7, i32 9}
!16 = !{i32 6, !"TextureUV", i32 3, i32 32, i32 4, !"TEXCOORD0", i32 7, i32 9}
!17 = !{i32 4, !18}
!18 = !{i32 6, !"bTexture", i32 3, i32 0, i32 7, i32 1}
!19 = !{i32 1, void (%struct.PS_OUTPUT*, %struct.VS_OUTPUT*)* @RenderScenePS, !20}
!20 = !{!21, !23, !21}
!21 = !{i32 0, !22, !22}
!22 = !{}
!23 = !{i32 1, !22, !22}
!24 = !{void (%struct.PS_OUTPUT*, %struct.VS_OUTPUT*)* @RenderScenePS, !"RenderScenePS", null, !25, null}
!25 = !{!26, null, !29, !31}
!26 = !{!27}
!27 = !{i32 0, %"class.Texture2D<vector<float, 4> >"* @"\01?g_MeshTexture@@3V?$Texture2D@V?$vector@M$03@@@@A", !"g_MeshTexture", i32 -1, i32 -1, i32 1, i32 2, i32 0, !28}
!28 = !{i32 0, i32 9}
!29 = !{!30}
!30 = !{i32 0, %"$Globals"* @"$Globals", !"$Globals", i32 0, i32 -1, i32 1, i32 4, null}
!31 = !{!32}
!32 = !{i32 0, %struct.SamplerState* @"\01?MeshTextureSampler@@3USamplerState@@A", !"MeshTextureSampler", i32 -1, i32 -1, i32 1, i32 0, null}
!33 = !{void (%struct.PS_OUTPUT*, %struct.VS_OUTPUT*)* @RenderScenePS, i32 0, i1 false}
!34 = !{i32 144}
!35 = !{i32 -1}
!36 = !{!37, !37, i64 0}
!37 = !{!"bool", !38, i64 0}
!38 = !{!"omnipotent char", !39, i64 0}
!39 = !{!"Simple C/C++ TBAA"}
!40 = !{i32 0, i32 2}
!41 = !{!38, !38, i64 0}

sweetgiorni commented 3 years ago

Yeah that isn't very pretty. SPIR-V seems like a good option for a higher level representation. Looks like there is already a C# header for SPIR-V, which makes things easier. Plus using a SPIR-V IR makes DXDecompiler more like a general purpose decompiler, not just limited to DX.

I may be revealing my lack of depth on the subject here, but instead of having different parsers for DXBC and DXIL, would it be possible to just run all DXBC shaders through dxbc2dxil.exe/dll? Then you'd only need to handle mapping DXIL to SPIR-V. In fact, if I'm interpreting it correctly, it seems there's already a C++ library that does that and exposes a C API that could be called from C# (Or anything else? Not sure if C# is mandatory if not using SlimShader and the like). If it's possible, that would reduce the main workload to decompiling SPIR-V to HLSL or GLSL. And since there's already a SPIR-V decompiler, it seems all the pieces of the puzzle are there already... I feel like I must be misunderstanding something; it's too good to be true :)

spacehamster commented 3 years ago

Using the previous shader as an example, dxil-spirv fails with the error "[ERROR]: Failed to parse blob." when trying to convert a shader that has been processed with dxbc2dxil. It does successfully convert a shader that was compiled with dxc though.

Passing the dxc compiled version through dxil-spirv and SPIRV-Cross gives this output

cbuffer _13_15 : register(b1, space0)
{
    float4 _15_m0[1] : packoffset(c0);
};

Texture2D<float4> _8 : register(t0, space0);
SamplerState _18 : register(s0, space0);

static float3 NORMAL;
static float2 TEXCOORD;
static float4 SV_Target;

struct SPIRV_Cross_Input
{
    float3 NORMAL : TEXCOORD0;
    float2 TEXCOORD : TEXCOORD1;
};

struct SPIRV_Cross_Output
{
    float4 SV_Target : SV_Target0;
};

void frag_main()
{
    float4 _49 = _8.Sample(_18, float2(TEXCOORD.x, TEXCOORD.y), int2(0, 0));
    float _63 = dot(float3(_15_m0[0u].xyz), float3(NORMAL.x, NORMAL.y, NORMAL.z));
    float _83 = isnan(0.0f) ? _63 : (isnan(_63) ? 0.0f : max(_63, 0.0f));
    float _68 = isnan(1.0f) ? _83 : (isnan(_83) ? 1.0f : min(_83, 1.0f));
    float _70 = isnan(_15_m0[0u].w) ? _68 : (isnan(_68) ? _15_m0[0u].w : max(_68, _15_m0[0u].w));
    SV_Target.x = _70 * _49.x;
    SV_Target.y = _70 * _49.y;
    SV_Target.z = _70 * _49.z;
    SV_Target.w = _70 * _49.w;
}

SPIRV_Cross_Output main(SPIRV_Cross_Input stage_input)
{
    NORMAL = stage_input.NORMAL;
    TEXCOORD = stage_input.TEXCOORD;
    frag_main();
    SPIRV_Cross_Output stage_output;
    stage_output.SV_Target = SV_Target;
    return stage_output;
}

It is very verbose, non-idiomatic and a bunch of information has been lost in the process.

I ran the process on a bunch of the DirectXShader test shaders and 388/1653 tests (roughly 23%) failed to decompile, so it's not completely reliable either. I think it would be a good avenue if readability wasn't a goal.

My concern with SPIR-V is similar to my concern with LLVM: it'd be hard to work with and too low level to capture the semantics. I want to examine how some other decompilers work, like Java and C# bytecode decompilers, but I have not made much progress with this.

C# is not mandatory, but there would need to be a strong reason to switch.

spacehamster commented 3 years ago

I'd also like to mention that, currently, designs similar to HLSLcc and Direct3D Shader Crosscompiler appear to be the most promising avenue, as they are able to perform type reconstruction without scalarizing the code or destructuring the control flow.