Support HLSL SM6.6 Pack/Unpack intrinsics

danginsburg commented 7 months ago

This bug covers a few different requests/issues:

1. Request to implement the HLSL SM 6.6 Pack/Unpack intrinsics; the syntax is not currently supported (https://microsoft.github.io/DirectX-Specs/d3d/HLSL_SM_6_6_Pack_Unpack_Intrinsics.html)
2. The existing Slang bit_cast functionality does not generate valid GLSL for the SPIR-V glslang backend
3. The existing Slang bit_cast functionality does not generate valid SPIR-V for the SPIR-V direct backend
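
For readers unfamiliar with the requested intrinsics, here is a rough CPU-side C++ model of what a few of them compute, based on the spec linked above. It is only a sketch: the `_model` functions and the `UInt4`/`Int4` structs are hypothetical stand-ins (not Slang or HLSL API), and it assumes the x lane occupies the least significant byte of the packed value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for HLSL uint32_t4 / int32_t4.
struct UInt4 { std::uint32_t x, y, z, w; };
struct Int4  { std::int32_t  x, y, z, w; };

// Model of pack_u8: keep the low 8 bits of each lane (truncation);
// lane x is assumed to land in bits [7:0] of the packed word.
inline std::uint32_t pack_u8_model(UInt4 v) {
    return (v.x & 0xFFu) | ((v.y & 0xFFu) << 8) |
           ((v.z & 0xFFu) << 16) | ((v.w & 0xFFu) << 24);
}

// Model of pack_clamp_u8: clamp each signed lane to [0, 255], then pack.
inline std::uint32_t pack_clamp_u8_model(Int4 v) {
    auto clamp = [](std::int32_t s) -> std::uint32_t {
        return static_cast<std::uint32_t>(s < 0 ? 0 : (s > 255 ? 255 : s));
    };
    return pack_u8_model(UInt4{clamp(v.x), clamp(v.y), clamp(v.z), clamp(v.w)});
}

// Model of unpack_u8u32: zero-extend each packed byte to 32 bits.
inline UInt4 unpack_u8u32_model(std::uint32_t p) {
    return UInt4{p & 0xFFu, (p >> 8) & 0xFFu, (p >> 16) & 0xFFu, (p >> 24) & 0xFFu};
}

// Model of unpack_s8s32: sign-extend each packed byte to 32 bits.
inline Int4 unpack_s8s32_model(std::uint32_t p) {
    auto sext = [](std::uint32_t b) -> std::int32_t {
        return static_cast<std::int32_t>(static_cast<std::int8_t>(b & 0xFFu));
    };
    return Int4{sext(p), sext(p >> 8), sext(p >> 16), sext(p >> 24)};
}
```

For instance, packing (1, 2, 3, 0x104) with the truncating model yields 0x04030201, because only the low byte of 0x104 survives, while the clamping variant would saturate an out-of-range lane instead of wrapping it.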

So I hope the outcome would be adding the HLSL syntax and fixing code generation bugs. For example for #2/#3:

cbuffer Dimensions_t
{
    uint nWidth;
    uint nHeight;
};

StructuredBuffer<BufType> Buffer0;
StructuredBuffer<BufType> Buffer1;
Buffer<float4> Buffer2;
RWTexture2D<float4> OutputTexture;

[numthreads(32, 32, 1)]
void MainCs( uint3 nThreadId : SV_DispatchThreadID )
{
    uint8_t4 v = bit_cast<uint8_t4>( nWidth ); 
    uint16_t4 v2 = bit_cast<uint16_t4>( nWidth );
    float4 vTmp = Buffer0[nThreadId.y * uint( v2.x ) + nThreadId.x].vData + Buffer0[nThreadId.y * nWidth + nThreadId.x].vMoreData;
    OutputTexture[nThreadId.xy] = vTmp; 
}

In the glslang backend it generates:

glslang:  cstest.vfx(61): error :  'nThreadId_0' : undeclared identifier
glslang:  cstest.vfx(61): error :  'y' : vector swizzle selection out of range
glslang:  cstest.vfx(61): error :  '=' :  cannot convert from ' temp float' to ' temp highp uint'
glslang:  cstest.vfx(61): error :  '' : compilation terminated
glslang: note : ERROR: 4 compilation errors.  No code generated.

In the direct SPIR-V backend it generates code, but it fails spirv-val:

error: line 63: Expected input to have the same total bit width as Result Type: Bitcast
  %6904 = OpBitcast %ushort %6482
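
The rule spirv-val is enforcing here can be modeled in a few lines of C++. This is a simplified sketch, not the validator's actual code: `NumericType`, `total_bits`, and `bitcast_widths_match` are hypothetical names, and the model only covers the total-bit-width comparison named in the error message (it ignores pointer operands).

```cpp
#include <cassert>

// Hypothetical description of a scalar or vector numeric type:
// bits per component and number of components (1 for a scalar).
struct NumericType {
    unsigned component_bits;
    unsigned component_count;
};

// Total storage width of the type in bits.
inline unsigned total_bits(NumericType t) {
    return t.component_bits * t.component_count;
}

// Simplified form of the check behind the error above:
// operand and result of a bitcast must have the same total bit width.
inline bool bitcast_widths_match(NumericType operand, NumericType result) {
    return total_bits(operand) == total_bits(result);
}
```

Under this model, the flagged instruction casts a 32-bit `%uint` to a 16-bit `%ushort` and is rejected, as is `uint` to `uint16_t4` (32 vs 64 bits), whereas `uint` to `uint8_t4` or to a two-component 16-bit vector passes.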

Full SPIR-V:

; SPIR-V
; Version: 1.5
; Generator: Khronos; 40
; Bound: 23298
; Schema: 0
               OpCapability Int16
               OpCapability SampledBuffer
               OpCapability StorageImageReadWithoutFormat
               OpCapability StorageImageWriteWithoutFormat
               OpCapability Shader
               OpExtension "SPV_KHR_storage_buffer_storage_class"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %3956 "MainCs" %3913 %3852 %3853 %3854 %3407 %gl_GlobalInvocationID
               OpExecutionMode %3956 LocalSize 32 32 1
               OpMemberDecorate %_struct_1080 0 Offset 0
               OpMemberDecorate %_struct_1080 1 Offset 16
               OpDecorate %_struct_981 Block
               OpMemberDecorate %_struct_981 0 Offset 0
               OpDecorate %3913 Binding 0
               OpDecorate %3913 DescriptorSet 0
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
               OpDecorate %_runtimearr__struct_1080 ArrayStride 32
               OpDecorate %_struct_1075 Block
               OpMemberDecorate %_struct_1075 0 Offset 0
               OpDecorate %3852 Binding 30
               OpDecorate %3852 DescriptorSet 0
               OpDecorate %3853 Binding 31
               OpDecorate %3853 DescriptorSet 0
               OpDecorate %3854 Binding 32
               OpDecorate %3854 DescriptorSet 0
               OpDecorate %3407 Binding 158
               OpDecorate %3407 DescriptorSet 0
       %void = OpTypeVoid
       %1282 = OpTypeFunction %void
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
%_struct_1080 = OpTypeStruct %v4float %v4float
       %uint = OpTypeInt 32 0
%_struct_981 = OpTypeStruct %uint
%_ptr_Uniform__struct_981 = OpTypePointer Uniform %_struct_981
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
%_ptr_Uniform_uint = OpTypePointer Uniform %uint
%uint_1048575 = OpConstant %uint 1048575
     %ushort = OpTypeInt 16 0
     %v3uint = OpTypeVector %uint 3
%_ptr_Input_v3uint = OpTypePointer Input %v3uint
%_ptr_StorageBuffer__struct_1080 = OpTypePointer StorageBuffer %_struct_1080
%_runtimearr__struct_1080 = OpTypeRuntimeArray %_struct_1080
%_struct_1075 = OpTypeStruct %_runtimearr__struct_1080
%_ptr_StorageBuffer__struct_1075 = OpTypePointer StorageBuffer %_struct_1075
        %410 = OpTypeImage %float Buffer 2 0 0 1 Unknown
%_ptr_UniformConstant_410 = OpTypePointer UniformConstant %410
     %v2uint = OpTypeVector %uint 2
        %422 = OpTypeImage %float 2D 2 0 0 2 Unknown
%_ptr_UniformConstant_422 = OpTypePointer UniformConstant %422
       %3913 = OpVariable %_ptr_Uniform__struct_981 Uniform
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input
       %3852 = OpVariable %_ptr_StorageBuffer__struct_1075 StorageBuffer
       %3853 = OpVariable %_ptr_StorageBuffer__struct_1075 StorageBuffer
       %3854 = OpVariable %_ptr_UniformConstant_410 UniformConstant
       %3407 = OpVariable %_ptr_UniformConstant_422 UniformConstant
       %3956 = OpFunction %void None %1282
      %11154 = OpLabel
      %11804 = OpAccessChain %_ptr_Uniform_uint %3913 %int_0
       %8217 = OpLoad %uint %11804
       %6482 = OpBitwiseAnd %uint %8217 %uint_1048575
       %6904 = OpBitcast %ushort %6482
      %21140 = OpLoad %v3uint %gl_GlobalInvocationID
      %22414 = OpCompositeExtract %uint %21140 1
      %15014 = OpUConvert %uint %6904
       %9822 = OpIMul %uint %22414 %15014
      %12795 = OpCompositeExtract %uint %21140 0
       %7519 = OpIAdd %uint %9822 %12795
      %23297 = OpAccessChain %_ptr_StorageBuffer__struct_1080 %3852 %int_0 %7519
      %15944 = OpLoad %_struct_1080 %23297
       %6981 = OpCompositeExtract %v4float %15944 0
      %12303 = OpIMul %uint %22414 %8217
      %13530 = OpIAdd %uint %12303 %12795
       %9844 = OpAccessChain %_ptr_StorageBuffer__struct_1080 %3852 %int_0 %13530
      %15887 = OpLoad %_struct_1080 %9844
       %6303 = OpCompositeExtract %v4float %15887 1
       %9563 = OpFAdd %v4float %6981 %6303
      %23048 = OpAccessChain %_ptr_StorageBuffer__struct_1080 %3853 %int_0 %13530
      %12173 = OpLoad %_struct_1080 %23048
      %18371 = OpCompositeExtract %v4float %12173 0
      %18609 = OpLoad %_struct_1080 %23048
      %13966 = OpCompositeExtract %v4float %18609 1
      %18759 = OpFAdd %v4float %18371 %13966
      %14948 = OpFAdd %v4float %9563 %18759
      %23129 = OpBitcast %int %13530
       %7766 = OpLoad %410 %3854
       %3401 = OpImageFetch %v4float %7766 %23129
      %22426 = OpFAdd %v4float %14948 %3401
      %21491 = OpVectorShuffle %v2uint %21140 %21140 0 1
       %8921 = OpLoad %422 %3407
               OpImageWrite %8921 %21491 %22426
               OpReturn
               OpFunctionEnd

csyonghe commented 7 months ago

Should the HLSL be:

uint16_t2 v2 = bit_cast<uint16_t2>( nWidth );

?

Your code casts it to uint16_t4, which is 8 bytes, not 4.

danginsburg commented 7 months ago

Oh, yeah, sorry I was just playing around to see what compiled, I hadn't gotten to the point of trying to write usable code with it yet. :) Probably it should generate errors somewhere north of spirv-val though?

csyonghe commented 7 months ago

Agreed. We are missing a diagnostic here.
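
As an illustration of the kind of front-end check being discussed, here is a C++ analogue that rejects a size-mismatched bit cast at compile time. It is a sketch only and says nothing about how Slang implements or will implement its diagnostic; `checked_bit_cast`, `U16x2`, and `U16x4` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins for uint16_t2 and uint16_t4.
struct U16x2 { std::uint16_t x, y; };
struct U16x4 { std::uint16_t x, y, z, w; };

// Reinterpret the bits of `from` as type To, refusing to compile
// when the two types do not have the same size.
template <typename To, typename From>
To checked_bit_cast(const From& from) {
    static_assert(sizeof(To) == sizeof(From),
                  "bit_cast: source and destination sizes differ");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "bit_cast: both types must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));  // well-defined way to copy the raw bits
    return to;
}

// checked_bit_cast<U16x4>(std::uint32_t(0)) would fail to compile:
// the destination is 8 bytes while the source is 4 bytes.
```

A 32-bit value round-trips through `U16x2` unchanged, while the `U16x4` case from the original report is stopped before any code is generated.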

csyonghe commented 6 months ago

Given that there is already a better alternative for SPIRV, the main work here is to expose the HLSL intrinsics.

@danginsburg I want to drop the priority of this issue a bit in hope that the current solution is sufficient to unblock your progress. Let us know if you want this prioritized.

danginsburg commented 6 months ago

> @danginsburg I want to drop the priority of this issue a bit in hope that the current solution is sufficient to unblock your progress. Let us know if you want this prioritized.

So you are saying that bit_cast<> with the direct SPIR-V backend should provide all the functionality of the SM6.6 intrinsics, sans catching errors?

csyonghe commented 6 months ago

Unless I am missing anything, the HLSL intrinsics essentially just provide a bit cast without exposing real int8 types. SPIR-V already gives us int8 and bit cast, so it feels silly to use those HLSL intrinsics.
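
To make that equivalence concrete, the C++ sketch below compares a shift-and-mask unpack (the shape of the SM 6.6 unpack intrinsic) with a plain byte reinterpretation followed by widening (the shape of a bit cast to a real 8-bit vector). All names are hypothetical, and the two only agree on a little-endian host, which the helper checks at run time.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a four-lane 32-bit vector.
struct Lanes4 { std::uint32_t x, y, z, w; };

inline bool operator==(Lanes4 a, Lanes4 b) {
    return a.x == b.x && a.y == b.y && a.z == b.z && a.w == b.w;
}

// Intrinsic-style unpack: extract each byte with shifts and masks.
inline Lanes4 unpack_by_shifts(std::uint32_t p) {
    return Lanes4{p & 0xFFu, (p >> 8) & 0xFFu, (p >> 16) & 0xFFu, (p >> 24) & 0xFFu};
}

// Bit-cast-style unpack: view the same 4 bytes as an 8-bit array, then widen.
inline Lanes4 unpack_by_byte_view(std::uint32_t p) {
    std::uint8_t b[4];
    std::memcpy(b, &p, sizeof(b));
    return Lanes4{b[0], b[1], b[2], b[3]};
}

// True when the lowest-addressed byte of a word is its least significant byte.
inline bool host_is_little_endian() {
    const std::uint32_t one = 1u;
    std::uint8_t first;
    std::memcpy(&first, &one, 1);
    return first == 1u;
}
```

On a little-endian host both functions return the same lanes for any input, which is the sense in which a bit cast plus real 8-bit types covers the unpack case.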

csyonghe commented 6 months ago

@jkwak-work Is this something you can plan with Ariel to get done?

jkwak-work commented 6 months ago

> @jkwak-work Is this something you can plan with Ariel to get done?

I think Ariel can do it because the goal is very clear and the scope is limited.

Because the milestone is set to Q2, I am gonna assume that we can aim for the completion within two or three months. Please let me know if it is more urgent than I can see.

csyonghe commented 6 months ago

Yes, that's fine. Thank you for taking this!

csyonghe commented 5 months ago

This is done.

shader-slang / slang

Support HLSL SM6.6 Pack/Unpack intrinsics #3594