Are SM6.6 calls for pack_clamp_u8 using the wrong input type?

alecazam commented 1 year ago

// These take unsigned.  Seems correct
uint8_t4_packed pack_u8(uint32_t4 unpackedVal);         // Pack lower 8 bits, drop unused bits
uint8_t4_packed pack_u8(uint16_t4 unpackedVal);         // Pack lower 8 bits, drop unused bits

// These take signed?
uint8_t4_packed pack_clamp_u8(int32_t4  unpackedVal);   // Pack and Clamp [0, 255] <-
uint8_t4_packed pack_clamp_u8(int16_t4  unpackedVal);   // Pack and Clamp [0, 255] <-

Are these instructions defined incorrectly? I would think uint32_t4 and uint16_t4 would map to uint8_t4. Instead, both of these calls take signed int32/int16.

dneto0 commented 10 months ago

Intel has been landing 4x8-bit dot products in the Chromium WebGPU stack. This bug is causing us to avoid using this builtin and generate a polyfill instead: https://dawn-review.git.corp.google.com/c/dawn/+/166824/8/src/tint/lang/hlsl/writer/ast_printer/ast_printer.cc#262

When this bug is fixed we'll update to use the builtin.

damyanp commented 1 month ago

I don't have access to the linked code. However, this is the correct definition of these HLSL intrinsics and changing this would be a breaking shader model change.

microsoft / DirectXShaderCompiler

Are SM6.6 calls for pack_clamp_u8 using the wrong input type? #5091