microsoft / hlsl-specs

HLSL Specifications
MIT License

[SM??] Wave Matrix clarifications #72

Open alan-baker opened 1 year ago

alan-baker commented 1 year ago

Which proposal does this relate to?

61 - Wave Matrix

Describe the issue or outstanding question.

The proposal describes WaveMatrixLeft and WaveMatrixRight as being parameterized by M and N, which would be the same values as for WaveMatrixAccumulator. The proposal indicates that WaveMatrixLeft<float16_t, 16, 16> creates a matrix of size 16 x WaveMatrixLeft<float16_t, 16, 16>::MatrixDepth(). It would be good to call that out more explicitly. Since the depth is not known at compile time, should authors always overestimate how much groupshared memory is required for loading and storing? The example in the proposal uses 32 as a safe constant, but would authors be expected to check all hardware they want to run on to set such a value? More specifics on sizing groupshared memory would be helpful.
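To make the sizing concern concrete, here is a rough host-side sketch of the arithmetic, written in C++ rather than HLSL. TileBytes and the constants are hypothetical names for illustration, not part of the proposal. It assumes a 16-row left matrix of 2-byte float16_t elements and uses 32 as the assumed upper bound on the depth.

```cpp
#include <cstddef>

// Hypothetical helper (not from the proposal): bytes of groupshared
// storage needed to stage one rows x maxDepth tile, where maxDepth is an
// assumed upper bound on the value MatrixDepth() returns at runtime.
constexpr std::size_t TileBytes(std::size_t rows, std::size_t maxDepth,
                                std::size_t elemSize) {
    return rows * maxDepth * elemSize;
}

// Assumed values: M = 16, float16_t is 2 bytes, and 32 is the "safe"
// depth constant mentioned in the issue text.
constexpr std::size_t kRows = 16;
constexpr std::size_t kAssumedMaxDepth = 32;
constexpr std::size_t kHalfBytes = 2;

// Worst-case allocation for one left-matrix tile.
constexpr std::size_t kLeftTileBytes =
    TileBytes(kRows, kAssumedMaxDepth, kHalfBytes);
static_assert(kLeftTileBytes == 1024, "16 * 32 * 2 bytes");

// If the hardware depth turned out to be 16, only half would be used.
static_assert(TileBytes(kRows, 16, kHalfBytes) == 512, "16 * 16 * 2 bytes");
```

Under these assumptions, a device with a depth of 16 leaves half of the reservation unused, while a device with a depth above 32 would overflow it. That is why guidance on choosing the bound matters.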

The SPIR-V extension specifies the K dimension in OpTypeCooperativeMatrixKHR (with an appropriate usage) (see SPV_KHR_cooperative_matrix). This creates some friction for translation. The types can only be cross-compiled by examining which accumulators they are used with. Can you give some context as to why these parameterizations were chosen? Specifically, why was K chosen to be implicit? Real use cases would be translatable with work, but trivial shaders (good for unit testing) would be problematic. For example, WaveMatrixLeft<float16_t, 16, 16> does not have a direct SPIR-V translation. The SPIR-V could use a spec constant and some shader reflection to do this at pipeline compilation time, but that requires modifications on the API side too. Is there an opportunity to reduce the translation friction here?
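As a sketch of where the friction comes from, the mapping can be modeled in C++ like this. The struct and Translate* helpers are hypothetical, and the K x N shape for the right matrix is an assumption based on the usual matrix-multiply convention. The Use names follow SPV_KHR_cooperative_matrix.

```cpp
#include <cstdint>

// "Use" operand of OpTypeCooperativeMatrixKHR.
enum class Use { MatrixAKHR, MatrixBKHR, MatrixAccumulatorKHR };

// Simplified model of the SPIR-V type: rows and columns are both explicit.
struct CoopMatType {
    std::uint32_t rows;
    std::uint32_t cols;
    Use use;
};

// Hypothetical translation helpers. The HLSL types carry only M and N,
// so k (the depth) must come from somewhere else: the accumulator the
// operand is used with, or a spec constant resolved at pipeline creation.
constexpr CoopMatType TranslateLeft(std::uint32_t m, std::uint32_t k) {
    return CoopMatType{m, k, Use::MatrixAKHR};            // M x K
}
constexpr CoopMatType TranslateRight(std::uint32_t k, std::uint32_t n) {
    return CoopMatType{k, n, Use::MatrixBKHR};            // K x N (assumed)
}
constexpr CoopMatType TranslateAccumulator(std::uint32_t m, std::uint32_t n) {
    return CoopMatType{m, n, Use::MatrixAccumulatorKHR};  // M x N
}

// WaveMatrixLeft<float16_t, 16, 16> supplies m = 16 but no k; the 32 here
// is an assumed depth, used only to make the example concrete.
static_assert(TranslateLeft(16, 32).cols == 32, "K must be supplied");
```

Only the accumulator translates from its template arguments alone. Both operand types need a K that the HLSL type does not state.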

Additional context

CC @jeffbolznv @kpet

alan-baker commented 1 year ago

Another question: should there be a uint32_t accumulator type? Currently the proposal only lists int32_t.

llvm-beanz commented 10 months ago

Apologies for the extremely late message here. Unfortunately, when we initially posted these specifications it was already too late for us to make changes to this design.

SM 6.8 is the last set of features we're releasing that were designed and implemented through the process our team used before the HLSL-specs process. That means we didn't account for a public design review iteration process in the feature delivery timeline.

We will track this issue and consider future improvements to wave cooperative matrix features publicly.