Open alan-baker opened 1 year ago
Another question: should there be a uint32_t accumulator type? Currently the proposal only lists int32_t.
Apologies for the extremely late message here. Unfortunately, when we initially posted these specifications it was already too late for us to make changes to this design.
SM 6.8 is the last set of features we're releasing that were designed and implemented through the process our team used before the HLSL-specs process. That means we didn't account for a public design review iteration process in the feature delivery timeline.
We will track this issue and consider future improvements to wave cooperative matrix features publicly.
Which proposal does this relate to?
61 - Wave Matrix
Describe the issue or outstanding question.
The proposal describes WaveMatrixLeft and WaveMatrixRight as being parameterized using M and N, which would be the same values as for WaveMatrixAccumulator. The proposal indicates
WaveMatrixLeft<float16_t, 16, 16>
creates a 16 x WaveMatrixLeft<float16_t, 16, 16>::MatrixDepth() matrix. It would be good to call that out more explicitly. Since the depth is not known at compile time, should authors always overestimate how much groupshared memory is required for loading and storing? The example in the proposal uses 32 as some safe constant, but would authors be expected to check all hardware they want to run on to set such a value? More specifics on sizing groupshared memory would be helpful.

The SPIR-V extension specifies the K dimension in OpTypeCooperativeMatrixKHR (with an appropriate usage) (see SPV_KHR_cooperative_matrix). This creates some friction for translation. The types can only be cross-compiled by examining which accumulators they are used with. Can you give some context as to why these parameterizations were chosen? Specifically, why was K chosen to be implicit? Real use cases would be translatable with work, but trivial shaders (good for unit testing) would be problematic. For example,
WaveMatrixLeft<float16_t, 16, 16>
does not have a direct SPIR-V translation. The SPIR-V could use a spec constant and some shader reflection to do this at pipeline compilation time, but that requires modifications on the API side too. Is there an opportunity to reduce the translation friction here?

Additional context
CC @jeffbolznv @kpet
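
To make the groupshared sizing question concrete, here is a rough host-side sketch in C++ of the arithmetic involved. It assumes the constant 32 from the proposal's example is read as an upper bound on MatrixDepth(), and that a float16_t element occupies 2 bytes. The names StagingBytes, FitsReservation, kAssumedMaxDepth, kRows, kHalfSize, and kReserved are hypothetical and are not part of HLSL or the proposal.

```cpp
#include <cstddef>

// Hypothetical helper: bytes needed to stage a rows x depth matrix whose
// elements are elemSize bytes each.
constexpr std::size_t StagingBytes(std::size_t rows, std::size_t depth,
                                   std::size_t elemSize) {
    return rows * depth * elemSize;
}

// Assumption: 32 is treated as a compile-time upper bound on MatrixDepth().
constexpr std::size_t kAssumedMaxDepth = 32;
constexpr std::size_t kRows = 16;     // M for WaveMatrixLeft<float16_t, 16, 16>
constexpr std::size_t kHalfSize = 2;  // assumed size of float16_t in bytes

// Groupshared bytes reserved at compile time under the assumed bound.
constexpr std::size_t kReserved =
    StagingBytes(kRows, kAssumedMaxDepth, kHalfSize);

// Checks whether a depth reported at runtime fits inside the reservation.
constexpr bool FitsReservation(std::size_t runtimeDepth) {
    return StagingBytes(kRows, runtimeDepth, kHalfSize) <= kReserved;
}
```

Under these assumptions the reservation is 16 * 32 * 2 = 1024 bytes. Hardware reporting a depth of 16 would leave half of that unused, while a depth of 64 would not fit. That is the overestimation trade-off the question above is asking about.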