Open ThomasRaoux opened 2 years ago
Cutlass added support for float32 emulation using TF32 TensorCore operations. In MLIR we have representations of mma.sync for TF32. We should differentiate mma.sync for float32 and tf32 and have a lowering pattern from mma.sync f32 to a code sequence of mma.sync tf32 operations. This would go in the nvgpu dialect transformations in MLIR and can then be used by IREE.

self tag @manishucsd

Possible break down:
- OptionalAttr tf32Enabled (similar to the existing bypassL1 attribute), which uses the same data type:
  - Present: allowed to use the TF32 lowering, given that the data type for the op is F32 (update the verifier).
  - Not present: the TF32 lowering is not allowed.
- Precision information comes from the users:
  - Per operation.
  - Global (model level). There are three choices for F32 input: 1. keep plain F32, 2. TF32, 3. TF32x3.
- Choices 2. and 3. are enabled using an enum passed to the pattern rewriter in populateMmaSyncF32ToTF32Patterns (see the sketch after this list).
In progress here: https://reviews.llvm.org/D130294
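A rough sketch of how a pass could request the f32 → tf32 rewrite once this lands, assuming the populateMmaSyncF32ToTF32Patterns entry point and MmaSyncF32Lowering enum from the patch above (header paths and exact signatures may differ across MLIR revisions):

```cpp
#include "mlir/Dialect/NVGPU/Transforms/Transforms.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

using namespace mlir;

// Tags eligible f32 `nvgpu.mma.sync` ops for the TF32 lowering (choice 2);
// passing MmaSyncF32Lowering::TF32x3 instead would request the
// three-instruction emulation (choice 3).
static LogicalResult lowerMmaSyncF32(Operation *root) {
  RewritePatternSet patterns(root->getContext());
  nvgpu::populateMmaSyncF32ToTF32Patterns(patterns,
                                          nvgpu::MmaSyncF32Lowering::TF32);
  // After rewriting, an op would carry the unit attribute, e.g.:
  //   nvgpu.mma.sync(...) {mmaShape = [16, 8, 8], tf32Enabled} ...
  return applyPatternsAndFoldGreedily(root, std::move(patterns));
}
```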
The next steps here are to use the added OptionalAttr tf32Enabled and the enum MmaSyncF32Lowering to enable support for TF32x3, a.k.a. F32 emulation through TensorCores (sketched below).
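For context, the TF32x3 scheme splits each f32 operand into a tf32-representable "big" part and a "small" remainder, then sums three reduced-precision products (big*big, small*big, big*small), dropping the negligible small*small term. The scalar model below only illustrates the arithmetic (it truncates where the hardware rounds); it is not the Cutlass or MLIR implementation:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Truncate an f32 to tf32 precision by zeroing the 13 low mantissa bits
// (tf32 keeps 10 of f32's 23 mantissa bits).
static float to_tf32(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0xFFFFE000u;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// Scalar model of one TF32x3 multiply-accumulate: three reduced-precision
// products recover a nearly f32-accurate result.
static float tf32x3_fma(float a, float b, float c) {
  float a_big = to_tf32(a), a_small = a - a_big;
  float b_big = to_tf32(b), b_small = b - b_big;
  return a_big * b_big     // mma.sync #1
       + a_small * b_big   // mma.sync #2
       + a_big * b_small   // mma.sync #3
       + c;                // a_small * b_small is dropped as negligible
}

int main() {
  float a = 1.000244140625f, b = 3.14159274f;
  std::printf("plain f32: %.9g\n", a * b);
  std::printf("tf32 only: %.9g\n", to_tf32(a) * to_tf32(b));
  std::printf("tf32x3:    %.9g\n", tf32x3_fma(a, b, 0.0f));
}
```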
@manishucsd Is this still open? Still P1 work?