iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Add support for TensorCore F32 emulation through TF32 #9394

Open ThomasRaoux opened 2 years ago

ThomasRaoux commented 2 years ago

Cutlass added support for float32 emulation using TF32 tensorcore operations. In MLIR we have representations for mma.sync with TF32. We should differentiate mma.sync for float32 and TF32 and add a lowering pattern from an f32 mma.sync to a code sequence of TF32 mma.sync ops. This would go in the NVGPU dialect transformations in MLIR and could then be used by IREE.

Possible break down:

  1. Add an optional attribute to the mma.sync op to specify that the op uses TF32 precision
  2. Create a pattern in NVGPU dialect transforms to lower an f32 mma.sync without the tf32 attribute into a code sequence of mma.sync ops with the tf32 attribute.
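To make the proposed lowering concrete, here is a scalar Python model (an illustrative sketch, not the MLIR implementation: it assumes truncation-based TF32 rounding and uses a scalar stand-in for the warp-level matrix op). One f32 mma.sync becomes three tf32 mma.sync ops over "big"/"small" operand splits, with the small*small term dropped:

```python
import struct

def tf32(x: float) -> float:
    """Round a float32 value to TF32 precision (10 explicit mantissa
    bits) by zeroing the low 13 mantissa bits, i.e. truncation."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & ~0x1FFF))[0]

def mma_tf32(a: float, b: float, c: float) -> float:
    """Scalar model of one mma.sync with the tf32 attribute: operands
    are rounded to TF32, the multiply-accumulate stays in f32."""
    return tf32(a) * tf32(b) + c

def mma_f32_via_tf32x3(a: float, b: float, c: float) -> float:
    """Scalar model of lowering an f32 mma.sync to three tf32
    mma.sync ops: split each operand into a TF32 'big' part and a
    residual 'small' part, then accumulate small*big, big*small,
    and big*big (the small*small term is dropped)."""
    a_big = tf32(a)
    a_small = a - a_big  # residual; exactly representable in f32
    b_big = tf32(b)
    b_small = b - b_big
    acc = mma_tf32(a_small, b_big, c)
    acc = mma_tf32(a_big, b_small, acc)
    acc = mma_tf32(a_big, b_big, acc)
    return acc
```

For operands like `1.0 + 2**-20`, whose low mantissa bits are lost by a single TF32 rounding, the three-instruction sequence recovers nearly all of the f32 product, which is why TF32x3 is described as f32 emulation.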
manishucsd commented 2 years ago

self tag @manishucsd

manishucsd commented 2 years ago

The attribute is an `OptionalAttr` (present or not present), following the existing `bypassL1` attribute, which uses the same attribute type.

- Present: TF32 lowering is allowed, provided the operand data type is f32 (update the verifier accordingly).
- Not present: TF32 lowering is not allowed.

Precision information comes from the user: the lowering choices are enabled via an enum passed to the pattern rewriter in `populateMmaSyncF32ToTF32Patterns`.
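The enum-driven selection can be sketched as follows (a minimal Python model; the member names and the dispatch are illustrative, not MLIR's exact spellings):

```python
from enum import Enum, auto

class MmaSyncF32Lowering(Enum):
    """Which code sequence replaces an f32 mma.sync (illustrative
    mirror of the enum passed to the pattern rewriter)."""
    TF32 = auto()     # single tf32 mma.sync: fast, reduced precision
    TF32X3 = auto()   # three tf32 mma.syncs: emulates f32 precision
    UNKNOWN = auto()  # leave the op untouched

def mma_sync_count(kind: MmaSyncF32Lowering) -> int:
    """Return how many tf32 mma.sync ops one f32 mma.sync becomes
    under each lowering choice (0 means the op is not rewritten)."""
    return {
        MmaSyncF32Lowering.TF32: 1,
        MmaSyncF32Lowering.TF32X3: 3,
        MmaSyncF32Lowering.UNKNOWN: 0,
    }[kind]
```

This captures the trade-off the enum exposes: the single-instruction lowering trades accuracy for speed, while the three-instruction sequence pays 3x the mma cost to approximate full f32 precision.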

manishucsd commented 2 years ago

In progress here: https://reviews.llvm.org/D130294

manishucsd commented 2 years ago

The next steps here are to use the added `OptionalAttr` `tf32Enabled` and the enum `MmaSyncF32Lowering` to enable support for TF32x3, a.k.a. F32 emulation through Tensor Cores.

allieculp commented 1 year ago

@manishucsd Is this still open? Still P1 work?