hugomano opened 3 days ago
I think I see the problem: the MathToROCDL pass in MLIR doesn't specify a lowering for F32, and the default for BF16 ops is to convert to F32 and use the F32 lowering. Since there is no F32 lowering to fall back on, this doesn't work here. @draganmladjenovic, could you take a look at this?
Seems related to https://github.com/llvm/llvm-project/pull/102971. It should be verified whether this patch actually makes sense. I would have expected that if intrinsics exist, the ops would eventually be lowered to them as well.
https://github.com/llvm/llvm-project/pull/102971 does not have any tests for BF16. I think it can be fixed in MLIR upstream with a pattern that uses logic similar to `maybeCast` in https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Conversion/GPUCommon/OpToFuncCallLowering.h;rcl=699896658;l=98
The following MLIR code no longer compiles for the ROCm platform (ROCm 6.2 here) since this commit: https://github.com/openxla/xla/commit/6e9eefeec077f49c2b22bfeee8da537ed8517b22
Error traceback:
HLO dump:
Best, Hugo