# Scenario 1

```
$ triton-shared-opt --triton-to-linalg-experimental res.ttir
bin/res.ttir:7:10: error: failed to materialize conversion for result #0 of operation 'tt.reduce' that remained live after conversion
    %4 = "tt.reduce"(%3) <{axis = 0 : i32}> ({
         ^
res.ttir:7:10: note: see current operation:
%8 = "tt.reduce"(%1) <{axis = 0 : i32}> ({
^bb0(%arg17: f16, %arg18: f16):
  %25 = "arith.addf"(%arg17, %arg18) <{fastmath = #arith.fastmath<none>}> : (f16, f16) -> f16
  "tt.reduce.return"(%25) : (f16) -> ()
}) : (tensor<32xf16>) -> f16
bin/softmax-alt2.ttir:13:10: note: see existing live user here: %8 = arith.extf %4 : f16 to f32
    %6 = arith.extf %4 : f16 to f32
         ^
```
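For reference, the failing reduction can be isolated from the log into a small standalone input for `triton-shared-opt`. This is a reconstruction, not the original attached kernel: the module wrapper, function name, and signature are assumptions; only the f16 reduce body and the `arith.extf` user are taken from the log above.

```mlir
module {
  // Hypothetical standalone reproducer (reconstructed from the crash log,
  // not the original kernel): an f16 sum reduction whose scalar f16 result
  // remains live after conversion...
  tt.func public @reduce_f16(%arg0: tensor<32xf16>) -> f32 {
    %0 = "tt.reduce"(%arg0) <{axis = 0 : i32}> ({
    ^bb0(%lhs: f16, %rhs: f16):
      %1 = arith.addf %lhs, %rhs : f16
      tt.reduce.return %1 : f16
    }) : (tensor<32xf16>) -> f16
    // ...because this extension to f32 still consumes it directly.
    %2 = arith.extf %0 : f16 to f32
    tt.return %2 : f32
  }
}
```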
# Scenario 2

```
<unknown>:0: error: 'linalg.yield' op type of yield operand 1 ('bf16') doesn't match the element type of the enclosing linalg.generic op ('f16')
<unknown>:0: note: see current operation: "linalg.yield"(%arg11) : (bf16) -> ()
```
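The verifier error above means the region of a `linalg.generic` yields a `bf16` value into an output tensor whose element type is `f16`. A sketch of the shape of IR that gets rejected (tensor sizes, indexing maps, and value names are assumptions; only the bf16-vs-f16 mismatch is from the log):

```mlir
// Hypothetical invalid IR of the kind the verifier rejects (names assumed).
#map = affine_map<(d0) -> (d0)>
%res = linalg.generic
    {indexing_maps = [#map, #map], iterator_types = ["parallel"]}
    ins(%x : tensor<32xbf16>) outs(%init : tensor<32xf16>) {
  ^bb0(%in: bf16, %out: f16):
    // Invalid: the yielded type (bf16) must match the element type of the
    // enclosing linalg.generic's output (f16).
    linalg.yield %in : bf16
} -> tensor<32xf16>
```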
# Additional information

Triton-shared branch: nhat/dep (specifically this commit)

This ticket covers two errors from two kernels that differ only slightly; they are filed together because both stem from fp16 computation issues.