iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/

[llama] failure to translate to executable #18925

Open · dan-garvey opened this issue 2 weeks ago

dan-garvey commented 2 weeks ago

What happened?

Failing IR: the IR comes from a single attention layer in prefill; I made it smaller by returning only the consumer of the op that is the source of the error: https://gist.github.com/dan-garvey/a62bb744c225ff82cd482ddf469be453

Compile command:
../iree-build/tools/iree-compile failing_ir.mlir -o out.vmfb --iree-hal-target-device=hip --iree-hip-target=gfx942

f16_half_ndc.mlir:469:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"rocm", "rocm-hsaco-fb", {iree.gpu.target = #iree_gpu.target<arch = "gfx942", features = "", wgp = <compute =  fp64|fp32|fp16|int64|int32|int16|int8, storage =  b64|b32|b16|b8, subgroup =  shuffle|arithmetic, dot =  dp4xi8toi32, mma = [<MFMA_F32_16x16x4_F32>, <MFMA_F32_16x16x16_F16>, <MFMA_F32_32x32x8_F16>, <MFMA_F32_16x16x16_BF16>, <MFMA_F32_32x32x8_BF16>, <MFMA_F32_16x16x32_F8E4M3FNUZ>, <MFMA_F32_16x16x32_F8E5M2FNUZ>, <MFMA_I32_16x16x32_I8>, <MFMA_I32_32x32x16_I8>], subgroup_size_choices = [64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647], max_load_instruction_bits = 128, simds_per_wgp = 4, vgpr_space_bits = 16384>>, ukernels = "none"}>
    %137 = torch.aten.transpose.int %136#0, %int1_209, %int2_210 : !torch.vtensor<[1,32,?,128],f16>, !torch.int, !torch.int -> !torch.vtensor<[1,?,32,128],f16>
           ^

The full error message, including the IR, can be found in the gist underneath the source IR.
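For reference, here is a minimal standalone sketch of the op the diagnostic points at, with the same operand types; this is a hypothetical reduction for illustration only, not the IR in the gist:

```mlir
// Hypothetical reduction (not the gist IR): wraps the reported
// torch.aten.transpose.int, which swaps dims 1 and 2 of a [1, 32, ?, 128]
// f16 tensor, in its own function so it can be compiled in isolation.
func.func @transpose_repro(%arg0: !torch.vtensor<[1,32,?,128],f16>) -> !torch.vtensor<[1,?,32,128],f16> {
  %int1 = torch.constant.int 1
  %int2 = torch.constant.int 2
  %0 = torch.aten.transpose.int %arg0, %int1, %int2 : !torch.vtensor<[1,32,?,128],f16>, !torch.int, !torch.int -> !torch.vtensor<[1,?,32,128],f16>
  return %0 : !torch.vtensor<[1,?,32,128],f16>
}
```

Compiling a reduction like this with the same command line is one way to check whether the transpose alone triggers the translation failure or whether the surrounding attention producer is needed.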

benvanik commented 2 weeks ago

(please include titles that are readable and error messages/etc)

dan-garvey commented 2 weeks ago

sorry, wanted a link to throw in chat, will clean it up