llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.39k stars 12.15k forks source link

[mlir][spirv] Migrate mlir-vulkan-runner to follow other client API runners #73457

Open antiagainst opened 1 year ago

antiagainst commented 1 year ago

We added mlir-vulkan-runner in way early days of MLIR. Recently various MLIR client API runners (e.g., mlir-cuda-runner) were removed in favor of performing translation using mlir-opt and then leverage mlir-cpu-runner as the host coordnation mechanism. See @joker-eph's https://github.com/llvm/llvm-project/pull/65539#issuecomment-1710872236 for more context. We should migrate mlir-vulkan-runner to follow there. This would unify the runner story in MLIR to have one single mlir-runner, as @Jianhui-Li's https://github.com/llvm/llvm-project/pull/65539#issuecomment-1712414848 here.

llvmbot commented 1 year ago

@llvm/issue-subscribers-mlir-spirv

Author: Lei Zhang (antiagainst)

We added mlir-vulkan-runner in way early days of MLIR. Recently various MLIR client API runners (e.g., mlir-cuda-runner) were removed in favor of performing translation using `mlir-opt` and then leverage `mlir-cpu-runner` as the host coordnation mechanism. See @joker-eph's https://github.com/llvm/llvm-project/pull/65539#issuecomment-1710872236 for more context. We should migrate mlir-vulkan-runner to follow there. This would unify the runner story in MLIR to have one single mlir-runner, as @Jianhui-Li's https://github.com/llvm/llvm-project/pull/65539#issuecomment-1712414848 here.
llvmbot commented 1 year ago

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

1) In the comments of the issue, request for it to be assigned to you. 2) Fix the issue locally. 3) Run the test suite locally. 3.1) Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests. 4) Create a Git commit. 5) Run git clang-format HEAD~1 to format your changes. 6) Open a pull request to the upstream repository on GitHub. 6.1) Detailed instructions can be found here.

If you have any further questions about this issue, don't hesitate to ask via a comment on this Github issue.

llvmbot commented 1 year ago

@llvm/issue-subscribers-good-first-issue

Author: Lei Zhang (antiagainst)

We added mlir-vulkan-runner in way early days of MLIR. Recently various MLIR client API runners (e.g., mlir-cuda-runner) were removed in favor of performing translation using `mlir-opt` and then leverage `mlir-cpu-runner` as the host coordnation mechanism. See @joker-eph's https://github.com/llvm/llvm-project/pull/65539#issuecomment-1710872236 for more context. We should migrate mlir-vulkan-runner to follow there. This would unify the runner story in MLIR to have one single mlir-runner, as @Jianhui-Li's https://github.com/llvm/llvm-project/pull/65539#issuecomment-1712414848 here.
bhaskar1001101 commented 1 year ago

Hi. I would like to work on this. I'll try to take ⚙ D98396 [mlir] Remove mlir-cuda-runner as reference.

antiagainst commented 11 months ago

Hi @bhaskar1001101, sorry I missed your reply previously. Are you still interested to push this forward? If so I'll assign you to the issue. :)

Sh0g0-1758 commented 10 months ago

Hello, I am new to LLVM and would like to work on this. @antiagainst , can you please assign me this issue.

rengolin commented 10 months ago

@bhaskar1001101 and @Sh0g0-1758, you both have shown interest, so I assigned both of you. Can you work together on this?

To be clear, the idea is to remove mlir-vulkan-runner, moving the logic inside mlir-cpu-runner (like CUDA did) and then renaming mlir-cpu-runner to just mlir-runner.

antiagainst commented 10 months ago

Yeah. Note that I've marked this as good first issue but it's a relative large effort than normal, and may need some reading and understanding of mlir runners and vulkan specficially. Please let me know if you have questions. There are also other smaller good first issues if you are interested, just search with label "mlir:spirv" and "good first issues" to find them.

Sh0g0-1758 commented 10 months ago

yes sure thing @antiagainst . I was getting familiar with mlir and will update you when a question of which I can't answer on the discourse arise.

antiagainst commented 6 months ago

Hey @bhaskar1001101 and @Sh0g0-1758, is this something you are still interested? Have you able to make progress on it?

Rajveer100 commented 6 months ago

@antiagainst Any particular insights that you would like to give apart from the comment links in the issue description?

tw-ilson commented 5 months ago

I'll take a look at this as well.

kuhar commented 2 months ago

Rough breakdown of the migration steps:

mlir-vulkan-runner

mlir-spirv-cpu-runner

It might be easier to start with the spirv cpu runner (the conversion pipeline is much simpler) and then move to the vulkan runner.

cc: @andfau-amd

llvmbot commented 2 months ago

@llvm/issue-subscribers-infrastructure

Author: Lei Zhang (antiagainst)

We added mlir-vulkan-runner in way early days of MLIR. Recently various MLIR client API runners (e.g., mlir-cuda-runner) were removed in favor of performing translation using `mlir-opt` and then leverage `mlir-cpu-runner` as the host coordnation mechanism. See @joker-eph's https://github.com/llvm/llvm-project/pull/65539#issuecomment-1710872236 for more context. We should migrate mlir-vulkan-runner to follow there. This would unify the runner story in MLIR to have one single mlir-runner, as @Jianhui-Li's https://github.com/llvm/llvm-project/pull/65539#issuecomment-1712414848 here.
andfau-amd commented 2 months ago

@kuhar suggested that I dump the IR used by the runners already integrated with the CPU runner, so we can see how they pass along the binary.

I took https://github.com/llvm/llvm-project/blob/111932d5cae0199d9c59669b37232a011f8b8757/mlir/test/Integration/GPU/ROCM/printf.mlir and extracted a command line (gfx90a chip manually inserted):

../llvm-build/bin/mlir-opt mlir/test/Integration/GPU/ROCM/printf.mlir | \
../llvm-build/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-rocdl{index-bitwidth=32 runtime=HIP}),rocdl-attach-target{chip=gfx90a})' | \
../llvm-build/bin/mlir-opt -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary

With the AMDGPU target built, the ROCm bitcode (rocm-device-libs) installed, and LLVM LLD available (either ROCm's version or, in my case, the ROCDL MLIR target hacked to call LLVM's ld.lld), this can give you:

module attributes {gpu.container_module} {
  gpu.binary @kernels  [#gpu.object<#rocdl.target<chip = "gfx90a">, kernels = <[#gpu.kernel_metadata<"hello", !llvm.func<void ()>, metadata = {agpr_count = 0 : i64, group_segment_fixed_size = 0 : i64, max_flat_workgroup_size = 256 : i64, private_segment_fixed_size = 0 : i64, reqd_workgroup_size = array<i32: -1, -1, -1>, sgpr_count = 20 : i64, sgpr_spill_count = 0 : i64, vgpr_count = 20 : i64, vgpr_spill_count = 0 : i64, wavefront_size = 64 : i64, workgroup_size_hint = array<i32: -1, -1, -1>}>]>, bin = "\7FELF\02\01\01@\03\00\00\00\00\00\00\00\03\00\E0\00\01\00\00\00\00\00\00\00\00\00\00\00@\00\00\00\00\00\00\00\08\1D\00\00\00\00\00\00?\05\00\00@\008\00\08\00@\00\0E\00\0C\00\06\00\00\00\04\00\00\00@\00\00\00\00\00\00\00@\00\00\00\00\00\00\00@\00\00\00\00\00\00\00\C0\01\00\00\00\00\00\00\C0\01\00\00\00\00\00\00\08\00\00\00\00\00\00\00\01\00\00\00\04\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00@\07\00\00\00\00\00\00@\07\00\00\00\00\00\00\00\10\00\00\00\00\00\00\01\00\00\00\05\00\00\00\00\08\00\00\00\00\00\00\00\18\00\00\00\00\00\00\00\18\00\00\00\00\00\00@\13\00\00\00\00\00\00@\13\00\00\00\00\00\00\00\10\00\00\00\00\00\00\01\00\00\00\06\00\00\00@\1B\00\00\00\00\00\00@;\00\00\00\00\00\00@;\00\00\00\00\00\00p\00\00\00\00\00\00\00\C0\04\00\00\00\00\00\00\00\10\00\00\00\00\00\00\02\00\00\00\06\00\00\00@\1B\00\00\00\00\00\00@;\00\00\00\00\00\00@;\00\00\00\00\00\00p\00\00\00\00\00\00\00p\00\00\00\00\00\00\00\08\00\00\00\00\00\00\00R\E5td\04\00\00\00@\1B\00\00\00\00\00\00@;\00\00\00\00\00\00@;\00\00\00\00\00\00p\00\00\00\00\00\00\00\C0\04\00\00\00\00\00\00\01\00\00\00\00\00\00\00Q\E5td\06\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\04\00\00\00\04\00\00\00\00\02\00\00\00\00\00\00\00\02\00\00\00\00\00\00\00\02\00\00\00\00\00\00P\04\00\00\00\00\00\00P\04\00\00\00\00\00\00\04\00\00\00\00\00\00\00\07\00\00\00;\04\00\00 \00\00\00AMDGPU\00\00\83\AEamdhsa.kernels\91\DE\00\10\AB.agpr_count\00\A5.args\9E\83\A7.offset\00\A5.size\04\AB.value_kind\B4hidden_block_count_x\83\A7.offset\04\A5.size\04\AB.value_kind\B4hidden_block_count_y\83\A7.offset\08\A5.size\04\AB.value_kind\B4hidden_block_count_z\83\A7.offset\0C\A5.size\02\AB.value_kind\B3hidden_group_size_x\83\A7.offset\0E\A5.size\02\AB.value_kind\B3hidden_group_size_y\83\A7.offset\10\A5.size\02\AB.value_kind\B3hidden_group_size_z\83\A7.offset\12\A5.size\02\AB.value_kind\B2hidden_remainder_x\83\A7.offset\14\A5.size\02\AB.value_kind\B2hidden_remainder_y\83\A7.offset\16\A5.size\02\AB.value_kind\B2hidden_remainder_z\83\A7.offset(\A5.size\08\AB.value_kind\B6hidden_global_offset_x\83\A7.offset0\A5.size\08\AB.value_kind\B6hidden_global_offset_y\83\A7.offset8\A5.size\08\AB.value_kind\B6hidden_global_offset_z\83\A7.offset@\A5.size\02\AB.value_kind\B0hidden_grid_dims\83\A7.offsetP\A5.size\08\AB.value_kind\B6hidden_hostcall_buffer\B9.group_segment_fixed_size\00\B6.kernarg_segment_align\08\B5.kernarg_segment_size\CD\01\00\B8.max_flat_workgroup_size\CD\01\00\A5.name\A5hello\BB.private_segment_fixed_size\00\AB.sgpr_count\14\B1.sgpr_spill_count\00\A7.symbol\A8hello.kd\B8.uniform_work_group_size\01\B3.uses_dynamic_stack\C2\AB.vgpr_count\14\B1.vgpr_spill_count\00\AF.wavefront_size@\ADamdhsa.target\B9amdgcn-amd-amdhsa--gfx90a\AEamdhsa.version\92\01\02\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\12\03\07\00\00\18\00\00\00\00\00\00\08\0F\00\00\00\00\00\00\07\00\00\00\11\00\06\00\00\07\00\00\00\00\00\00@\00\00\00\00\00\00\00\01\00\00\00\01\00\00\00\01\00\00\00\1A\00\00\00\08\00@\02\00\00@\00\01\00\00\00\980\92\0F\D7\E7\F8\D8\03\00\00\00\03\00\00\00\02\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00hello\00hello.kd\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00\11\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\04\00\00\00\82\00\AF\00\8C\00\00\00\09\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\03\04~\82\00\06\C0P\00\00\00\00\00\8C\D2\C1\00\01\00\0E\00\8D\D2\C1\00\02\00\0E\03\10~\08\05\00~\00\00\CA\D0\00\10\02\00\00@\B3\D3\80\00\01\18\00 \84\BEI\00\88\BF\80\02\06~\7F\C0\8C\BF\18\80U\DC\03\00\02\06p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00(\80T\DC\03\00\02\00\00\80T\DC\03\00\02\04q\0F\8C\BF\00\0D\00&\01\0F\02&\01\00\85\D2\011\01\00\09\00\86\D2\001\01\00\00\00\85\D2\001\01\00\09\03\02hp\0F\8C\BF\04\01\002\05\03\028\00\80U\DC\00\00\7F\04p\0F\8C\BF\18\80\85\DD\03\04\02\00p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00\00\0D\DA}j \86\BE!\00\88\BF\80\01\88\BE\01\00\8E\BF(\80T\DC\03\00\02\04\00\80T\DC\03\00\02\0A\06P\B3\D3\00\01\02\18q\0F\8C\BF\04\0D\00&p\0F\8C\BF\00\0A\E8\D1\001)\04\05\0F\0A&\01\03\08~\04\0A\E8\D1\051\11\04\04\03\02~\00\80U\DC\00\00\7F\04p\0F\8C\BF\18\80\85\DD\03\04\02\00p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00\00\0D\D4}j\08\88\87~\08\FE\89\E1\FF\89\BF~\08\FE\87~\06\FE\87~\04\FE\87\80\02\12~\7F\C0\8C\BF(\80T\DC\09\00\02\0A\00\80\\\DC\09\00\02\04\00\05\08~\01\05\0A~~\01\86\BEq\0F\8C\BF\0A\05\10~\0B\05\12~\04\08\88\86\09\98\0D\92\08\98\0E\96\08\98\0C\92\00 \8A\BE\0B\00\88\BF\0E\0D\0F\81\0F\02\02~p\0F\8C\BF\0C\08\002\05\03\028\0AP\B3\D3\06\0C\00\18\82\02\18~\81\02\1A~\08\80|\DC\00\0A\7F\00~\0A\FE\87\08\8C\86\8E\07\02\00~p\0F\8C\BF\06\0C\062\07\01\0C8\00\00\8F\D2\86\10\02\00\03\01\002\80\00\88\BE\06\03\028\A1\02\10~\09\03\14~\09\03\16~\08\00\89\BE\00\80|\DC\00\08\7F\00\08\00\8A\BE\08\00\8B\BE\06P\B3\D3\08\10\00\18\08P\B3\D3\0A\14\00\18\10\80|\DC\00\06\7F\00 \80|\DC\00\06\7F\000\80|\DC\00\06\7F\00\00 \86\BEV\00\88\BF\80\02\06~ \80U\DC\03\00\02\12(\80T\DC\03\00\02\06\04\02 ~\05\02\22~p\0F\8C\BF\04\0C\0C&\05\0E\0E&\07\00\85\D2\071\01\00\08\00\86\D2\061\01\00\06\00\85\D2\061\01\00\08\0F\0Eh\04\0D\142\05\0F\168\00\80t\DC\0A\12\7F\00\00\00\A0\E0\00\00\00\00p\0F\8C\BF \80\85\DD\03\10\02\08p\0F\8C\BF\08%\DA}j \88\BE\12\00\88\BF\80\01\8A\BE\01\00\8E\BF\00\80t\DC\0A\08\7F\00\04\02\0C~\05\02\0E~\00\00\A0\E0\00\00\00\00p\0F\8C\BF \80\85\DD\03\06\02\06p\0F\8C\BF\06\11\D4}j\0A\8A\87\08P\B3\D3\06\0D\02\18~\0A\FE\89\EF\FF\89\BF~\08\FE\87\80\02\12~\10\80T\DC\09\00\02\06~\01\88\BE\03\00\8C\D2\08\00\01\00\03\00\8D\D2\09\06\02\00\80\06\94}j \8A\BE\07\00\88\BF\08\0D\88\BE\08\02\10~\00\00\A0\E0\00\00\00\00p\0F\8C\BF\08\80\88\DD\06\08\7F\00~\0A\FE\87p\0F\8C\BF\10\80T\DC\06\00\7F\08p\0F\8C\BF\80\10\D4}\0C\00\87\BF\18\80P\DC\06\00\7F\06\80\02\0E~p\0F\8C\BF\06\05\10~\08\FF|\86\FF\00\00\00\00\00\A0\E0\00\00\00\00\00\80t\DC\08\06\7F\00\01\00\90\BF~\06\FE\87\0E\0D\06\81\06\02\06~\0C\08\082\05\07\068\94\08\082\80\06\0A8\08\00\82\BF~\06\FE\87\03\05\0C~\06\80\06\BF\03\00\85\BF\01\00\8E\BF\02\00\89\BF\0D\00\82\BF\0C\00\82\BF\81\02\06~\00 \86\BE\F5\FF\88\BF\00\80Q\DC\04\00\7F\03p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00\81\06\06&\EC\FF\82\BF\00\80T\DC\00\00\7F\04\00 \86\BE9\00\88\BF\80\02\06~(\80T\DC\03\00\02\00\18\80U\DC\03\00\02\0A\00\80T\DC\03\00\02\0C\05\02\0E~\80\01\80\BEr\0F\8C\BF\81\00\122\80\02\1E8\04\12\0C2\0F\0F\0E8\80\0C\D4}\07\1F\0E\00\06\13\0C\00\07\03\02&\06\01\00&\01\00\85\D2\011\01\00\09\00\86\D2\001\01\00\00\00\85\D2\001\01\00\09\03\02hp\0F\8C\BF\0C\01\002\0D\03\028\0A\03\10~\00\80t\DC\00\0A\7F\00\0B\03\12~\00\00\A0\E0\00\00\00\00p\0F\8C\BF\18\80\85\DD\03\06\02\08p\0F\8C\BF\08\15\DA}~j\FE\86\0F\00\88\BF\01\00\8E\BF\00\80t\DC\00\08\7F\00\00\00\A0\E0\00\00\00\00p\0F\8C\BF\18\80\85\DD\03\06\02\0Ap\0F\8C\BF\0A\11\D4}j\00\80\87\08P\B3\D3\0A\15\02\18~\00\FE\89\F1\FF\89\BF~\06\FE\87\0E\03\18~\0C\05\00~\00\00\CA\D0\00\18\02\00\00@\B3\D3\80\00\01\18\00 \84\BEH\00\88\BF\80\02\06~\18\80U\DC\03\00\02\08p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00(\80T\DC\03\00\02\00\00\80T\DC\03\00\02\06q\0F\8C\BF\00\11\00&\01\13\02&\01\00\85\D2\011\01\00\0A\00\86\D2\001\01\00\00\00\85\D2\001\01\00\0A\03\02hp\0F\8C\BF\06\01\002\07\03\028\00\80U\DC\00\00\7F\06p\0F\8C\BF\18\80\85\DD\03\06\02\00p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00\00\11\DA}j \86\BE!\00\88\BF\80\01\88\BE\01\00\8E\BF(\80T\DC\03\00\02\06\00\80T\DC\03\00\02\0A\08P\B3\D3\00\01\02\18q\0F\8C\BF\06\11\00&p\0F\8C\BF\00\0A\E8\D1\001)\04\07\13\0E&\01\03\0C~\06\0A\E8\D1\071\19\04\06\03\02~\00\80U\DC\00\00\7F\06p\0F\8C\BF\18\80\85\DD\03\06\02\00p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00\00\11\D4}j\08\88\87~\08\FE\89\E1\FF\89\BF~\08\FE\87~\06\FE\87~\04\FE\87\80\02\1A~(\80T\DC\0D\00\02\06\00\80\\\DC\0D\00\02\08\00\05\08~\01\05\0A~~\01\86\BEq\0F\8C\BF\06\05\10~\07\05\12~\04\08\88\86\09\98\0D\92\08\98\0E\96\08\98\0C\92\00 \8A\BE\0B\00\88\BF\0E\0D\0F\81\0F\02\02~p\0F\8C\BF\0C\10\002\09\03\028\10P\B3\D3\06\0C\00\18\82\02$~\81\02&~\08\80|\DC\00\10\7F\00~\0A\FE\87\08\8C\86\8E\07\02\00~p\0F\8C\BF\06\14\062\0B\01\0C8\00\00\8F\D2\86\18\02\00\1F\FF\06\B0\03\01\002\80\00\88\BE\04\00\01\D2\04\0D\00\03\06\03\028\FF\02\0C~Hell\FF\02\0E~o fr\08\00\89\BE\00\80|\DC\00\04\7F\00\08\00\8A\BE\08\00\8B\BE\04P\B3\D3\08\10\00\18\FF\02\14~om %\FF\02\16~d\0A\00\00\0D\03\18~\06P\B3\D3\0A\14\00\18\10\80|\DC\00\0A\7F\00 \80|\DC\00\04\7F\000\80|\DC\00\04\7F\00\00 \86\BEU\00\88\BF\80\02\06~ \80U\DC\03\00\02\12(\80T\DC\03\00\02\04\04\02 ~\05\02\22~p\0F\8C\BF\04\05\10~\05\05\12~\08\04\88\86\09\98\09\92\08\98\0A\96\08\98\08\92\0A\09\09\81\09\02\08~\08\10\142\09\09\168\00\80t\DC\0A\12\7F\00\00\00\A0\E0\00\00\00\00p\0F\8C\BF \80\85\DD\03\10\02\06p\0F\8C\BF\06%\DA}j \88\BE\12\00\88\BF\80\01\8A\BE\01\00\8E\BF\00\80t\DC\0A\06\7F\00\04\02\08~\05\02\0A~\00\00\A0\E0\00\00\00\00p\0F\8C\BF \80\85\DD\03\04\02\04p\0F\8C\BF\04\0D\D4}j\0A\8A\87\06P\B3\D3\04\09\02\18~\0A\FE\89\EF\FF\89\BF~\08\FE\87\80\02\0E~\10\80T\DC\07\00\02\04~\01\88\BE\03\00\8C\D2\08\00\01\00\03\00\8D\D2\09\06\02\00\80\06\94}j \8A\BE\07\00\88\BF\08\0D\88\BE\08\02\0C~\00\00\A0\E0\00\00\00\00p\0F\8C\BF\08\80\88\DD\04\06\7F\00~\0A\FE\87p\0F\8C\BF\10\80T\DC\04\00\7F\06p\0F\8C\BF\80\0C\D4}\0C\00\87\BF\18\80P\DC\04\00\7F\04\80\02\0A~p\0F\8C\BF\04\05\10~\08\FF|\86\FF\00\00\00\00\00\A0\E0\00\00\00\00\00\80t\DC\06\04\7F\00\01\00\90\BF~\06\FE\87\0E\0D\06\81\06\02\06~\0C\10\082\09\07\068\94\08\082\80\06\0A8\08\00\82\BF~\06\FE\87\03\05\0C~\06\80\06\BF\03\00\85\BF\01\00\8E\BF\02\00\89\BF\0D\00\82\BF\0C\00\82\BF\81\02\06~\00 \86\BE\F5\FF\88\BF\00\80Q\DC\04\00\7F\03p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00\81\06\06&\EC\FF\82\BF\00\80T\DC\00\00\7F\00\00 \86\BE9\00\88\BF\80\02\06~(\80T\DC\03\00\02\08\18\80U\DC\03\00\02\0A\00\80T\DC\03\00\02\0C\05\02\0A~\80\01\80\BEr\0F\8C\BF\81\10\0E2\80\12\1E8\04\0E\082\0F\0B\0A8\80\08\D4}\05\1F\0A\00\04\0F\08\00\05\13\0E&\04\11\10&\07\00\85\D2\071\01\00\09\00\86\D2\081\01\00\08\00\85\D2\081\01\00\09\0F\0Ehp\0F\8C\BF\0C\11\102\0D\0F\128\0A\03\0C~\00\80t\DC\08\0A\7F\00\0B\03\0E~\00\00\A0\E0\00\00\00\00p\0F\8C\BF\18\80\85\DD\03\04\02\06p\0F\8C\BF\06\15\DA}~j\FE\86\0F\00\88\BF\01\00\8E\BF\00\80t\DC\08\06\7F\00\00\00\A0\E0\00\00\00\00p\0F\8C\BF\18\80\85\DD\03\04\02\0Ap\0F\8C\BF\0A\0D\D4}j\00\80\87\06P\B3\D3\0A\15\02\18~\00\FE\89\F1\FF\89\BF~\06\FE\87\0E\05\00~\00\00\CA\D0\00\1C\02\00\08@\B3\D3\80\00\01\18\00 \84\BEH\00\88\BF\80\02\06~\18\80U\DC\03\00\02\06p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00(\80T\DC\03\00\02\04\00\80T\DC\03\00\02\08q\0F\8C\BF\04\0D\08&\05\0F\0A&\05\00\85\D2\051\01\00\0A\00\86\D2\041\01\00\04\00\85\D2\041\01\00\0A\0B\0Ahp\0F\8C\BF\08\09\082\09\0B\0A8\00\80U\DC\04\00\7F\04p\0F\8C\BF\18\80\85\DD\03\04\02\08p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00\08\0D\DA}j \86\BE!\00\88\BF\80\01\88\BE\01\00\8E\BF(\80T\DC\03\00\02\04\00\80T\DC\03\00\02\0A\06P\B3\D3\08\11\02\18q\0F\8C\BF\04\0D\08&\05\0F\12&p\0F\8C\BF\04\0A\E8\D1\041)\04\05\03\10~\08\0A\E8\D1\091!\04\08\03\0A~\00\80U\DC\04\00\7F\04p\0F\8C\BF\18\80\85\DD\03\04\02\08p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00\08\0D\D4}j\08\88\87~\08\FE\89\E1\FF\89\BF~\08\FE\87~\06\FE\87~\04\FE\87\80\02\1E~(\80T\DC\0F\00\02\0A\00\80\\\DC\0F\00\02\04\08\05\08~\09\05\0A~~\01\86\BEq\0F\8C\BF\0A\05\10~\0B\05\12~\04\08\88\86\09\98\0D\92\08\98\0E\96\08\98\0C\92\00 \8A\BE\0B\00\88\BF\0E\0D\0F\81\0F\02\06~p\0F\8C\BF\0C\08\182\05\07\1A8\08P\B3\D3\06\0C\00\18\82\02\14~\81\02\16~\08\80|\DC\0C\08\7F\00~\0A\FE\87\08\8C\86\8E\07\02\06~p\0F\8C\BF\06\0C\102\07\07\068\06\00\8F\D2\86\1C\02\00\80\00\88\BE\1D\FF\06\B0\08\0D\0C2\00\00\01\D2\00\0D\88\02\03\0F\0E8\0F\03\06~\08\00\89\BE\00\80|\DC\06\00\7F\00\08\00\8A\BE\08\00\8B\BE\00P\B3\D3\08\10\00\18\02P\B3\D3\0A\14\00\18\10\80|\DC\06\00\7F\00 \80|\DC\06\00\7F\000\80|\DC\06\00\7F\00\00 \86\BEU\00\88\BF\80\02\10~ \80U\DC\08\00\02\0C(\80T\DC\08\00\02\00\04\02\14~\05\02\16~p\0F\8C\BF\00\05\10~\01\05\12~\08\04\88\86\09\98\09\92\08\98\0A\96\08\98\08\92\0A\09\09\81\09\02\00~\08\08\0C2\05\01\0E8\00\80t\DC\06\0C\7F\00\00\00\A0\E0\00\00\00\00p\0F\8C\BF \80\85\DD\08\0A\02\02p\0F\8C\BF\02\19\DA}j \88\BE\12\00\88\BF\80\01\8A\BE\01\00\8E\BF\00\80t\DC\06\02\7F\00\04\02\00~\05\02\02~\00\00\A0\E0\00\00\00\00p\0F\8C\BF \80\85\DD\08\00\02\00p\0F\8C\BF\00\05\D4}j\0A\8A\87\02P\B3\D3\00\01\02\18~\0A\FE\89\EF\FF\89\BF~\08\FE\87\80\02\06~\10\80T\DC\03\00\02\00~\01\88\BE\02\00\8C\D2\08\00\01\00\02\00\8D\D2\09\04\02\00\80\04\94}j \8A\BE\07\00\88\BF\08\0D\88\BE\08\02\04~\00\00\A0\E0\00\00\00\00p\0F\8C\BF\08\80\88\DD\00\02\7F\00~\0A\FE\87p\0F\8C\BF\10\80T\DC\00\00\7F\02p\0F\8C\BF\80\04\D4}\0C\00\87\BF\18\80P\DC\00\00\7F\00\80\02\02~p\0F\8C\BF\00\05\10~\08\FF|\86\FF\00\00\00\00\00\A0\E0\00\00\00\00\00\80t\DC\02\00\7F\00\01\00\90\BF~\06\FE\87\0E\0D\06\81\06\02\00~\0C\08\022\05\01\048\94\02\002\80\04\028\08\00\82\BF~\06\FE\87\02\05\0C~\06\80\06\BF\03\00\85\BF\01\00\8E\BF\02\00\89\BF\0D\00\82\BF\0C\00\82\BF\81\02\04~\00 \86\BE\F5\FF\88\BF\00\80Q\DC\00\00\7F\02p\0F\8C\BF\00\00\A4\E0\00\00\00\00\00\00\FC\E0\00\00\00\00\81\04\04&\EC\FF\82\BF\00 \86\BE9\00\88\BF\80\02\0C~(\80T\DC\06\00\02\04\18\80U\DC\06\00\02\08\00\80T\DC\06\00\02\0A\05\02\02~\80\01\80\BEr\0F\8C\BF\81\08\062\80\0A\0E8\04\06\002\07\03\028\80\00\D4}\01\0F\02\00\00\07\00\00\01\0B\06&\00\09\08&\03\00\85\D2\031\01\00\05\00\86\D2\041\01\00\04\00\85\D2\041\01\00\05\07\06hp\0F\8C\BF\0A\09\082\0B\07\0A8\08\03\04~\00\80t\DC\04\08\7F\00\09\03\06~\00\00\A0\E0\00\00\00\00p\0F\8C\BF\18\80\85\DD\06\00\02\02p\0F\8C\BF\02\11\DA}~j\FE\86\0F\00\88\BF\01\00\8E\BF\00\80t\DC\04\02\7F\00\00\00\A0\E0\00\00\00\00p\0F\8C\BF\18\80\85\DD\06\00\02\08p\0F\8C\BF\08\05\D4}j\00\80\87\02P\B3\D3\08\11\02\18~\00\FE\89\F1\FF\89\BF\00\00\81\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\00\00\80\BF\06\00\00\00\00\00\00\00P\06\00\00\00\00\00\00\0B\00\00\00\00\00\00\00\18\00\00\00\00\00\00\00\05\00\00\00\00\00\00\00\DC\06\00\00\00\00\00\00\0A\00\00\00\00\00\00\00\10\00\00\00\00\00\00\00\F5\FE\FFo\00\00\00\00\98\06\00\00\00\00\00\00\04\00\00\00\00\00\00\00\BC\06\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00Linker: LLD 20.0.0 (https://github.com/llvm/llvm-project.git e13cbaca6925629165e3cced90b33777f0fe09fe)\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\10\00\00\00\00\02\08\00@;\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\12\03\07\00\00\18\00\00\00\00\00\00\08\0F\00\00\00\00\00\00\07\00\00\00\11\00\06\00\00\07\00\00\00\00\00\00@\00\00\00\00\00\00\00\00.note\00.dynsym\00.gnu.hash\00.hash\00.dynstr\00.rodata\00.text\00.dynamic\00.relro_padding\00.comment\00.symtab\00.shstrtab\00.strtab\00\00hello\00hello.kd\00_DYNAMIC\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\07\00\00\00\02\00\00\00\00\00\00\00\00\02\00\00\00\00\00\00\00\02\00\00\00\00\00\00P\04\00\00\00\00\00\00\00\00\00\00\00\00\00\00\04\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\07\00\00\00\0B\00\00\00\02\00\00\00\00\00\00\00P\06\00\00\00\00\00\00P\06\00\00\00\00\00\00H\00\00\00\00\00\00\00\05\00\00\00\01\00\00\00\08\00\00\00\00\00\00\00\18\00\00\00\00\00\00\00\0F\00\00\00\F6\FF\FFo\02\00\00\00\00\00\00\00\98\06\00\00\00\00\00\00\98\06\00\00\00\00\00\00$\00\00\00\00\00\00\00\02\00\00\00\00\00\00\00\08\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\19\00\00\00\05\00\00\00\02\00\00\00\00\00\00\00\BC\06\00\00\00\00\00\00\BC\06\00\00\00\00\00\00 \00\00\00\00\00\00\00\02\00\00\00\00\00\00\00\04\00\00\00\00\00\00\00\04\00\00\00\00\00\00\00\1F\00\00\00\03\00\00\00\02\00\00\00\00\00\00\00\DC\06\00\00\00\00\00\00\DC\06\00\00\00\00\00\00\10\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00'\00\00\00\01\00\00\00\02\00\00\00\00\00\00\00\00\07\00\00\00\00\00\00\00\07\00\00\00\00\00\00@\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00@\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00/\00\00\00\01\00\00\00\06\00\00\00\00\00\00\00\00\18\00\00\00\00\00\00\00\08\00\00\00\00\00\00@\13\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\00\00\005\00\00\00\06\00\00\00\03\00\00\00\00\00\00\00@;\00\00\00\00\00\00@\1B\00\00\00\00\00\00p\00\00\00\00\00\00\00\05\00\00\00\00\00\00\00\08\00\00\00\00\00\00\00\10\00\00\00\00\00\00\00>\00\00\00\08\00\00\00\03\00\00\00\00\00\00\00\B0;\00\00\00\00\00\00\B0\1B\00\00\00\00\00\00P\04\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00M\00\00\00\01\00\00\000\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\B0\1B\00\00\00\00\00\00g\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00V\00\00\00\02\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\18\1C\00\00\00\00\00\00`\00\00\00\00\00\00\00\0D\00\00\00\02\00\00\00\08\00\00\00\00\00\00\00\18\00\00\00\00\00\00\00^\00\00\00\03\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00x\1C\00\00\00\00\00\00p\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00h\00\00\00\03\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\E8\1C\00\00\00\00\00\00\19\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00">]
  llvm.func @main() {
    %0 = llvm.mlir.constant(2 : index) : i64
    %1 = llvm.mlir.constant(1 : index) : i64
    gpu.launch_func  @kernels::@hello blocks in (%1, %1, %1) threads in (%0, %1, %1) : i64
    llvm.return
  }
}
andfau-amd commented 2 months ago

I have now done the same thing for CUDA. I took https://github.com/llvm/llvm-project/blob/111932d5cae0199d9c59669b37232a011f8b8757/mlir/test/Integration/GPU/CUDA/printf.mlir and hacked together a working command line (using examples from documentation and based on what my ptxas version seemed to support):

../llvm-build/bin/mlir-opt mlir/test/Integration/GPU/CUDA/printf.mlir | \
../llvm-build/bin/mlir-opt -gpu-lower-to-nvvm-pipeline="cubin-chip=sm_87 cubin-features=+ptx75 opt-level=3"

With the NVPTX target built and ptxas installed (from nvidia-cuda-toolkit), this can give you:

module attributes {gpu.container_module} {
  gpu.binary @kernels  [#gpu.object<#nvvm.target<O = 3, chip = "sm_87", features = "+ptx75">, "P\EDU\BA\01\00\10\00\B0\10\00\00\00\00\00\00\02\00\01\01@\00\00\00\A0\0D\00\00\00\00\00\00\00\00\00\00\00\00\00\00\07\00\01\00W\00\00\00\00\00\00\00\00\00\00\00\11\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\7FELF\02\01\013\07\00\00\00\00\00\00\00\02\00\BE\00s\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00 \0A\00\00\00\00\00\00W\05W\00@\00\00\00\00\00@\00\0E\00\01\00\00.shstrtab\00.strtab\00.symtab\00.symtab_shndx\00.nv.info\00.text.hello\00.nv.info.hello\00.nv.shared.hello\00.nv.global.init\00.rel.text.hello\00.rela.text.hello\00.nv.constant0.hello\00.debug_frame\00.rel.debug_frame\00.rela.debug_frame\00.nv.rel.action\00\00.shstrtab\00.strtab\00.symtab\00.symtab_shndx\00.nv.info\00hello\00.text.hello\00.nv.info.hello\00.nv.shared.hello\00.nv.global.init\00printfFormat_0\00vprintf\00.rel.text.hello\00.rela.text.hello\00.nv.constant0.hello\00_SREG\00.debug_frame\00.rel.debug_frame\00.rela.debug_frame\00.nv.rel.action\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\008\00\00\00\03\00\0C\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00d\00\00\00\03\00\0D\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00t\00\00\00\01\00\0D\00\00\00\00\00\00\00\00\00\19\00\00\00\00\00\00\00\AC\00\00\00\03\00\0B\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\C6\00\00\00\03\00\04\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\F6\00\00\00\03\00\07\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\002\00\00\00\12\10\0C\00\00\00\00\00\00\00\00\00\80\03\00\00\00\00\00\00\83\00\00\00\12\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\FF\FF\FF\FF(\00\00\00\00\00\00\00\FF\FF\FF\FF\FF\FF\FF\FF\03\00\04|\FF\FF\FF\FF\0F\0C\81\80\80(\00\08\FF\81\80(\08\81\80\80(\00\00\00\00\00\00\00\FF\FF\FF\FF0\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00p\03\00\00\00\00\00\00\04\08\00\00\00\04H\00\00\00\0C\81\80\80(\18\04L\00\00\00\00\00\00\04/\08\00\07\00\00\00\18\00\00\00\04#\08\00\07\00\00\00\00\00\00\00\04\12\08\00\07\00\00\00\18\00\00\00\04\11\08\00\07\00\00\00\18\00\00\00\046\04\00\02\00\00\00\047\04\00s\00\00\00\015\00\00\03\1B\FF\00\04\0F\04\00\08\00\00\00\04\1C\04\00p\02\00\00K\00\00\00\00\00\00\00\00\02\02\08\10\0A/\22\00\00\00\08\00\00\00\00\00\00\08\08\00\00\00\00\00\00\10\08\00\00\00\00\00\00\18\08\00\00\00\00\00\00 \08\00\00\00\00\00\00(\08\00\00\00\00\00\000\08\00\00\00\00\00\008\08\00\00\00\00\01\00\00\08\00\00\00\00\01\00\08\08\00\00\00\00\01\00\10\08\00\00\00\00\01\00\18\08\00\00\00\00\01\00 \08\00\00\00\00\01\00(\08\00\00\00\00\01\000\08\00\00\00\00\01\008\08\00\00\00\00\02\00\00\08\00\00\00\00\02\00\08\08\00\00\00\00\02\00\10\08\00\00\00\00\02\00\18\08\00\00\00\00\02\00 \08\00\00\00\00\02\00(\08\00\00\00\00\02\000\08\00\00\00\00\02\008\08\00\00\00\00\00\00\00\14,\00\00\00`\02\00\00\00\00\00\00:\00\00\00\08\00\00\00\E0\01\00\00\00\00\00\009\00\00\00\03\00\00\00\90\01\00\00\00\00\00\008\00\00\00\03\00\00\00P\02\00\00\00\00\00\009\00\00\00\07\00\00\00p\02\00\00\00\00\00\00@\02\00\00\00\00\00\008\00\00\00\07\00\00\00p\02\00\00\00\00\00\00H\00\00\00\00\00\00\00\02\00\00\00\07\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\0C|\00\FF\02\00\00\00pP\F0\0B\00\DA\0F\00G\09\00\00 \01\00\00\00\00\80\03\00\EA\0F\00Vy\00\00\FF\FF\FF\FF\00\00\00\00\00\E8\0F\00U\7F\01\00\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\02\01\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\03\02\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\04\03\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\05\04\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\06\05\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\07\06\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\08\07\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\09\08\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\0A\09\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\0B\0A\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\0C\0B\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\0D\0C\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\0E\0D\00\00\00\00\00\00\10\00\00\E8\0F\00U\7F\0F\0E\00\00\00\00\00\00\10\00\00\E8\0F\00Vy\00\0F\00\00\00\00\00\00\00\00\00\E8\0F\00\82x\02\00\01\00\00\00\00\00\00\00\00\E4\0F\00$v\01\FF\00\0A\00\00\FF\00\8E\07\00\C6\0F\00\19y\02\00\00\00\00\00\00!\00\00\00\22\0E\005t\08\FF\00\00\00\00\FF\01\00\00\00\E2\0F\00\10x\01\01\E8\FF\FF\FF\FF\E0\FF\07\00\E2\0F\00$t\00\FF\02\00\00\00\FF\00\8E\07\00\E2\0F\00\82x\04\00\00\00\00\00\00\00\00\00\00\E2\0F\00$t\09\FF\00\00\08@\FF\00\8E\07\00\E2\0F\00\10z\06\01\00\08\00\00\FF\E0\F1\07\00\E2\0F\00$r\03\FF\FF\00\00\00\FF\00\8E\07\00\E2\0F\00\87s\00\01\00\08\00\00\00\00\10\00\00\E2\03\00\82x\05\00\00\00\00\00\00\00\00\00\00\E2\0F\00\10z\07\FF\00\09\00\00\FF\E4\7F\00\00\E2\0F\00$~\04\FF\04\00\00\00\FF\00\8E\0F\00\E2\0F\00\87s\00\01\08\10\00\00\00\0A\10\00\00\E2\03\00\02|\05\00\05\00\00\00\00\0F\00\08\00\C6\0F\00\87s\00\01\02\00\00\00\00\0A\10\00\00\E8\13\00\02x\14\00\00\00\00\00\00\0F\00\00\00\E4\0F\00\02x\15\00\00\00\00\00\00\0F\00\00\00\C8\0F\00Cy\00\00\00\00\00\00\00\00\C0\03\00\EA/\00My\00\00\00\00\00\00\00\00\80\03\00\EA\0F\00Gy\00\00\F0\FF\FF\FF\FF\FF\83\03\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00\18y\00\00\00\00\00\00\00\00\00\00\00\C0\0F\00Hello from %lld, %d, %f\0A\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\03\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00@\00\00\00\00\00\00\00\E2\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\0B\00\00\00\03\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\22\01\00\00\00\00\00\00\05\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\13\00\00\00\02\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00(\02\00\00\00\00\00\00\D8\00\00\00\00\00\00\00\02\00\00\00\07\00\00\00\08\00\00\00\00\00\00\00\18\00\00\00\00\00\00\00\A3\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\03\00\00\00\00\00\00p\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00)\00\00\00\00\00\00p\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00p\03\00\00\00\00\00\000\00\00\00\00\00\00\00\03\00\00\00\00\00\00\00\04\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00>\00\00\00\00\00\00p\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\A0\03\00\00\00\00\00\00(\00\00\00\00\00\00\00\03\00\00\00\0C\00\00\00\04\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\D3\00\00\00\0B\00\00p\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\C8\03\00\00\00\00\00\00\D8\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\08\00\00\00\00\00\00\00\08\00\00\00\00\00\00\00n\00\00\00\09\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\A0\04\00\00\00\00\00\000\00\00\00\00\00\00\00\03\00\00\00\0C\00\00\00\08\00\00\00\00\00\00\00\10\00\00\00\00\00\00\00~\00\00\00\04\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\D0\04\00\00\00\00\00\000\00\00\00\00\00\00\00\03\00\00\00\0C\00\00\00\08\00\00\00\00\00\00\00\18\00\00\00\00\00\00\00\B0\00\00\00\09\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\05\00\00\00\00\00\00\10\00\00\00\00\00\00\00\03\00\00\00\04\00\00\00\08\00\00\00\00\00\00\00\10\00\00\00\00\00\00\00\8F\00\00\00\01\00\00\00\02\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\10\05\00\00\00\00\00\00`\01\00\00\00\00\00\00\00\00\00\00\0C\00\00\00\04\00\00\00\00\00\00\00\00\00\00\00\00\00\00\002\00\00\00\01\00\00\00\06\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\80\06\00\00\00\00\00\00\80\03\00\00\00\00\00\00\03\00\00\00\07\00\00\18\80\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00^\00\00\00\01\00\00\00\03\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\0A\00\00\00\00\00\00\19\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01\00\01\01H\00\00\00\88\02\00\00\00\00\00\00\86\02\00\00@\00\00\00\05\00\07\00W\00\00\00\00\00\00\00\00\00\00\00\11 \00\00\00\00\00\00\00\00\00\00\00\00\00\00\06\04\00\00\00\00\00\00\00\00\00\00\00\00\00\00\F0 \0A\0A\0A\0A.version 7.5\0A.target sm_87\0A.address_size 64.\00\F0\0Bextern .func (.param .b32 \12\00\F5\05_retval0) vprintf\0A(\0A$\00$64\16\00\11_\13\00?_0,\1D\00\08\F2\0C1\0A)\0A;\0A.global .align 1 .b8 (\00\F0\08Format_0[25] = {72, 101\05\00\148\05\00Q11, 3\18\00\00\05\00#14\13\002109\18\00)37/\00h00, 44\1B\00\0D\11\00\01M\00 };8\01\F6\0Disible .entry hello()\0A{\0A.loc\B3\00\118\B3\00!__\15\00\F2\02_depot0[24];\0A.reg\FA\00;%SP\0F\00\15L\10\00\9516 %rs<2>\12\00\8932 %r<4>3\00\D3rd<7>;\0A\0Amov.uD\00\1B,w\00b;\0Acvta\9F\00\04%\00\13,n\00\018\00\01Z\00\911, %tid.x/\00\00(\00\03\19\00\10d\1A\00rr1;\0Aadd?\00Brd2,E\00\190\16\00#3,\80\00W0;\0Astq\00\10[\1D\00!],M\00\03t\00\02\E0\00H1, 2,\00\128+\00!+8-\00\14s-\00\02_\00\FF\074, 4613937818241073152k\00\012+16n\00\194A\00+5,D\02\03\1A\01\02m\02\04&\00\116:\00x5;\0A{ \0A\09\A9\02\01\0B\00\02\DB\00\01\0B\00\01\15\00\12[\16\00\22+0q\00=6;\0A3\00\1F13\00\02\1413\0082;\0AP\03\03K\03\C4;\0Acall.uni (^\033, \0A-\03Q, \0A(\0A5\0020, \09\00t1\0A);\0Aldh\00\01\BB\01C2, [=\00\F0\03+0];\0A} \0A\09ret;\0A\0A}\0A\00\00\00">]
  llvm.func @main() {
    %0 = llvm.mlir.constant(1 : index) : i64
    %1 = llvm.mlir.constant(2 : index) : i64
    gpu.launch_func  @kernels::@hello blocks in (%0, %0, %0) threads in (%1, %0, %0) : i64
    llvm.return
  }
}
andfau-amd commented 2 months ago

And here's SYCL.

Test: https://github.com/llvm/llvm-project/blob/111932d5cae0199d9c59669b37232a011f8b8757/mlir/test/Integration/GPU/SYCL/gpu-addf32-to-spirv.mlir

Command line:

../llvm-build/bin/mlir-opt mlir/test/Integration/GPU/SYCL/gpu-addf32-to-spirv.mlir -pass-pipeline='builtin.module(spirv-attach-target{ver=v1.0 caps=Addresses,Int64,Kernel},convert-gpu-to-spirv{use-64bit-index=true},gpu.module(spirv.module(spirv-lower-abi-attrs,spirv-update-vce)),func.func(llvm-request-c-wrappers),convert-scf-to-cf,convert-cf-to-llvm,convert-arith-to-llvm,convert-math-to-llvm,convert-func-to-llvm,gpu-to-llvm{use-bare-pointers-for-kernels=true},gpu-module-to-binary,expand-strided-metadata,lower-affine,finalize-memref-to-llvm,reconcile-unrealized-casts)'

Output:

click to expand ```mlir module @add attributes {gpu.container_module} { llvm.func @malloc(i64) -> !llvm.ptr llvm.mlir.global private constant @__constant_2x2x2xf32_0(dense<[[[1.100000e+00, 2.200000e+00], [3.300000e+00, 4.400000e+00]], [[5.500000e+00, 6.600000e+00], [7.6999998, 8.800000e+00]]]> : tensor<2x2x2xf32>) {addr_space = 0 : i32} : !llvm.array<2 x array<2 x array<2 x f32>>> llvm.mlir.global private constant @__constant_2x2x2xf32(dense<[[[1.200000e+00, 2.300000e+00], [4.500000e+00, 5.800000e+00]], [[7.1999998, 8.300000e+00], [1.050000e+01, 1.180000e+01]]]> : tensor<2x2x2xf32>) {addr_space = 0 : i32} : !llvm.array<2 x array<2 x array<2 x f32>>> llvm.func @main() attributes {llvm.emit_c_interface} { %0 = llvm.mlir.constant(3 : index) : i64 %1 = llvm.mlir.addressof @__constant_2x2x2xf32_0 : !llvm.ptr %2 = llvm.mlir.constant(0 : index) : i64 %3 = llvm.mlir.constant(3735928559 : index) : i64 %4 = llvm.mlir.addressof @__constant_2x2x2xf32 : !llvm.ptr %5 = llvm.mlir.constant(2 : index) : i64 %6 = llvm.mlir.constant(1 : index) : i64 %7 = llvm.mlir.constant(4 : index) : i64 %8 = llvm.getelementptr %4[0, 0, 0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.array<2 x array<2 x array<2 x f32>>> %9 = llvm.inttoptr %3 : i64 to !llvm.ptr %10 = llvm.getelementptr %1[0, 0, 0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.array<2 x array<2 x array<2 x f32>>> %11 = llvm.inttoptr %3 : i64 to !llvm.ptr %12 = llvm.call @test(%9, %8, %2, %5, %5, %5, %7, %5, %6, %11, %10, %2, %5, %5, %5, %7, %5, %6) : (!llvm.ptr, !llvm.ptr, i64, i64, i64, i64, i64, i64, i64, !llvm.ptr, !llvm.ptr, i64, i64, i64, i64, i64, i64, i64) -> !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %13 = llvm.alloca %6 x !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> : (i64) -> !llvm.ptr llvm.store %12, %13 : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>, !llvm.ptr llvm.call @printMemrefF32(%0, %13) : (i64, !llvm.ptr) -> () llvm.return } llvm.func @_mlir_ciface_main() attributes {llvm.emit_c_interface} { llvm.call @main() : () -> () llvm.return } llvm.func private @printMemrefF32(%arg0: i64, %arg1: !llvm.ptr) attributes {llvm.emit_c_interface, sym_visibility = "private"} { %0 = llvm.mlir.constant(1 : index) : i64 %1 = llvm.mlir.undef : !llvm.struct<(i64, ptr)> %2 = llvm.insertvalue %arg0, %1[0] : !llvm.struct<(i64, ptr)> %3 = llvm.insertvalue %arg1, %2[1] : !llvm.struct<(i64, ptr)> %4 = llvm.alloca %0 x !llvm.struct<(i64, ptr)> : (i64) -> !llvm.ptr llvm.store %3, %4 : !llvm.struct<(i64, ptr)>, !llvm.ptr llvm.call @_mlir_ciface_printMemrefF32(%4) : (!llvm.ptr) -> () llvm.return } llvm.func @_mlir_ciface_printMemrefF32(!llvm.ptr) attributes {llvm.emit_c_interface, sym_visibility = "private"} llvm.func @test(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64, %arg7: i64, %arg8: i64, %arg9: !llvm.ptr, %arg10: !llvm.ptr, %arg11: i64, %arg12: i64, %arg13: i64, %arg14: i64, %arg15: i64, %arg16: i64, %arg17: i64) -> !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> attributes {llvm.emit_c_interface} { %0 = llvm.mlir.constant(0 : index) : i64 %1 = llvm.mlir.constant(1 : i8) : i8 %2 = llvm.mlir.zero : !llvm.ptr %3 = llvm.mlir.constant(4 : index) : i64 %4 = llvm.mlir.constant(1 : index) : i64 %5 = llvm.mlir.constant(2 : index) : i64 %6 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %7 = llvm.getelementptr %2[8] : (!llvm.ptr) -> !llvm.ptr, f32 %8 = llvm.ptrtoint %7 : !llvm.ptr to i64 %9 = llvm.call @mgpuMemAlloc(%8, %2, %1) : (i64, !llvm.ptr, i8) -> !llvm.ptr %10 = llvm.mul %arg12, %4 : i64 %11 = llvm.mul %10, %arg13 : i64 %12 = llvm.mul %11, %arg14 : i64 %13 = llvm.getelementptr %2[1] : (!llvm.ptr) -> !llvm.ptr, f32 %14 = llvm.ptrtoint %13 : !llvm.ptr to i64 %15 = llvm.mul %12, %14 : i64 %16 = llvm.getelementptr %arg10[%arg11] : (!llvm.ptr, i64) -> !llvm.ptr, f32 "llvm.intr.memcpy"(%9, %16, %15) <{isVolatile = false}> : (!llvm.ptr, !llvm.ptr, i64) -> () %17 = llvm.getelementptr %2[8] : (!llvm.ptr) -> !llvm.ptr, f32 %18 = llvm.ptrtoint %17 : !llvm.ptr to i64 %19 = llvm.call @mgpuMemAlloc(%18, %2, %1) : (i64, !llvm.ptr, i8) -> !llvm.ptr %20 = llvm.mul %arg3, %4 : i64 %21 = llvm.mul %20, %arg4 : i64 %22 = llvm.mul %21, %arg5 : i64 %23 = llvm.getelementptr %2[1] : (!llvm.ptr) -> !llvm.ptr, f32 %24 = llvm.ptrtoint %23 : !llvm.ptr to i64 %25 = llvm.mul %22, %24 : i64 %26 = llvm.getelementptr %arg1[%arg2] : (!llvm.ptr, i64) -> !llvm.ptr, f32 "llvm.intr.memcpy"(%19, %26, %25) <{isVolatile = false}> : (!llvm.ptr, !llvm.ptr, i64) -> () %27 = llvm.getelementptr %2[8] : (!llvm.ptr) -> !llvm.ptr, f32 %28 = llvm.ptrtoint %27 : !llvm.ptr to i64 %29 = llvm.call @mgpuMemAlloc(%28, %2, %1) : (i64, !llvm.ptr, i8) -> !llvm.ptr %30 = llvm.call @mgpuStreamCreate() : () -> !llvm.ptr gpu.launch_func <%30 : !llvm.ptr> @test_kernel::@test_kernel blocks in (%5, %5, %5) threads in (%4, %4, %4) : i64 args(%19 : !llvm.ptr, %9 : !llvm.ptr, %29 : !llvm.ptr) llvm.call @mgpuStreamSynchronize(%30) : (!llvm.ptr) -> () llvm.call @mgpuStreamDestroy(%30) : (!llvm.ptr) -> () %31 = llvm.getelementptr %2[8] : (!llvm.ptr) -> !llvm.ptr, f32 %32 = llvm.ptrtoint %31 : !llvm.ptr to i64 %33 = llvm.call @malloc(%32) : (i64) -> !llvm.ptr %34 = llvm.insertvalue %33, %6[0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %35 = llvm.insertvalue %33, %34[1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %36 = llvm.insertvalue %0, %35[2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %37 = llvm.insertvalue %5, %36[3, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %38 = llvm.insertvalue %5, %37[3, 1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %39 = llvm.insertvalue %5, %38[3, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %40 = llvm.insertvalue %3, %39[4, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %41 = llvm.insertvalue %5, %40[4, 1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %42 = llvm.insertvalue %4, %41[4, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %43 = llvm.mul %4, %5 : i64 %44 = llvm.mul %43, %5 : i64 %45 = llvm.mul %44, %5 : i64 %46 = llvm.getelementptr %2[1] : (!llvm.ptr) -> !llvm.ptr, f32 %47 = llvm.ptrtoint %46 : !llvm.ptr to i64 %48 = llvm.mul %45, %47 : i64 "llvm.intr.memcpy"(%33, %29, %48) <{isVolatile = false}> : (!llvm.ptr, !llvm.ptr, i64) -> () %49 = llvm.call @mgpuStreamCreate() : () -> !llvm.ptr llvm.call @mgpuMemFree(%29, %49) : (!llvm.ptr, !llvm.ptr) -> () llvm.call @mgpuMemFree(%19, %49) : (!llvm.ptr, !llvm.ptr) -> () llvm.call @mgpuMemFree(%9, %49) : (!llvm.ptr, !llvm.ptr) -> () llvm.call @mgpuStreamSynchronize(%49) : (!llvm.ptr) -> () llvm.call @mgpuStreamDestroy(%49) : (!llvm.ptr) -> () llvm.return %42 : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> } llvm.func @_mlir_ciface_test(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {llvm.emit_c_interface} { %0 = llvm.load %arg1 : !llvm.ptr -> !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %1 = llvm.extractvalue %0[0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %2 = llvm.extractvalue %0[1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %3 = llvm.extractvalue %0[2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %4 = llvm.extractvalue %0[3, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %5 = llvm.extractvalue %0[3, 1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %6 = llvm.extractvalue %0[3, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %7 = llvm.extractvalue %0[4, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %8 = llvm.extractvalue %0[4, 1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %9 = llvm.extractvalue %0[4, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %10 = llvm.load %arg2 : !llvm.ptr -> !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %11 = llvm.extractvalue %10[0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %12 = llvm.extractvalue %10[1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %13 = llvm.extractvalue %10[2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %14 = llvm.extractvalue %10[3, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %15 = llvm.extractvalue %10[3, 1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %16 = llvm.extractvalue %10[3, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %17 = llvm.extractvalue %10[4, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %18 = llvm.extractvalue %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %19 = llvm.extractvalue %10[4, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> %20 = llvm.call @test(%1, %2, %3, %4, %5, %6, %7, %8, %9, %11, %12, %13, %14, %15, %16, %17, %18, %19) : (!llvm.ptr, !llvm.ptr, i64, i64, i64, i64, i64, i64, i64, !llvm.ptr, !llvm.ptr, i64, i64, i64, i64, i64, i64, i64) -> !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> llvm.store %20, %arg0 : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>, !llvm.ptr llvm.return } gpu.binary @test_kernel [#gpu.object<#spirv.target_env<#spirv.vce, #spirv.resource_limits<>>, "\03\02#\07\00\00\01\00\14\00\16\00.\00\00\00\00\00\00\00\11\00\02\00\0B\00\00\00\11\00\02\00\06\00\00\00\11\00\02\00\04\00\00\00\0E\00\03\00\02\00\00\00\02\00\00\00\0F\00\07\00\06\00\00\00\0C\00\00\00test_kernel\00\04\00\00\00\05\00\09\00\04\00\00\00__builtin__WorkgroupId__\00\00\00\00\05\00\05\00\0C\00\00\00test_kernel\00G\00\04\00\04\00\00\00\0B\00\00\00\1A\00\00\00\15\00\04\00\03\00\00\00@\00\00\00\00\00\00\00\17\00\04\00\02\00\00\00\03\00\00\00\03\00\00\00 \00\04\00\01\00\00\00\01\00\00\00\02\00\00\00;\00\04\00\01\00\00\00\04\00\00\00\01\00\00\00\13\00\02\00\06\00\00\00\16\00\03\00\09\00\00\00 \00\00\00\15\00\04\00\0A\00\00\00 \00\00\00\00\00\00\00+\00\04\00\0A\00\00\00\0B\00\00\00\08\00\00\00\1C\00\04\00\08\00\00\00\09\00\00\00\0B\00\00\00 \00\04\00\07\00\00\00\05\00\00\00\08\00\00\00!\00\06\00\05\00\00\00\06\00\00\00\07\00\00\00\07\00\00\00\07\00\00\00+\00\05\00\03\00\00\00\17\00\00\00\00\00\00\00\00\00\00\00+\00\05\00\03\00\00\00\18\00\00\00\04\00\00\00\00\00\00\00+\00\05\00\03\00\00\00\1A\00\00\00\02\00\00\00\00\00\00\00+\00\05\00\03\00\00\00\1D\00\00\00\01\00\00\00\00\00\00\00 \00\04\00\1F\00\00\00\05\00\00\00\09\00\00\006\00\05\00\06\00\00\00\0C\00\00\00\00\00\00\00\05\00\00\007\00\03\00\07\00\00\00\0D\00\00\007\00\03\00\07\00\00\00\0E\00\00\007\00\03\00\07\00\00\00\0F\00\00\00\F8\00\02\00\10\00\00\00=\00\04\00\02\00\00\00\11\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\12\00\00\00\11\00\00\00\00\00\00\00=\00\04\00\02\00\00\00\13\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\14\00\00\00\13\00\00\00\01\00\00\00=\00\04\00\02\00\00\00\15\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\16\00\00\00\15\00\00\00\02\00\00\00\84\00\05\00\03\00\00\00\19\00\00\00\12\00\00\00\18\00\00\00\84\00\05\00\03\00\00\00\1B\00\00\00\14\00\00\00\1A\00\00\00\80\00\05\00\03\00\00\00\1C\00\00\00\1B\00\00\00\19\00\00\00\80\00\05\00\03\00\00\00\1E\00\00\00\16\00\00\00\1C\00\00\00A\00\05\00\1F\00\00\00 \00\00\00\0D\00\00\00\1E\00\00\00=\00\04\00\09\00\00\00!\00\00\00 \00\00\00\84\00\05\00\03\00\00\00\22\00\00\00\12\00\00\00\18\00\00\00\84\00\05\00\03\00\00\00#\00\00\00\14\00\00\00\1A\00\00\00\80\00\05\00\03\00\00\00$\00\00\00#\00\00\00\22\00\00\00\80\00\05\00\03\00\00\00%\00\00\00\16\00\00\00$\00\00\00A\00\05\00\1F\00\00\00&\00\00\00\0E\00\00\00%\00\00\00=\00\04\00\09\00\00\00'\00\00\00&\00\00\00\81\00\05\00\09\00\00\00(\00\00\00!\00\00\00'\00\00\00\84\00\05\00\03\00\00\00)\00\00\00\12\00\00\00\18\00\00\00\84\00\05\00\03\00\00\00*\00\00\00\14\00\00\00\1A\00\00\00\80\00\05\00\03\00\00\00+\00\00\00*\00\00\00)\00\00\00\80\00\05\00\03\00\00\00,\00\00\00\16\00\00\00+\00\00\00A\00\05\00\1F\00\00\00-\00\00\00\0F\00\00\00,\00\00\00>\00\03\00-\00\00\00(\00\00\00\FD\00\01\008\00\01\00">] llvm.func @mgpuMemAlloc(i64, !llvm.ptr, i8) -> !llvm.ptr llvm.func @mgpuStreamCreate() -> !llvm.ptr llvm.func @mgpuStreamSynchronize(!llvm.ptr) llvm.func @mgpuStreamDestroy(!llvm.ptr) llvm.func @mgpuMemFree(!llvm.ptr, !llvm.ptr) } ```

This one was very straightforward to get working, no runtime needed. And they're already doing serialized SPIR-V with spirv-opt here, so probably a significant about of code reuse or at least inspiration can be done here. :)

andfau-amd commented 1 month ago

First piece of the mlir-spirv-cpu-runner work: https://github.com/llvm/llvm-project/pull/111575

andfau-amd commented 1 month ago

Last(?) piece of the mlir-spirv-cpu-runner work: https://github.com/llvm/llvm-project/pull/114563

andfau-amd commented 3 weeks ago

Make spirv serialization available to mlir-opt. See the same subtask above.

If we want to do this (for the SPIR-V Runner), then https://github.com/llvm/llvm-project/issues/115348 is for tracking it. I know it's part of the outline from this ticket, but I'm only inclined to do it for the SPIR-V Runner if it somehow makes the Vulkan Runner part of this easier, I guess we'll see.

andfau-amd commented 3 weeks ago

I realised that the Vulkan runner is already serializing the SPIR-V binary and embedding it in MLIR, it's just presumably not producing quite the shape of IR that we would like it to have. So I decided to dump an example of an existing test (by hacking the runner to do module.dump() after running the MLIR pipeline) for comparison to dumps from the other runners.

../llvm-build/bin/mlir-vulkan-runner mlir/test/mlir-vulkan-runner/addf.mlir --shared-libs=../llvm-build/lib/libvulkan-runtime-wrappers.so,../llvm-build/lib/libmlir_runner_utils.so --entry-point-result=void
MLIR module (click to expand) ```mlir module attributes {gpu.container_module, spirv.target_env = #spirv.target_env<#spirv.vce, #spirv.resource_limits<>>} { llvm.mlir.global internal constant @kernel_add_spv_entry_point_name("kernel_add\00") {addr_space = 0 : i32} llvm.mlir.global internal constant @SPIRV_BIN("\03\02#\07\00\00\01\00\14\00\16\00\1C\00\00\00\00\00\00\00\11\00\02\00\01\00\00\00\0A\00\0B\00SPV_KHR_storage_buffer_storage_class\00\00\00\00\0E\00\03\00\00\00\00\00\01\00\00\00\0F\00\07\00\05\00\00\00\0F\00\00\00kernel_add\00\00\04\00\00\00\10\00\06\00\0F\00\00\00\11\00\00\00\01\00\00\00\01\00\00\00\01\00\00\00\05\00\09\00\04\00\00\00__builtin__WorkgroupId__\00\00\00\00\05\00\07\00\0A\00\00\00kernel_add_arg_0\00\00\00\00\05\00\07\00\0B\00\00\00kernel_add_arg_1\00\00\00\00\05\00\07\00\0C\00\00\00kernel_add_arg_2\00\00\00\00\05\00\05\00\0F\00\00\00kernel_add\00\00G\00\04\00\04\00\00\00\0B\00\00\00\1A\00\00\00G\00\04\00\07\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\06\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\06\00\00\00\02\00\00\00G\00\04\00\0A\00\00\00!\00\00\00\00\00\00\00G\00\04\00\0A\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\0B\00\00\00!\00\00\00\01\00\00\00G\00\04\00\0B\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\0C\00\00\00!\00\00\00\02\00\00\00G\00\04\00\0C\00\00\00\22\00\00\00\00\00\00\00\15\00\04\00\03\00\00\00 \00\00\00\00\00\00\00\17\00\04\00\02\00\00\00\03\00\00\00\03\00\00\00 \00\04\00\01\00\00\00\01\00\00\00\02\00\00\00;\00\04\00\01\00\00\00\04\00\00\00\01\00\00\00\16\00\03\00\08\00\00\00 \00\00\00+\00\04\00\03\00\00\00\09\00\00\00\08\00\00\00\1C\00\04\00\07\00\00\00\08\00\00\00\09\00\00\00\1E\00\03\00\06\00\00\00\07\00\00\00 \00\04\00\05\00\00\00\0C\00\00\00\06\00\00\00;\00\04\00\05\00\00\00\0A\00\00\00\0C\00\00\00;\00\04\00\05\00\00\00\0B\00\00\00\0C\00\00\00;\00\04\00\05\00\00\00\0C\00\00\00\0C\00\00\00\13\00\02\00\0E\00\00\00!\00\03\00\0D\00\00\00\0E\00\00\00+\00\04\00\03\00\00\00\13\00\00\00\00\00\00\00+\00\04\00\03\00\00\00\14\00\00\00\01\00\00\00 \00\04\00\15\00\00\00\0C\00\00\00\08\00\00\006\00\05\00\0E\00\00\00\0F\00\00\00\00\00\00\00\0D\00\00\00\F8\00\02\00\10\00\00\00=\00\04\00\02\00\00\00\11\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\12\00\00\00\11\00\00\00\00\00\00\00A\00\06\00\15\00\00\00\16\00\00\00\0A\00\00\00\13\00\00\00\12\00\00\00=\00\04\00\08\00\00\00\17\00\00\00\16\00\00\00A\00\06\00\15\00\00\00\18\00\00\00\0B\00\00\00\13\00\00\00\12\00\00\00=\00\04\00\08\00\00\00\19\00\00\00\18\00\00\00\81\00\05\00\08\00\00\00\1A\00\00\00\17\00\00\00\19\00\00\00A\00\06\00\15\00\00\00\1B\00\00\00\0C\00\00\00\13\00\00\00\12\00\00\00>\00\03\00\1B\00\00\00\1A\00\00\00\FD\00\01\008\00\01\00") {addr_space = 0 : i32} llvm.func @malloc(i64) -> !llvm.ptr llvm.func @main() attributes {llvm.emit_c_interface} { %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr)> %1 = llvm.mlir.constant(0 : index) : i64 %2 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %3 = llvm.mlir.constant(8 : index) : i64 %4 = llvm.mlir.constant(1 : index) : i64 %5 = llvm.mlir.constant(2.200000e+00 : f32) : f32 %6 = llvm.mlir.constant(1.100000e+00 : f32) : f32 %7 = llvm.mlir.constant(0.000000e+00 : f32) : f32 %8 = llvm.mlir.constant(8 : index) : i64 %9 = llvm.mlir.constant(1 : index) : i64 %10 = llvm.mlir.zero : !llvm.ptr %11 = llvm.getelementptr %10[8] : (!llvm.ptr) -> !llvm.ptr, f32 %12 = llvm.ptrtoint %11 : !llvm.ptr to i64 %13 = llvm.call @malloc(%12) : (i64) -> !llvm.ptr %14 = llvm.insertvalue %13, %2[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %15 = llvm.insertvalue %13, %14[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %16 = llvm.insertvalue %1, %15[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %17 = llvm.insertvalue %8, %16[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %18 = llvm.insertvalue %9, %17[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %19 = llvm.getelementptr %10[8] : (!llvm.ptr) -> !llvm.ptr, f32 %20 = llvm.ptrtoint %19 : !llvm.ptr to i64 %21 = llvm.call @malloc(%20) : (i64) -> !llvm.ptr %22 = llvm.insertvalue %21, %2[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %23 = llvm.insertvalue %21, %22[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %24 = llvm.insertvalue %1, %23[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %25 = llvm.insertvalue %8, %24[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %26 = llvm.insertvalue %9, %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %27 = llvm.getelementptr %10[8] : (!llvm.ptr) -> !llvm.ptr, f32 %28 = llvm.ptrtoint %27 : !llvm.ptr to i64 %29 = llvm.call @malloc(%28) : (i64) -> !llvm.ptr %30 = llvm.insertvalue %29, %2[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %31 = llvm.insertvalue %29, %30[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %32 = llvm.insertvalue %1, %31[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %33 = llvm.insertvalue %8, %32[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %34 = llvm.insertvalue %9, %33[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %35 = llvm.extractvalue %18[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %36 = llvm.extractvalue %18[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %37 = llvm.extractvalue %18[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %38 = llvm.extractvalue %18[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %39 = llvm.extractvalue %18[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> llvm.call @fillResource1DFloat(%35, %36, %37, %38, %39, %6) : (!llvm.ptr, !llvm.ptr, i64, i64, i64, f32) -> () %40 = llvm.extractvalue %26[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %41 = llvm.extractvalue %26[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %42 = llvm.extractvalue %26[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %43 = llvm.extractvalue %26[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %44 = llvm.extractvalue %26[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> llvm.call @fillResource1DFloat(%40, %41, %42, %43, %44, %5) : (!llvm.ptr, !llvm.ptr, i64, i64, i64, f32) -> () %45 = llvm.extractvalue %34[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %46 = llvm.extractvalue %34[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %47 = llvm.extractvalue %34[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %48 = llvm.extractvalue %34[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %49 = llvm.extractvalue %34[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> llvm.call @fillResource1DFloat(%45, %46, %47, %48, %49, %7) : (!llvm.ptr, !llvm.ptr, i64, i64, i64, f32) -> () %50 = llvm.extractvalue %18[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %51 = llvm.extractvalue %18[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %52 = llvm.extractvalue %18[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %53 = llvm.extractvalue %18[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %54 = llvm.extractvalue %18[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %55 = llvm.extractvalue %26[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %56 = llvm.extractvalue %26[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %57 = llvm.extractvalue %26[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %58 = llvm.extractvalue %26[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %59 = llvm.extractvalue %26[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %60 = llvm.extractvalue %34[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %61 = llvm.extractvalue %34[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %62 = llvm.extractvalue %34[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %63 = llvm.extractvalue %34[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %64 = llvm.extractvalue %34[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> llvm.call @vulkanLaunch(%3, %4, %4, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59, %60, %61, %62, %63, %64) {spirv_blob = "\03\02#\07\00\00\01\00\14\00\16\00\1C\00\00\00\00\00\00\00\11\00\02\00\01\00\00\00\0A\00\0B\00SPV_KHR_storage_buffer_storage_class\00\00\00\00\0E\00\03\00\00\00\00\00\01\00\00\00\0F\00\07\00\05\00\00\00\0F\00\00\00kernel_add\00\00\04\00\00\00\10\00\06\00\0F\00\00\00\11\00\00\00\01\00\00\00\01\00\00\00\01\00\00\00\05\00\09\00\04\00\00\00__builtin__WorkgroupId__\00\00\00\00\05\00\07\00\0A\00\00\00kernel_add_arg_0\00\00\00\00\05\00\07\00\0B\00\00\00kernel_add_arg_1\00\00\00\00\05\00\07\00\0C\00\00\00kernel_add_arg_2\00\00\00\00\05\00\05\00\0F\00\00\00kernel_add\00\00G\00\04\00\04\00\00\00\0B\00\00\00\1A\00\00\00G\00\04\00\07\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\06\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\06\00\00\00\02\00\00\00G\00\04\00\0A\00\00\00!\00\00\00\00\00\00\00G\00\04\00\0A\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\0B\00\00\00!\00\00\00\01\00\00\00G\00\04\00\0B\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\0C\00\00\00!\00\00\00\02\00\00\00G\00\04\00\0C\00\00\00\22\00\00\00\00\00\00\00\15\00\04\00\03\00\00\00 \00\00\00\00\00\00\00\17\00\04\00\02\00\00\00\03\00\00\00\03\00\00\00 \00\04\00\01\00\00\00\01\00\00\00\02\00\00\00;\00\04\00\01\00\00\00\04\00\00\00\01\00\00\00\16\00\03\00\08\00\00\00 \00\00\00+\00\04\00\03\00\00\00\09\00\00\00\08\00\00\00\1C\00\04\00\07\00\00\00\08\00\00\00\09\00\00\00\1E\00\03\00\06\00\00\00\07\00\00\00 \00\04\00\05\00\00\00\0C\00\00\00\06\00\00\00;\00\04\00\05\00\00\00\0A\00\00\00\0C\00\00\00;\00\04\00\05\00\00\00\0B\00\00\00\0C\00\00\00;\00\04\00\05\00\00\00\0C\00\00\00\0C\00\00\00\13\00\02\00\0E\00\00\00!\00\03\00\0D\00\00\00\0E\00\00\00+\00\04\00\03\00\00\00\13\00\00\00\00\00\00\00+\00\04\00\03\00\00\00\14\00\00\00\01\00\00\00 \00\04\00\15\00\00\00\0C\00\00\00\08\00\00\006\00\05\00\0E\00\00\00\0F\00\00\00\00\00\00\00\0D\00\00\00\F8\00\02\00\10\00\00\00=\00\04\00\02\00\00\00\11\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\12\00\00\00\11\00\00\00\00\00\00\00A\00\06\00\15\00\00\00\16\00\00\00\0A\00\00\00\13\00\00\00\12\00\00\00=\00\04\00\08\00\00\00\17\00\00\00\16\00\00\00A\00\06\00\15\00\00\00\18\00\00\00\0B\00\00\00\13\00\00\00\12\00\00\00=\00\04\00\08\00\00\00\19\00\00\00\18\00\00\00\81\00\05\00\08\00\00\00\1A\00\00\00\17\00\00\00\19\00\00\00A\00\06\00\15\00\00\00\1B\00\00\00\0C\00\00\00\13\00\00\00\12\00\00\00>\00\03\00\1B\00\00\00\1A\00\00\00\FD\00\01\008\00\01\00", spirv_element_types = [f32, f32, f32], spirv_entry_point = "kernel_add"} : (i64, i64, i64, !llvm.ptr, !llvm.ptr, i64, i64, i64, !llvm.ptr, !llvm.ptr, i64, i64, i64, !llvm.ptr, !llvm.ptr, i64, i64, i64) -> () %65 = llvm.alloca %9 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr llvm.store %34, %65 : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>, !llvm.ptr %66 = llvm.insertvalue %9, %0[0] : !llvm.struct<(i64, ptr)> %67 = llvm.insertvalue %65, %66[1] : !llvm.struct<(i64, ptr)> %68 = llvm.extractvalue %67[0] : !llvm.struct<(i64, ptr)> %69 = llvm.extractvalue %67[1] : !llvm.struct<(i64, ptr)> llvm.call @printMemrefF32(%68, %69) : (i64, !llvm.ptr) -> () llvm.return } llvm.func @_mlir_ciface_main() attributes {llvm.emit_c_interface} { llvm.call @main() : () -> () llvm.return } llvm.func private @fillResource1DFloat(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, %arg3: i64, %arg4: i64, %arg5: f32) attributes {llvm.emit_c_interface, sym_visibility = "private"} { %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %6 = llvm.mlir.constant(1 : index) : i64 %7 = llvm.alloca %6 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr llvm.store %5, %7 : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>, !llvm.ptr llvm.call @_mlir_ciface_fillResource1DFloat(%7, %arg5) : (!llvm.ptr, f32) -> () llvm.return } llvm.func @_mlir_ciface_fillResource1DFloat(!llvm.ptr, f32) attributes {llvm.emit_c_interface, sym_visibility = "private"} llvm.func private @printMemrefF32(%arg0: i64, %arg1: !llvm.ptr) attributes {llvm.emit_c_interface, sym_visibility = "private"} { %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr)> %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr)> %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr)> %3 = llvm.mlir.constant(1 : index) : i64 %4 = llvm.alloca %3 x !llvm.struct<(i64, ptr)> : (i64) -> !llvm.ptr llvm.store %2, %4 : !llvm.struct<(i64, ptr)>, !llvm.ptr llvm.call @_mlir_ciface_printMemrefF32(%4) : (!llvm.ptr) -> () llvm.return } llvm.func @_mlir_ciface_printMemrefF32(!llvm.ptr) attributes {llvm.emit_c_interface, sym_visibility = "private"} llvm.func private @vulkanLaunch(%arg0: i64, %arg1: i64, %arg2: i64, %arg3: !llvm.ptr, %arg4: !llvm.ptr, %arg5: i64, %arg6: i64, %arg7: i64, %arg8: !llvm.ptr, %arg9: !llvm.ptr, %arg10: i64, %arg11: i64, %arg12: i64, %arg13: !llvm.ptr, %arg14: !llvm.ptr, %arg15: i64, %arg16: i64, %arg17: i64) attributes {llvm.emit_c_interface, sym_visibility = "private"} { %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %1 = llvm.insertvalue %arg3, %0[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %2 = llvm.insertvalue %arg4, %1[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %3 = llvm.insertvalue %arg5, %2[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %4 = llvm.insertvalue %arg6, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %5 = llvm.insertvalue %arg7, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %6 = llvm.mlir.constant(1 : index) : i64 %7 = llvm.alloca %6 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr llvm.store %5, %7 : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>, !llvm.ptr %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %9 = llvm.insertvalue %arg8, %8[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %10 = llvm.insertvalue %arg9, %9[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %11 = llvm.insertvalue %arg10, %10[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %12 = llvm.insertvalue %arg11, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %14 = llvm.mlir.constant(1 : index) : i64 %15 = llvm.alloca %14 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr llvm.store %13, %15 : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>, !llvm.ptr %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %17 = llvm.insertvalue %arg13, %16[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %18 = llvm.insertvalue %arg14, %17[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %19 = llvm.insertvalue %arg15, %18[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %20 = llvm.insertvalue %arg16, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %21 = llvm.insertvalue %arg17, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> %22 = llvm.mlir.constant(1 : index) : i64 %23 = llvm.alloca %22 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr llvm.store %21, %23 : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>, !llvm.ptr %24 = llvm.call @initVulkan() : () -> !llvm.ptr %25 = llvm.mlir.addressof @SPIRV_BIN : !llvm.ptr %26 = llvm.getelementptr %25[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.array<896 x i8> %27 = llvm.mlir.constant(896 : i32) : i32 %28 = llvm.mlir.constant(0 : i32) : i32 %29 = llvm.mlir.constant(0 : i32) : i32 llvm.call @bindMemRef1DFloat(%24, %28, %29, %7) : (!llvm.ptr, i32, i32, !llvm.ptr) -> () %30 = llvm.mlir.constant(1 : i32) : i32 llvm.call @bindMemRef1DFloat(%24, %28, %30, %15) : (!llvm.ptr, i32, i32, !llvm.ptr) -> () %31 = llvm.mlir.constant(2 : i32) : i32 llvm.call @bindMemRef1DFloat(%24, %28, %31, %23) : (!llvm.ptr, i32, i32, !llvm.ptr) -> () llvm.call @setBinaryShader(%24, %26, %27) : (!llvm.ptr, !llvm.ptr, i32) -> () %32 = llvm.mlir.addressof @kernel_add_spv_entry_point_name : !llvm.ptr %33 = llvm.getelementptr %32[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.array<11 x i8> llvm.call @setEntryPoint(%24, %33) : (!llvm.ptr, !llvm.ptr) -> () llvm.call @setNumWorkGroups(%24, %arg0, %arg1, %arg2) : (!llvm.ptr, i64, i64, i64) -> () llvm.call @runOnVulkan(%24) : (!llvm.ptr) -> () llvm.call @deinitVulkan(%24) : (!llvm.ptr) -> () llvm.return } llvm.func @_mlir_ciface_vulkanLaunch(i64, i64, i64, !llvm.ptr, !llvm.ptr, !llvm.ptr) attributes {llvm.emit_c_interface, sym_visibility = "private"} llvm.func @setEntryPoint(!llvm.ptr, !llvm.ptr) llvm.func @setNumWorkGroups(!llvm.ptr, i64, i64, i64) llvm.func @setBinaryShader(!llvm.ptr, !llvm.ptr, i32) llvm.func @runOnVulkan(!llvm.ptr) llvm.func @bindMemRef1DFloat(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef1DInt32(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef1DInt16(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef1DInt8(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef1DHalf(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef2DFloat(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef2DInt32(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef2DInt16(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef2DInt8(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef2DHalf(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef3DFloat(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef3DInt32(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef3DInt16(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef3DInt8(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @bindMemRef3DHalf(!llvm.ptr, i32, i32, !llvm.ptr) llvm.func @initVulkan() -> !llvm.ptr llvm.func @deinitVulkan(!llvm.ptr) } ```

Looks like it uses some SPIRV_BIN constant instead of the desired gpu.binary. Interesting.

andfau-amd commented 3 weeks ago

Ah, and that's after the pass added with createConvertVulkanLaunchFuncToVulkanCallsPass(). If that pass is skipped, the binary blob is instead embedded in a line looking like llvm.call @vulkanLaunch(%3, %4, %4, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59, %60, %61, %62, %63, %64) {spirv_blob = ".

andfau-amd commented 3 weeks ago

Oh, if I comment out everything starting with and following createConvertGpuLaunchFuncToVulkanLaunchFuncPass(), then the IR starts to look more like what we want, including the use of gpu.launch_func. But it doesn't have a SPIR-V blob. Okay, maybe I'm starting to understand this better. I need to compare the pass pipeline with SYCL's I suppose.

andfau-amd commented 3 weeks ago

Aha okay, so:

So, I guess the focus of this whole project is to replace the former with the latter. All the other changes will be essentially supporting infrastructure for this change.