nod-ai / iree-amd-aie

IREE plugin repository for the AMD AIE accelerator
Apache License 2.0

[Tracking] Refactor MLIR-AIE dependency #430

Closed makslevental closed 1 month ago

makslevental commented 3 months ago

TL;DR: This issue tracks ongoing work to refactor the dependency on MLIR-AIE

Description:

The goal of this sprint of work is to remove the dependency on AIE and AIEX dialects completely (AIEVec will remain). This includes artifact generation. Thus, one final deliverable is a build configuration (of this repo/plugin) that does not need to clone/build Xilinx/mlir-aie at all (for platforms/deployments that don't need AIEVec). The "business" goal is a more stable device configuration and runtime layer/experience for higher level dialects.

Development Plan

The work proceeds in roughly two necessary phases (for the MVP) and planned extension work:

  1. Vendoring/interning the relevant passes/parts of mlir-aie;
    • aie-rt (the actual device configuration utility underneath mlir-aie) and bootgen (necessary for artifact construction) submodules;
    • XCLBinGen;
    • A minimal subset of AIE/AIEX passes necessary for supporting NPU.
      • aie-assign-lock-ids, aie-assign-buffer-descriptor-ids, aie-assign-buffer-addresses-basic, aie-pathfinder, aie-localize-locks, aie-objectstateful-transform, aiex-dma-to-npu;
      • Note: of these 7 passes, only aie-pathfinder and aie-objectstateful-transform perform any analysis.
    • A minimal subset of translation utilities;
      • aie-translate-cdo, aie-translate-bcf, aie-translate-ld-script, aie-translate-npu.
  2. Refactor/merge/clean/DCE;
    • Move all calls to aie-rt into an iree_aie_runtime library which emulates the various mlir_*_runtime libs upstream (but is still only called at compile time; see planned work below);
    • Merge all non-analysis passes into a single pass;
    • Simplify aie-pathfinder (remove legacy d_ary_heap.h) and aie-objectstateful-transform;
    • Re-design AIETargetModel as AMDAIEDeviceModel and base the latter on aie-rt (i.e., use aie-rt APIs for querying relevant device attributes/characteristics);
    • Re-design CDO emission to consume objectfifo directly instead of buffer, dma, switchbox, etc.;
      • This step eliminates all passes that transform those objects.

At this point we will have a completely unified, self-contained path from aie.objectfifo to .xclbin (modulo chess/peano), but we will still depend on the AIE dialect for the aie.objectfifo op. The immediate next step is to connect directly to amdaie.logicalobjectfifo, which completes the goal of removing the dependency on the AIE/AIEX dialects (i.e., headers, libs, etc.). Because this last step depends on progress in the work involving amdaie.logicalobjectfifo, all of the work described above will happen in a parallel lowering path through an ephemeral #hal.device.target<"amd-aie-direct", [#hal.executable.target<"amd-aie-direct">]>. Once the direct connection to amdaie.logicalobjectfifo is complete, amd-aie-direct takes over as the only HAL target.

Regarding the AIEVec dialect: AIEVec (used for emitting vector intrinsics targeting the single cores) does not depend on the AIE/AIEX dialects, and thus we can continue to keep it as a dependency.

Planned extension/further work includes:

  1. Reducing the friction between iree-amd-aie (this repo/plugin) and the single core compilers (i.e., chess and peano) by removing most of the "shell out";
    • Currently, translation to LLVM IR (.ll) for chess includes a chesshack step that rewrites present-day LLVM IR to chess's version (15). Alternatively (I have verified this), we can emit the llvm dialect (MLIR IR) and translate it to LLVM IR using mlir-translate-15, i.e., the version of mlir-translate built against llvmorg-15.0.7.
    • This enables us to not only remove chesshack but furthermore link directly against MLIRTranslateLib built against the same tag[^1]. I have verified this as well;
    • The same follows for peano, but even more so, because we can directly link the single-core codegen libs (see footnote[^1]);
    • Shell outs to xclbinutil and bootgen can also immediately be eliminated by simply making direct API calls into those libs (see xaiepy as a proof of this concept);
    • This reduces the number of shell-outs to just one: chesscc.
  2. Once we are able to fully control emitting device configurations/instructions/code (i.e., what aie-rt actually emits) we are (mostly) free to move to a more conventional model of dispatch;
    • Calls to aie-rt can be "inlined" into the mid-level IR itself just as is done for the various GPU dialects, i.e., iree_aie_runtime can become a true runtime library;
      • The extent to which this is feasible (what can be configured/reconfigured outside of a CDO) is determined by both the firmware and the driver but I have verified that there are some objects that can be configured at runtime (shim DMAs). Thus, this work will involve expanding that set of objects.

Testing Plan

Each step will be unit tested using the canonical ground-truth source: mlir-aie/test. That is, in the intermediate steps, we vendor the relevant tests in addition to the code. In addition, once it becomes feasible (after the completion of vendoring), each step will be tested E2E, i.e., artifact generation and testing for numerical accuracy. Prior to connecting to amdaie.logicalobjectfifo, we generate such executables starting from aie.objectfifo (using mlir-aie examples). After connecting to amdaie.logicalobjectfifo, we are free to use all of our own E2E tests.

Current progress

Of the initial (MVP) work only the final step remains (redesigning CDO emission to consume aie.objectfifo). Timeline for this final step is ~1 week.

Questions/comments/concerns

How/what/where/when questions are more than welcome here; why questions should be kept for 1-1/team meetings.

cc @stellaraccident @MaheshRavishankar @powderluv @jtuyls @kumardeepakamd @yzhang93 @newling @Abhishek-Varma @nirvedhmeshram @daveliddell

[^1]: By using a small trick to create "versioned namespaces": -DCMAKE_CXX_FLAGS="-Dmlir=mlir15 -Dllvm=llvm15".
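
To illustrate the footnote, here is a minimal sketch of why defining `mlir` as a macro yields a "versioned namespace". It simulates, inside a single file, what the `-Dmlir=mlir15` flag would do to a translation unit built against the older sources. The function name `llvmMajor` and the value 19 for the "current" copy are hypothetical placeholders, not anything taken from MLIR or from this repo.

```cpp
// Sketch only: simulates, in one file, the effect of -Dmlir=mlir15 on code
// compiled against the LLVM 15 sources. `llvmMajor` is a hypothetical name.

// --- "Old" copy, as if compiled with -Dmlir=mlir15 ---------------------
#define mlir mlir15  // stand-in for the command-line flag
namespace mlir {     // the preprocessor rewrites this to `namespace mlir15`
constexpr int llvmMajor() { return 15; }
}  // namespace mlir (actually mlir15)
#undef mlir          // end of the simulated "old" translation unit

// --- "Current" copy, compiled without the flag -------------------------
namespace mlir {
constexpr int llvmMajor() { return 19; }  // placeholder for "present day"
}  // namespace mlir

// Both definitions coexist because they now live in different namespaces.
static_assert(mlir15::llvmMajor() == 15, "old copy is reachable as mlir15");
static_assert(mlir::llvmMajor() == 19, "current copy keeps the plain name");
```

Because the rename happens in the preprocessor, it applies to every identifier token spelled exactly `mlir` in the affected translation units, not only to namespace declarations, so it presumably relies on the sources not using `mlir` as an ordinary identifier elsewhere.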

makslevental commented 2 months ago

The remaining AIE files we still depend on (after https://github.com/nod-ai/iree-amd-aie/pull/546)