Open ezhulenev opened 1 year ago
Thanks Eugene for sharing the doc, it looks like a solid plan.
Here I will try to cover some things that the doc does not discuss.
We initially started by lowering from mhlo, but have since switched to stablehlo. The two dialects are mostly the same, so this shouldn't be difficult, but we will likely need to update the resnet50 model, which we use to show the performance advantage of using libraries over the code that IREE is currently able to generate.
Do I understand correctly that the various IREE importers are targeting StableHLO and the standard IREE pipeline is able to consume this?
This is currently being worked out on the IREE side. My very high-level thinking is that it will provide an external layout interface that can be injected into StableHLO ops to communicate preferred layouts. We could then inject the same interface into cuDNN ops to constrain the layouts to what cuDNN expects.
We will implement a model that estimates the cost of StableHLO ops and of their cuDNN graph equivalents. This will determine which subgraphs are outlined to cuDNN ops. The model needs to take downstream fusion opportunities into account, because the performance profiles of a fused and an unfused op are vastly different.
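To make the outlining decision concrete, here is a minimal Python sketch of one way such a cost comparison could work. It is not the plugin's implementation: the `Candidate` type, the `select_outlined` function, the op names, and every cost figure are hypothetical and chosen only for illustration. Each candidate subgraph carries two estimates, one for leaving it to IREE codegen (including any fusion codegen would do) and one for running it as a single cuDNN graph, and a greedy pass keeps the non-overlapping candidates with the largest estimated saving.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical subgraph that could be outlined to a cuDNN graph."""
    name: str
    ops: tuple[str, ...]     # ids of the StableHLO ops the subgraph covers
    codegen_cost_us: float   # assumed estimate if left to IREE codegen (fused)
    cudnn_cost_us: float     # assumed estimate as one cuDNN graph


def select_outlined(candidates: list[Candidate]) -> list[Candidate]:
    """Greedily pick non-overlapping candidates with the largest saving."""
    def saving(c: Candidate) -> float:
        return c.codegen_cost_us - c.cudnn_cost_us

    chosen: list[Candidate] = []
    claimed: set[str] = set()
    for cand in sorted(candidates, key=saving, reverse=True):
        if saving(cand) <= 0:
            break  # sorted by saving, so nothing after this point helps
        if claimed.isdisjoint(cand.ops):
            chosen.append(cand)
            claimed.update(cand.ops)
    return chosen


# Made-up numbers: the fused conv+bias+relu graph saves more than the bare
# conv, and the standalone matmul is cheaper when left to codegen.
candidates = [
    Candidate("conv_only", ("conv0",), 100.0, 60.0),
    Candidate("conv_bias_relu", ("conv0", "add0", "relu0"), 120.0, 70.0),
    Candidate("matmul_only", ("dot0",), 30.0, 35.0),
]
chosen = select_outlined(candidates)
print([c.name for c in chosen])
```

A greedy selection is used here only to keep the sketch short. Because candidates overlap (the bare convolution is contained in the fused one), a real model would probably need to weigh overlapping choices jointly rather than one at a time.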
I need to leave this as a placeholder, because I haven't looked into it yet. But we should document our requirements for hooking into the IREE compilation pipeline. The main open issue here is when/how we perform the outlining of StableHLO ops and lowering to cuDNN ops. As far as I know, the downstream path from cuDNN ops is already working.
One of the initial goals of the openxla-nvgpu plugin is to show how to integrate NVIDIA libraries with the IREE compiler/runtime. The work is already in progress, and a few PRs have been merged. The design of this integration is outlined in this document: https://docs.google.com/document/d/1WzSH7LdQdL1CQmlIOUyy6auDiX6d3cl5LAzZU_I4KCY/edit#