iree-org / iree-nvgpu

Apache License 2.0

[RFC] Integration with cuDNN via IREE compiler/runtime plugins #12

Open ezhulenev opened 1 year ago

ezhulenev commented 1 year ago

One of the initial goals of the openxla-nvgpu plugin is to show how to integrate NVIDIA libraries with the IREE compiler/runtime. The work is already in progress, and a few PRs have been merged. The design of this integration is outlined in this document: https://docs.google.com/document/d/1WzSH7LdQdL1CQmlIOUyy6auDiX6d3cl5LAzZU_I4KCY/edit#

chsigg commented 1 year ago

Thanks Eugene for sharing the doc, it looks like a solid plan.

I will try to cover some things that are not discussed in the doc here.

Input dialect

We initially started with lowering from mhlo, but have now switched to stablehlo. The two dialects are mostly the same, so this shouldn't be difficult, but we will likely need to update the resnet50 model that we want to use to show the performance advantage of using libraries over the code that IREE is currently able to generate.

Do I understand correctly that the various IREE importers are targeting StableHLO and the standard IREE pipeline is able to consume this?

Layout assignment

This is currently being worked out on the IREE side. My very high-level thinking is that it will provide an external layout interface that can be injected into StableHLO ops to communicate preferred layouts. We could then inject the same interface into cuDNN ops to constrain the layouts to what cuDNN expects.
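To make the layout idea a bit more concrete, here is a rough Python sketch of what "the same interface on both kinds of ops" could look like. Everything in it is hypothetical: `PreferredLayout`, `StableHloConvLayout`, `CudnnConvLayout` and `common_layout` are made-up names, the layout strings are illustrative, and the real interface would be an MLIR op interface on the IREE side, not Python.

```python
from typing import Optional, Protocol, Sequence


class PreferredLayout(Protocol):
    """Hypothetical interface: an op reports layouts it accepts, best first."""

    def preferred_layouts(self) -> Sequence[str]: ...


class StableHloConvLayout:
    # Assumption: a generic StableHLO convolution can be code-generated
    # for either layout, with a mild preference for NHWC.
    def preferred_layouts(self) -> Sequence[str]:
        return ("NHWC", "NCHW")


class CudnnConvLayout:
    # Assumption: the cuDNN op is constrained to one layout.
    def preferred_layouts(self) -> Sequence[str]:
        return ("NHWC",)


def common_layout(producer: PreferredLayout,
                  consumer: PreferredLayout) -> Optional[str]:
    """Pick the producer's most preferred layout that the consumer accepts."""
    allowed = set(consumer.preferred_layouts())
    for layout in producer.preferred_layouts():
        if layout in allowed:
            return layout
    return None  # no agreement: a layout conversion would be needed


chosen = common_layout(StableHloConvLayout(), CudnnConvLayout())
```

The point is only that layout assignment can query both op kinds through one interface, so the cuDNN constraint shows up as a shorter list of accepted layouts.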

Cost model

We will implement a model that determines the cost of StableHLO ops and their cuDNN graph equivalent. This will determine which subgraphs are outlined to cuDNN ops. This needs to take downstream fusion opportunities into account, because the performance profiles of a fused vs unfused op are vastly different.
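As a minimal sketch of the decision this cost model has to make, assuming per-subgraph cost estimates are already available: the names (`Candidate`, `should_outline`, `select_for_outlining`), the fields, and the numbers below are all hypothetical and only illustrate how lost downstream fusion could be charged against the cuDNN path.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A StableHLO subgraph that could be outlined to a cuDNN graph."""
    name: str
    codegen_cost_us: float         # estimated cost if IREE generates the code
    cudnn_cost_us: float           # estimated cost of the cuDNN equivalent
    lost_fusion_penalty_us: float  # cost of downstream ops that can no longer fuse


def should_outline(c: Candidate) -> bool:
    """Outline only if cuDNN still wins after paying for lost fusion."""
    return c.cudnn_cost_us + c.lost_fusion_penalty_us < c.codegen_cost_us


def select_for_outlining(candidates: Iterable[Candidate]) -> List[str]:
    return [c.name for c in candidates if should_outline(c)]


# Made-up numbers, not measurements.
candidates = [
    Candidate("conv_bias_relu", codegen_cost_us=120.0,
              cudnn_cost_us=60.0, lost_fusion_penalty_us=10.0),
    Candidate("small_conv", codegen_cost_us=15.0,
              cudnn_cost_us=12.0, lost_fusion_penalty_us=8.0),
]
selected = select_for_outlining(candidates)
```

Here `small_conv` is faster in cuDNN in isolation (12 vs 15) but is not outlined, because breaking the downstream fusion costs more than the library saves.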

Compilation pipeline

I need to leave this as a placeholder, because I haven't looked into it yet. But we should document our requirements for hooking into the IREE compilation pipeline. The main open issue here is when/how we perform the outlining of StableHLO ops and lowering to cuDNN ops. As far as I know, the downstream path from cuDNN ops is already working.