microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
MIT License

Question about rKernels #442

Closed: ArmageddonKnight closed this issue 2 years ago

ArmageddonKnight commented 2 years ago

Hi,

Thanks for open-sourcing this wonderful project. In the OSDI 2020 paper, you mention that Rammer can have multiple implementations of the same operator on NVIDIA GPUs, with matrix multiply as the example. However, I notice that Rammer directly invokes the cuBLAS vendor library to execute matrix multiplies. Does Rammer have any alternative matrix multiply implementations to select from at runtime? Could you also give some other examples of different rKernels in Rammer that belong to the same rOperator?

xysmlx commented 2 years ago

Hi, NNFusion supports different rKernels for an rOperator via KernelEmitter. You can find multiple KernelEmitters for some operators (e.g., BatchNorm has both a cuDNN-based kernel and a manually implemented kernel).
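
To illustrate the idea of several kernel implementations registered under one operator, here is a minimal Python sketch. It is not NNFusion's actual API (NNFusion's emitters are C++ classes); `KernelEmitter`, `EmitterRegistry`, `register`, and `candidates` are hypothetical names chosen for this example, and the emitted strings are placeholders rather than real kernel code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class KernelEmitter:
    """Hypothetical stand-in for one kernel implementation (an rKernel)."""
    op_type: str              # operator it implements, e.g. "BatchNorm"
    name: str                 # label for this particular implementation
    emit: Callable[[], str]   # returns kernel source text (placeholder here)


class EmitterRegistry:
    """Maps one operator type (an rOperator) to many candidate emitters."""

    def __init__(self) -> None:
        self._by_op: Dict[str, List[KernelEmitter]] = defaultdict(list)

    def register(self, emitter: KernelEmitter) -> None:
        # Several emitters may share the same op_type.
        self._by_op[emitter.op_type].append(emitter)

    def candidates(self, op_type: str) -> List[KernelEmitter]:
        # Return a copy so callers cannot mutate the registry.
        return list(self._by_op.get(op_type, []))


registry = EmitterRegistry()
# Two implementations of the same operator, mirroring the BatchNorm example.
registry.register(KernelEmitter("BatchNorm", "cudnn", lambda: "// call into cuDNN"))
registry.register(KernelEmitter("BatchNorm", "manual", lambda: "// hand-written CUDA kernel"))

names = [e.name for e in registry.candidates("BatchNorm")]
print(names)
```

A compiler pass could then pick among `registry.candidates(op_type)` using whatever policy it prefers; the sketch deliberately leaves that policy out.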

NNFusion also implements a kernel DB mechanism, but kernels must be injected into the kernel DB before they can be selected. The artifact shows different kernels for Conv2D and MatMul, e.g., MatMul kernels tuned with TVM (a minimized-time version and a resource-efficient version) as well as manual kernels.
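
The inject-then-select flow of a kernel database can be sketched roughly as follows. This is only an illustration under assumptions: the table schema, the `inject` and `select` helpers, the policy names, and all numbers are made up for the example and do not reflect NNFusion's real kernel DB format or any measured performance.

```python
import sqlite3

# In-memory database with a hypothetical schema: one row per kernel variant.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kernels ("
    "op TEXT, tag TEXT, time_us REAL, blocks INTEGER, source TEXT)"
)


def inject(op, tag, time_us, blocks, source):
    """Add one kernel variant; nothing can be selected until it is injected."""
    conn.execute(
        "INSERT INTO kernels VALUES (?, ?, ?, ?, ?)",
        (op, tag, time_us, blocks, source),
    )


def select(op, policy="min_time"):
    """Return the tag of the best kernel for `op` under the given policy."""
    # The column name comes from this fixed mapping, never from user input.
    column = {"min_time": "time_us", "min_resource": "blocks"}[policy]
    row = conn.execute(
        f"SELECT tag FROM kernels WHERE op = ? ORDER BY {column} ASC LIMIT 1",
        (op,),
    ).fetchone()
    return row[0] if row else None


# Placeholder entries: three MatMul variants with invented cost figures.
inject("MatMul", "tvm_min_time", 10.0, 64, "// TVM-tuned, fastest")
inject("MatMul", "tvm_resource_efficient", 14.0, 16, "// TVM-tuned, fewer blocks")
inject("MatMul", "manual", 20.0, 32, "// hand-written")

fastest = select("MatMul")
leanest = select("MatMul", policy="min_resource")
missing = select("Conv2D")  # nothing injected for Conv2D in this sketch
print(fastest, leanest, missing)
```

The point of the sketch is that the same operator can resolve to different kernels depending on the selection policy, and that an operator with no injected kernels yields no match.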