iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Implicit GEMM support #13641

Open allieculp opened 1 year ago

allieculp commented 1 year ago

New Epic item to track Implicit GEMM work. The tasks here are generally listed so that later tasks in a block depend on previous ones.

## Common Tasks
- [ ] #13541
- [ ] #13627
- [ ] #13449
- [ ] Enable basic software pipelining (i.e. allow the pattern to apply with the gather) (@qedawkins)
- [ ] Handle padding on the img2col op for unaligned convolution (up for grabs)
- [ ] Fuse with the model-level padding on the input "image" that is common with convolutions (larger task/up for grabs)
- [ ] #13743
## General Testing/Benchmarking
- [ ] Pick a set of baseline "testing/reference" convolutions (up for grabs)
- [ ] Add convolution to dispatch profiler (up for grabs)
- [ ] Run against cuDNN and CUTLASS (up for grabs)
## CUDA Specific Tasks
- [ ] Asynchronous copy support for deeper pipelines. We see a `vector.gather` instead of a `vector.transfer_read` for the "copy" on the input. (up for grabs)
## Graph level packing
- [ ] Do packing pre-processing to convert NCHW convolutions to NHWC and improve codegen for the generated pack/unpack ops. This tends to be required for good implicit GEMM performance, as NCHW results in scalar, non-contiguous loads for img2col. This is part of a larger effort for data-tiling + layout transformation. Not sure if this needs to live in IREE, though ([hacked PoC pass](https://github.com/nod-ai/SHARK-Runtime/commit/4d15f318ed32e67852637998353fa81f27eadb71))
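
To illustrate the layout point above, here is a minimal sketch (not IREE code) that lists the flat memory offsets one output pixel reads from its input window and measures how long the contiguous runs are. The helper names `window_offsets` and `contiguous_runs`, the stride-1/no-padding setup, and the reduction orderings ((kh, kw, c) for NHWC, (c, kh, kw) for NCHW) are all assumptions made for illustration; the ordering used by the actual img2col lowering may differ.

```python
def window_offsets(layout, C, H, W, KH, KW, oh=0, ow=0):
    """Flat offsets (single image, stride 1, no padding) read for output pixel (oh, ow)."""
    if layout == "NHWC":
        # Channels are innermost in memory, so iterate (kh, kw, c).
        return [((oh + kh) * W + (ow + kw)) * C + c
                for kh in range(KH) for kw in range(KW) for c in range(C)]
    if layout == "NCHW":
        # Width is innermost in memory, so iterate (c, kh, kw).
        return [(c * H + (oh + kh)) * W + (ow + kw)
                for c in range(C) for kh in range(KH) for kw in range(KW)]
    raise ValueError(f"unknown layout: {layout}")


def contiguous_runs(offsets):
    """Lengths of maximal runs of consecutive offsets, in iteration order."""
    runs, length = [], 1
    for prev, cur in zip(offsets, offsets[1:]):
        if cur == prev + 1:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return runs


# Hypothetical sizes: 8 channels, 6x6 image, 3x3 filter.
nhwc_runs = contiguous_runs(window_offsets("NHWC", C=8, H=6, W=6, KH=3, KW=3))
nchw_runs = contiguous_runs(window_offsets("NCHW", C=8, H=6, W=6, KH=3, KW=3))
print("NHWC runs:", nhwc_runs)
print("NCHW runs:", nchw_runs)
```

With these sizes the NHWC window splits into 3 runs of 24 elements (KW * C each), while the NCHW window splits into 24 runs of only 3 elements (KW each). Runs that short are hard to cover with wide vector loads, which is one way to see why NCHW tends to fall back to narrow or scalar accesses.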
## SPIR-V Specific Tasks
- [x] `vector.gather` unrolling in SPIRVVectorize (@qedawkins)
- [ ] https://github.com/openxla/iree/pull/13736
- [ ] For unaligned cases, lowering patterns for masked vectors (to scalarized loads wrapped in `scf.if`, similar to the gather lowering pattern) (composite/up for grabs)
- [ ] Handle flattening for memref.dealloc or turn off deallocation generation in bufferization
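
The masked-vector lowering mentioned in the list above can be pictured with a small sketch, written in Python rather than MLIR. It models a masked load as one guarded scalar load per lane, where the `if` stands in for the `scf.if` wrapper. The function name `masked_load_scalarized` and its argument names are hypothetical, and this is only a model of the idea, not the actual pattern.

```python
def masked_load_scalarized(buffer, base, mask, passthru):
    """Model a masked vector load as one guarded scalar load per lane.

    Inactive lanes keep their pass-through value and never touch memory,
    so reads that would run past the end of `buffer` are skipped.
    """
    result = list(passthru)
    for lane, active in enumerate(mask):
        if active:  # plays the role of the scf.if guard around the scalar load
            result[lane] = buffer[base + lane]
    return result


# A 4-wide load starting at index 3 of a 5-element buffer: only 2 lanes are in bounds.
data = [10, 11, 12, 13, 14]
loaded = masked_load_scalarized(data, base=3,
                                mask=[True, True, False, False],
                                passthru=[0, 0, 0, 0])
print(loaded)
```

The out-of-bounds lanes are exactly the ones an unaligned convolution produces at tile edges, which is why the guard matters there.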
## Blocked/data tiled convolutions
- [ ] #13415
- [ ] Generalize img2col to arbitrary "convolution-like" ops (currently restricted to non-depthwise 2d named convolutions) (@qedawkins)
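
For context on what the img2col tasks in this epic build on, here is a minimal NumPy sketch of a non-depthwise 2D convolution written as img2col followed by a single GEMM. The function names, the NHWC/HWCF layouts, and the stride-1/no-padding restriction are assumptions for illustration, not IREE's implementation. The sketch also materializes the column matrix explicitly to keep it readable, whereas an implicit GEMM gathers those elements on the fly.

```python
import numpy as np


def conv2d_nhwc_direct(x, w):
    """Reference convolution. x: (N, H, W, C), w: (KH, KW, C, F); stride 1, no padding."""
    N, H, W, C = x.shape
    KH, KW, _, F = w.shape
    OH, OW = H - KH + 1, W - KW + 1
    out = np.zeros((N, OH, OW, F), dtype=x.dtype)
    for kh in range(KH):
        for kw in range(KW):
            # Accumulate the contribution of one filter tap over all output pixels.
            out += np.einsum("nhwc,cf->nhwf",
                             x[:, kh:kh + OH, kw:kw + OW, :], w[kh, kw])
    return out


def img2col_nhwc(x, KH, KW):
    """Gather input windows into a (N*OH*OW, KH*KW*C) matrix, K ordered as (kh, kw, c)."""
    N, H, W, C = x.shape
    OH, OW = H - KH + 1, W - KW + 1
    taps = [x[:, kh:kh + OH, kw:kw + OW, :]
            for kh in range(KH) for kw in range(KW)]
    patches = np.stack(taps, axis=3)  # shape (N, OH, OW, KH*KW, C)
    return patches.reshape(N * OH * OW, KH * KW * C)


def conv2d_nhwc_gemm(x, w):
    """The same convolution expressed as one matrix multiplication."""
    N, H, W, C = x.shape
    KH, KW, _, F = w.shape
    OH, OW = H - KH + 1, W - KW + 1
    cols = img2col_nhwc(x, KH, KW)        # (M, K) with M = N*OH*OW
    weights = w.reshape(KH * KW * C, F)   # (K, F), same (kh, kw, c) ordering
    return (cols @ weights).reshape(N, OH, OW, F)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 6, 4))
w = rng.standard_normal((3, 3, 4, 5))
direct = conv2d_nhwc_direct(x, w)
via_gemm = conv2d_nhwc_gemm(x, w)
print(np.allclose(direct, via_gemm))
```

The GEMM has M = N*OH*OW rows, K = KH*KW*C reduction elements, and F output columns. When K or M is not a multiple of the tile size, the gathered operand needs padding or masking.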

qedawkins commented 1 year ago

I updated some of the tasks here. A few of the bullets are already mostly done and just need some cleanup + tests (e.g. gather unrolling for SPIR-V). I will link pull requests for them once complete.

mattwalsh commented 1 year ago

Thanks Quinn! These are great, and I look forward to decomposing these into tasks we can move on.

qedawkins commented 1 year ago

For sure; I don't know how to assign items to myself (without creating issues or PRs), and some of the tasks are prerequisites for later ones. I reordered the tasks a bit and marked the ones I don't have planned as up for grabs in the meantime.

allieculp commented 1 year ago

@qedawkins I think you can just add your name to items (without them being real issues). I did this above for 'Enable software pipelining' as a sample - please edit!