Open lucas-camp opened 4 months ago
Based on the description, I'm going to start with the assumption that this is an issue with the torch to linalg conversions. Asking Rob to weigh in.
So the big problem is that the tosa path and the linalg path generate different linalg named ops. Specifically, tosa inserts transposes to support an NHWC ordering, while torch uses the NCHW case. Looking at the generated dispatches, it appears to be fine?
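To make the layout difference concrete, here is a minimal pure-Python sketch (not IREE or Torch-MLIR code) showing that an NCHW convolution and an NHWC convolution wrapped in transposes should compute the same values. All function names are hypothetical, and the [F][KH][KW][C] filter layout for the NHWC case is an assumption made for illustration.

```python
import itertools
import random

# Hypothetical reference helpers for illustration; not part of IREE or Torch-MLIR.

def out_size(size, k, stride, pad):
    """Output extent of a convolution with dilation 1."""
    return (size + 2 * pad - k) // stride + 1

def conv2d_nchw(x, w, stride=1, pad=0):
    """x: [N][C][H][W], w: [F][C][KH][KW] -> [N][F][OH][OW], zero padding."""
    n, c, h, wd = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    f, kh, kw = len(w), len(w[0][0]), len(w[0][0][0])
    oh, ow = out_size(h, kh, stride, pad), out_size(wd, kw, stride, pad)
    out = [[[[0.0] * ow for _ in range(oh)] for _ in range(f)] for _ in range(n)]
    for b, o, i, j, ch, di, dj in itertools.product(
            range(n), range(f), range(oh), range(ow), range(c), range(kh), range(kw)):
        y, xx = i * stride + di - pad, j * stride + dj - pad
        if 0 <= y < h and 0 <= xx < wd:  # taps outside the input read zero padding
            out[b][o][i][j] += x[b][ch][y][xx] * w[o][ch][di][dj]
    return out

def conv2d_nhwc(x, w, stride=1, pad=0):
    """x: [N][H][W][C], w: [F][KH][KW][C] (assumed) -> [N][OH][OW][F]."""
    n, h, wd, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    f, kh, kw = len(w), len(w[0]), len(w[0][0])
    oh, ow = out_size(h, kh, stride, pad), out_size(wd, kw, stride, pad)
    out = [[[[0.0] * f for _ in range(ow)] for _ in range(oh)] for _ in range(n)]
    for b, o, i, j, ch, di, dj in itertools.product(
            range(n), range(f), range(oh), range(ow), range(c), range(kh), range(kw)):
        y, xx = i * stride + di - pad, j * stride + dj - pad
        if 0 <= y < h and 0 <= xx < wd:
            out[b][i][j][o] += x[b][y][xx][ch] * w[o][di][dj][ch]
    return out

def to_channels_last(t):
    """[A][B][H][W] -> [A][H][W][B]; used for both inputs and filters."""
    a, b, h, w = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[n][c][i][j] for c in range(b)] for j in range(w)]
             for i in range(h)] for n in range(a)]

def to_channels_first(t):
    """[A][H][W][B] -> [A][B][H][W]."""
    a, h, w, b = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[n][i][j][c] for j in range(w)] for i in range(h)]
             for c in range(b)] for n in range(a)]

def rand_tensor(d0, d1, d2, d3):
    """Integer-valued floats so both paths can be compared exactly."""
    return [[[[float(random.randint(-3, 3)) for _ in range(d3)]
              for _ in range(d2)] for _ in range(d1)] for _ in range(d0)]

random.seed(0)
x = rand_tensor(1, 2, 4, 4)  # NCHW input
w = rand_tensor(3, 2, 3, 3)  # FCHW filter

direct = conv2d_nchw(x, w, stride=1, pad=1)
via_nhwc = to_channels_first(
    conv2d_nhwc(to_channels_last(x), to_channels_last(w), stride=1, pad=1))
print(direct == via_nhwc)
```

If the two paths are mathematically equivalent like this, a mismatch on one backend would point at how a particular named op is lowered rather than at the layout choice itself.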
Thanks for your response. Since the llvm-cpu path leads to correct results, and the GPU path fails only for specific parameter constellations when going from torch to linalg directly, I could imagine that the compilation goes wrong for some parameter specialization (for example 3x3 kernels) at a lower level. If you need more inputs to look at, let me know.
What happened?
I have a PyTorch Conv2d module that is compiled with SHARK Turbine. Running the generated MLIR file through IREE's CUDA backend computes wrong results for specific combinations of input shapes, padding values and strides. It seems that the wrong results only appear if both input spatial dimensions are a multiple of 16, the paddings are one more than a multiple of 16 (i.e. 1, 17, 33, ...) and the kernel size is 3.
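The observed pattern can be written down as a small predicate, which may help when sweeping parameters to narrow the bug. This is only a sketch of the conditions described above; the function names and the sweep values are hypothetical, and the output-size formula assumes dilation 1.

```python
import itertools

def conv_output_size(size, kernel_size, padding, stride=1):
    """Standard output extent of a convolution, assuming dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

def matches_reported_pattern(height, width, padding, kernel_size):
    """True if a configuration falls into the pattern reported in this issue."""
    return (height % 16 == 0 and width % 16 == 0
            and padding % 16 == 1 and kernel_size == 3)

def sweep(sizes=(15, 16, 32), paddings=(0, 1, 2, 17), kernel_sizes=(1, 3, 5)):
    """Enumerate test configurations (values chosen arbitrarily for illustration)."""
    for h, w, p, k in itertools.product(sizes, sizes, paddings, kernel_sizes):
        yield {
            "input": (1, 1, h, w),
            "padding": p,
            "kernel_size": k,
            "output_hw": (conv_output_size(h, k, p), conv_output_size(w, k, p)),
            "suspect": matches_reported_pattern(h, w, p, k),
        }

cases = list(sweep())
suspects = [c for c in cases if c["suspect"]]
for c in suspects:
    print(c["input"], "padding", c["padding"], "->", c["output_hw"])
```

Configurations flagged as suspect are the ones where I would expect the CUDA and CPU outputs to disagree; the others should serve as controls.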
Using Torch-MLIR with output type LINALG_ON_TENSORS results in the same wrong result. However, using Torch-MLIR with output type TOSA produces correct results.
I tested the CUDA code on a Tesla V100 (driver 450, CUDA 12.3) and an MX150 (driver 535, CUDA 12.2). It is worth noting that the CPU backend computes the correct result for all passes.
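Since the CPU backend appears to be correct, its output can serve as the reference when checking the CUDA output. Below is a minimal comparison sketch in plain Python; the helper names and tolerances are my own assumptions, and the outputs are assumed to be available as nested lists of numbers.

```python
import math

def flatten(t):
    """Yield all scalars of a nested list/tuple in row-major order."""
    if isinstance(t, (list, tuple)):
        for item in t:
            yield from flatten(item)
    else:
        yield float(t)

def compare_outputs(reference, candidate, rel_tol=1e-4, abs_tol=1e-5):
    """Compare a candidate output (e.g. CUDA) against a reference (e.g. CPU)."""
    ref = list(flatten(reference))
    cand = list(flatten(candidate))
    if len(ref) != len(cand):
        raise ValueError(f"element count differs: {len(ref)} vs {len(cand)}")
    mismatches = [
        i for i, (r, c) in enumerate(zip(ref, cand))
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
    return {
        "max_abs_diff": max((abs(r - c) for r, c in zip(ref, cand)), default=0.0),
        "num_mismatches": len(mismatches),
        "first_mismatch": mismatches[0] if mismatches else None,
    }

# Toy values standing in for real module outputs.
cpu_out = [[1.0, 2.0], [3.0, 4.0]]
cuda_out = [[1.0, 2.0], [3.0, 4.5]]
print(compare_outputs(cpu_out, cuda_out))
```

Reporting the index of the first mismatch can hint at whether the error is confined to border elements (which would suggest padding handling) or spread over the whole output.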
Steps to reproduce your issue
Compile the different MLIR inputs from https://gist.github.com/lucas-camp/da680f922ea958fbcbdf0eee79ebf523#file-conv2d_turbine-mlir with commands
iree-compile --iree-hal-target-backends=llvm-cpu INPUT.mlir -o OUTPUT.vmfb
for CPU and
iree-compile --iree-hal-target-backends=cuda --iree-hal-cuda-llvm-target-arch=sm_70 INPUT.mlir -o OUTPUT.vmfb
for CUDA. Run both modules and compare the outputs for a random input of size 1x1x16x16.
What component(s) does this issue relate to?
Compiler
Version information
IREE version 20240218.805
Additional context
FYI @marbre