I've narrowed it down to the alignment of the loads: if the alignments are set/forced sufficiently low, then the numerics with vectorization are fine. I'm a bit perplexed why the alignment needs to be as low as it does; consider the 2 IRs in the zip file: opts.zip
They are basically identical, except that some loads have alignment 4 in one file and alignment 2 in the other. The case with alignment 2 gives numerically correct results; the one with alignment 4 does not. What I find confusing is that alignment 4 should surely be valid here: none of the strides in any of the loads is less than 8.
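To make the difference concrete, here is a minimal sketch of the kind of load the two files disagree on; the pointer name, vector width, and bfloat element type are illustrative assumptions, not values copied from opts.zip:

```llvm
; with_align_4 variant (numerically wrong results):
%v = load <32 x bfloat>, ptr %p, align 4
; with_align_2 variant (numerically correct results):
%v = load <32 x bfloat>, ptr %p, align 2
```

Assuming 2-byte elements, `align 2` is just the natural element alignment and is always safe; the puzzle is why the stronger `align 4` claim miscompiles even though the data appears to satisfy it.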
Running the above 2 IRs through compiler explorer (http://xsjsda132:10240/z/6qc4cc) gives:
align 2: with_align_2.txt
align 4: with_align_4.txt
congrats @newling for tracking this all the way down
Enabling vectorization (see https://github.com/nod-ai/iree-amd-aie/pull/789) for convolution results in numerical failure. The values are only slightly off (although they are definitely not correct; this is not a floating-point rounding issue). Experiments with the input values suggest that the input image data (i.e. not the kernel data) is being read incorrectly, with an incorrect offset inside the `scf.for` loop (not confirmed).
Some things I've tried:
- Setting optimization flags in LLVM to 0 (the default is 2) has no effect.
- Inverting the `scf.for` loop order has no effect.
- Using different versions of peano has no effect.
This task is to find the source of the error and, if it's peano, create a reproducer for the team.
Attached are the `ll` and `opt.ll` files (vectorized and unvectorized): ll_files.zip (they're quite small; vectorized_input.opt.ll is only 94 lines). The MLIR IR looks basically identical except for the inner loop:
// With vectorization:
// Without vectorization:
In the vectorized case, the `vector.contract` gets lowered to an `aievec.matmul`, which in turn gets lowered to
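For reference, here is a hedged sketch of what a matmul-style `vector.contract` in the vectorized inner loop can look like; the value names, vector shapes, element types, and indexing maps are illustrative assumptions, not the actual IR from this issue:

```mlir
// Hypothetical contraction of the kind the aievec lowering consumes;
// shapes and types below are placeholders, not taken from the attached IR.
%acc1 = vector.contract {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                     affine_map<(d0, d1, d2) -> (d2, d1)>,
                     affine_map<(d0, d1, d2) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel", "reduction"],
    kind = #vector.kind<add>}
    %lhs, %rhs, %acc0 : vector<4x8xbf16>, vector<8x4xbf16> into vector<4x4xf32>
```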