nod-ai / iree-amd-aie

IREE plugin repository for the AMD AIE accelerator
Apache License 2.0

Remove load alignment removal pass #825

Closed newling closed 1 month ago

newling commented 1 month ago

With the removal of this pass, the optimized vectorized IR for conv2d (see file vectorized_input.opt.ll in https://github.com/nod-ai/iree-amd-aie/issues/820) changes from

  %9 = load <32 x bfloat>, ptr getelementptr inbounds ([144 x bfloat], ptr @buff_6, i20 0, i20 8), align 64

to

  %9 = load <32 x bfloat>, ptr getelementptr inbounds ([144 x bfloat], ptr @buff_6, i20 0, i20 8), align 2

This latter IR, to my LLVM-novice brain, looks much more sensible (how can the alignment be 64 when the offset from the pointer is 8 bfloats, i.e. 16 bytes??)

Also, this makes conv numerics pass :D

TMI, but this being the bug also explains why the issue seemed to be with the offset in the input image data, not the kernel data. The kernel is read with a stride of 32 bfloats (64 bytes), so the alignment of 64 was correct in that case. The input image data is read with a stride of 8 bfloats (16 bytes), so assuming an alignment of 64 bytes meant reading the wrong data from the image patch. I also observed that the image data was being offset by too little, which agrees with the bug.
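
For anyone who wants to check the arithmetic, here is a minimal Python sketch (not part of the PR). It assumes the buffer base itself is 64-byte aligned and that a bfloat is 2 bytes. The `round_down` helper is a hypothetical model of what a load that trusts an `align 64` claim might effectively do (ignore the low address bits); whether the hardware behaves exactly this way is an assumption, but it would match the "offset by too little" observation.

```python
BF16_BYTES = 2  # a bfloat occupies 2 bytes


def byte_offset(element_offset, elem_bytes=BF16_BYTES):
    """Convert an offset in elements to an offset in bytes."""
    return element_offset * elem_bytes


def is_aligned(byte_off, alignment):
    """True if byte_off is a multiple of alignment (base assumed aligned)."""
    return byte_off % alignment == 0


def round_down(byte_off, alignment):
    """Hypothetical model: clear the low bits that an aligned load ignores."""
    return byte_off & ~(alignment - 1)


# Kernel reads: stride of 32 bfloats (64 bytes), always a multiple of 64.
kernel_offsets = [byte_offset(i * 32) for i in range(4)]

# Image reads: stride of 8 bfloats (16 bytes), mostly not a multiple of 64.
image_offsets = [byte_offset(i * 8) for i in range(4)]

for off in image_offsets:
    print(off, is_aligned(off, 64), round_down(off, 64))
```

Under these assumptions every kernel offset satisfies `align 64`, while the image offset from the IR above (8 bfloats, 16 bytes) does not, and rounding it down to a 64-byte boundary lands short of the intended address. An alignment of 2 holds for every offset, which is why the new IR is safe.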

newling commented 1 month ago

Seems like the pass is still needed for matmul.