nod-ai / iree-amd-aie

IREE plugin repository for the AMD AIE accelerator
Apache License 2.0

[Strix][Ukernel][ObjectFifo] Add Strix ukernel support #879

Closed Abhishek-Varma closed 1 week ago

Abhishek-Varma commented 2 weeks ago

-- This commit replaces the single mm.cc ukernel file with two separate files: mm_npu1.cc for Phoenix and mm_npu4.cc for Strix. During xclbin generation, the corresponding ukernel file is compiled based on the target device.
-- It also updates the lower-to-ukernels pass to adapt to the different target devices.
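The per-device selection described above can be sketched roughly as follows. This is a hypothetical Python sketch, not the actual lower-to-ukernels pass; only the device and file names come from the PR description:

```python
# Hypothetical mapping from target device to matmul ukernel source,
# following the PR description (mm_npu1.cc for Phoenix, mm_npu4.cc for Strix).
UKERNEL_SOURCES = {
    "npu1": "mm_npu1.cc",  # Phoenix
    "npu4": "mm_npu4.cc",  # Strix
}

def select_ukernel_source(target_device: str) -> str:
    """Pick the ukernel file to compile into the xclbin for a device."""
    try:
        return UKERNEL_SOURCES[target_device]
    except KeyError:
        raise ValueError(f"no matmul ukernel for device {target_device!r}")
```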

Signed-off-by: Abhishek Varma abhvarma@amd.com

Abhishek-Varma commented 2 weeks ago

I have tried this on 64x64x64 bf16 -> f32: it works with correct results - here is the e2e IR

I have tried this on 128x128x256 bf16 -> f32: there are numerical mismatches in some places, while most elements are correct - here is the e2e IR. (Same case for 1024x1024x1024.) I speculated that this might be because of bf16 -> bfp16, but then that should have been the case for all values, and it's happening for only some of them - perhaps a tiling issue?!

NOTE: I have added the e2e test of 64x64x64 in run_matmul_test.sh for now for ease of adding the tests - it can be ported to run.py later.
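One way to separate expected bf16/bfp16 rounding error from a real tiling bug is to quantize the inputs to bf16 up front and compare the result against a high-precision reference with a tolerance check. A minimal numpy sketch (to_bf16 and check_matmul are hypothetical helpers, not part of the repo; bf16 is emulated by truncating f32 mantissas):

```python
import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
    """Emulate bf16 by zeroing the low 16 bits of each f32 value."""
    u = x.astype(np.float32).view(np.uint32)
    return (u & np.uint32(0xFFFF0000)).view(np.float32)

def check_matmul(m: int, k: int, n: int, rtol=1e-2, atol=1e-3, seed=0) -> int:
    """Return the number of elements outside tolerance vs. an f64 reference."""
    rng = np.random.default_rng(seed)
    a = to_bf16(rng.standard_normal((m, k), dtype=np.float32))
    b = to_bf16(rng.standard_normal((k, n), dtype=np.float32))
    ref = a.astype(np.float64) @ b.astype(np.float64)  # high-precision reference
    out = (a @ b).astype(np.float64)  # stand-in for the device result
    mismatches = np.abs(out - ref) > rtol * np.abs(ref) + atol
    return int(mismatches.sum())
```

Rounding alone should give a small relative error everywhere; scattered elements that are off by a lot point at an addressing or tiling problem instead.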

jtuyls commented 2 weeks ago

> I speculated that this might be because of bf16 -> bfp16 but then this should've been the case for all values, but it's happening for only some of them - perhaps some tiling issue?!

Also, the values seem off quite a bit, which wouldn't be expected from bf16 -> bfp16, so, there is definitely something wrong here.

newling commented 1 week ago

For Strix, wouldn't a 'simpler' matmul to test be i8 -> i32, given the complexities around the bf16 -> bfp16 -> bf16 conversion?
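The appeal of i8 -> i32 is that the reference is exact: there is no rounding, so any mismatch is a real bug and the comparison can be bit-exact rather than tolerance-based. A hypothetical reference sketch (matmul_i8_i32 is not a function in the repo):

```python
import numpy as np

def matmul_i8_i32(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Exact i8 x i8 -> i32 matmul reference; accumulation is in int32,
    so the comparison against a device result can be bit-exact."""
    assert a.dtype == np.int8 and b.dtype == np.int8
    return a.astype(np.int32) @ b.astype(np.int32)
```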

newling commented 1 week ago

@Abhishek-Varma I can't see anything obviously wrong/incompatible with the ukernel which would cause numerical error for 128x128 (but pass for 64x64), so I think trying it through peano might be a good idea.

jtuyls commented 1 week ago

@Abhishek-Varma For me we can merge this for the shape that works, after a rebase + fixing the print statements.

newling commented 1 week ago

> @Abhishek-Varma For me we can merge this for the shape that works after rebase + fixing the print statements

Fine by me too. Please add a comment where the 64x64x64 test is created in run.py noting that failure is expected for the 128x128 case, and that we don't yet understand why.