nod-ai / iree-amd-aie

IREE plugin repository for the AMD AIE accelerator
Apache License 2.0
69 stars 30 forks source link

[Testing] Move some tests from run_matmul_test.sh to run.py #885

Open newling opened 2 weeks ago

newling commented 2 weeks ago

I've tested that the logic for approximate testing is correct (and like IREE's) but I should probably add a unit test. Tricky though, need to inject a failure. If people believe strongly we should/will just never test matmuls with M*N*K >1e11 I can simplify this and scrap the sampling logic.

newling commented 1 week ago

Damn I just another incomplete effort to unify the testing so I'm not sure what the best way to clean things up is now

https://github.com/nod-ai/iree-amd-aie/pull/795/

newling commented 1 week ago

I think I/we should try the following approach (slow burn, expect many delays)

1) move and eliminate run_matmul_test.sh by porting the tests to run.py 2) move and eliminate run.py by porting the tests to https://github.com/nod-ai/iree-amd-aie/pull/795

Why not move all directly to https://github.com/nod-ai/iree-amd-aie/pull/795 ? Because steps 1 and 2 are each smaller jumps and I think the metric to use for difficulty here is more l_inf than l_2

jtuyls commented 3 days ago

@newling Is this PR meant to be reviewed or is the port of all tests a WIP? I am not sure based on your latest comment.

newling commented 3 days ago

@newling Is this PR meant to be reviewed or is the port of all tests a WIP? I am not sure based on your latest comment.

I don't mind, if you have an opinion on the high-level plan feel free to post it here. So the question to answer is: Will moving the tests in run_matmul_test.sh to run.py improve the situation, and is it low/medium/high priority? I think "Yes" and "low".

Also, I'm going to remove the complexity of approximate testing in this PR. Until we actually run very large matmuls, it's not helping at all:

In [6]: %time np.matmul(np.random.rand(1024, 4096), np.random.rand(4096, 1024))
CPU times: user 417 ms, sys: 839 ms, total: 1.26 s
Wall time: 116 ms

which compared to compile time is insignificant.

jtuyls commented 2 days ago

I don't mind, if you have an opinion on the high-level plan feel free to post it here. So the question to answer is: Will moving the tests in run_matmul_test.sh to run.py improve the situation, and is it low/medium/high priority? I think "Yes" and "low".

I agree, we should move the tests to run.py first, and I also think it's low priority. But all new tests should be added to run.py imo, like we have been doing lately already.

Also, I'm going to remove the complexity of approximate testing in this PR. Until we actually run very large matmuls, it's not helping at all:

Yes, makes sense.