nod-ai / iree-amd-aie

IREE plugin repository for the AMD AIE accelerator
Apache License 2.0

[CI] Support bf16 output in end-to-end numerical testing #829

Closed: newling closed this issue 1 month ago

newling commented 1 month ago

See comments in https://github.com/nod-ai/iree-amd-aie/pull/822

@Abhishek-Varma this should unblock you (although I haven't tested your PR with this yet...)

Abhishek-Varma commented 1 month ago

> See comments in #822
>
> @Abhishek-Varma this should unblock you (although I haven't tested your PR with this yet...)

Thank you so much @newling for this prompt fix - I tried this and it works!

Well, by "works" I mean I was able to get past the issue which triggered this fix, but the comparison of the values itself failed in my case.

- baseline values are in the range [316.0, 632.0]
- AIE values are in the range [316.0, 628.0]

This shouldn't have anything to do with the element type conversion, right?

Discrepancies at the first 14 mismatching indices:

 Index | Baseline | AIE
-------+----------+------
     0 |    632.0 | 628.0
     1 |    516.0 | 512.0
     2 |    552.0 | 548.0
     6 |    476.0 | 474.0
     8 |    520.0 | 516.0
    11 |    548.0 | 544.0
    13 |    556.0 | 552.0
    21 |    552.0 | 548.0
    23 |    532.0 | 528.0
    24 |    492.0 | 490.0
    25 |    584.0 | 580.0
    26 |    512.0 | 510.0
    29 |    552.0 | 548.0
    31 |    528.0 | 524.0

Most of the values differ by either 4 or 2.
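As an aside on the 4-or-2 pattern: those gaps match exactly one bfloat16 ULP at the magnitudes involved (the spacing between adjacent bf16 values near x is 2**(e - 7), where e is the exponent of x), which would point at a one-step rounding difference between backends rather than a real numerical bug. A minimal sketch (the `bf16_ulp` helper is made up here, not code from this repo):

```python
import math

def bf16_ulp(x: float) -> float:
    # Spacing between adjacent bfloat16 values near x: 2**(e - 7),
    # where e = floor(log2(|x|)) and 7 is the stored mantissa width.
    e = math.floor(math.log2(abs(x)))
    return 2.0 ** (e - 7)

print(bf16_ulp(632.0))  # 4.0 -> values >= 512 differ by 4
print(bf16_ulp(476.0))  # 2.0 -> values in [256, 512) differ by 2
```

In the table above, every pair that differs by 4 sits at or above 512, and every pair that differs by 2 sits below 512, consistent with a one-ULP discrepancy.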

newling commented 1 month ago

> Well, by "works" I mean I was able to get past the issue which triggered this fix, but the comparison of the values itself failed in my case. [...] Most of the values differ by either 4 or 2.

When I was debugging convolution numerics, I played around with the input values here:

https://github.com/nod-ai/iree-amd-aie/blob/0757023b292b998e3666bc47a5862d1f16da0665/build_tools/ci/cpu_comparison/input_generator.py#L152

and here:

https://github.com/nod-ai/iree-amd-aie/blob/0757023b292b998e3666bc47a5862d1f16da0665/build_tools/ci/cpu_comparison/input_generator.py#L67

For example, if you make just the first element of A and B 1, and all other values zero, then you know what to expect...

newling commented 1 month ago

Also, what's the smallest integer that cannot be represented as bfloat16, is it 128? I'm wondering now if the discrepancy is just that the CPU backend doesn't actually do anything in bf16 (maybe?)
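For reference: bfloat16 keeps 8 significant bits (7 stored mantissa bits plus the implicit leading one), so every integer up to 2**8 = 256 round-trips exactly, and 257 is the first that does not. A quick way to check without a bf16 library is to round the float32 bit pattern down to 16 bits (the `to_bf16` helper below is just an illustration, not code from this repo):

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float to bfloat16 precision (round-to-nearest-even) and back."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even on the low 16 bits
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# Walk upward until an integer fails to round-trip through bf16.
n = 1
while to_bf16(float(n)) == n:
    n += 1
print(n)  # 257: the smallest positive integer bf16 cannot represent
```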

newling commented 1 month ago

AIE is rounding down and llvm-cpu is rounding up. I ran a test where the fp32 output is 527: llvm-cpu rounds it to 528, and AIE rounds it to 524.

This is true for any setting of rounding_mode!

```mlir
%3 = arith.truncf %2 to_nearest_even : f32 to bf16
%3 = arith.truncf %2 downward : f32 to bf16
%3 = arith.truncf %2 upward : f32 to bf16
```

etc. llvm-cpu always rounds up; AIE (Peano) always rounds down.
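The 527 example can be reproduced in plain Python by emulating the bf16 conversion on the float32 bit pattern: under round-to-nearest-even, 527 goes to 528, while simply dropping the low 16 bits (round toward zero) gives 524, consistent with the observed outputs. Both helpers below are illustrative sketches, not code from this repo:

```python
import struct

def f32_bits(x: float) -> int:
    """float -> IEEE-754 binary32 bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_f32(b: int) -> float:
    """IEEE-754 binary32 bit pattern -> float."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

def bf16_round_nearest_even(x: float) -> float:
    b = f32_bits(x)
    b += 0x7FFF + ((b >> 16) & 1)  # round-to-nearest-even on the low 16 bits
    return bits_f32(b & 0xFFFF0000)

def bf16_truncate(x: float) -> float:
    # Round toward zero: just drop the low 16 bits of the float32 pattern.
    return bits_f32(f32_bits(x) & 0xFFFF0000)

print(bf16_round_nearest_even(527.0))  # 528.0, matches llvm-cpu
print(bf16_truncate(527.0))            # 524.0, consistent with the AIE output
```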

What I think we should do: write a test which doesn't rely on llvm-cpu.

I'll try and set this up.

newling commented 1 month ago

I would like to land this, before helping on the truncf PR with @Abhishek-Varma . Can I please get a complete review?

makslevental commented 1 month ago

randomly found this just now https://github.com/iree-org/iree/blob/e2a2b2b52de077db5daf63bd8c9255d6f3be2036/compiler/src/iree/compiler/Codegen/Common/ConvertBf16ToUInt16Buffers.cpp - not sure if it's useful :shrug:

newling commented 1 month ago

Cool, maybe. We don't do any casting in C++ atm. I came across https://github.com/jax-ml/ml_dtypes this morning, so I should probably use that instead of manually bit-slicing.

makslevental commented 1 month ago

> Cool, maybe. We don't do any casting in C++ atm. I came across this this morning https://github.com/jax-ml/ml_dtypes so I should probably use that instead of manually bit slicing.

I already added that https://github.com/nod-ai/iree-amd-aie/blob/main/tests/requirements.txt#L6