iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0
2.82k stars 608 forks source link

tosa.rescale calls roundevenf(), failing to compile some programs #12778

Open okkwon opened 1 year ago

okkwon commented 1 year ago

89a1af74@llvm-project introduces the use of math::RoundEvenOp which is lowered into a call to roundevenf() in libm, which causes an undefined symbol issue.

To work around the issue, the following tests in iree_tfl_tests are marked as XFAIL for now.

Additionally, several user programs are now failing to compile - see https://github.com/openxla/iree/issues/12836.

benvanik commented 1 year ago

that's real bad - that means we aren't vectorizing these at all - we could add the libm function as a workaround but someone should look into ensuring this gets vectorized and properly lowered.

rsuderman commented 1 year ago

I can look into the lowering. Its a little tricky where it should live but I can probably find a home for it. Implementation is the bigger question.

allieculp commented 1 year ago

@jpienaar P1 or no?

ScottTodd commented 1 year ago

Oh, this is a duplicate of https://github.com/openxla/iree/issues/12836, with a more descriptive issue title. I did some digging on the other issue. Which should we keep and which should we close?

ScottTodd commented 1 year ago

To work around the issue, the following tests in iree_tfl_tests are marked as XFAIL for now. mobilenet_v3 person_detect llvmcpu_mobilenet_v3-large_uint8 llvmcpu_posenet_i8 llvmcpu_resnet_50_int8

Oooh... that's why our CI was passing when downstream reported failures... yeah... that's really bad :/

ScottTodd commented 1 year ago

Definitely at least P1. We can't be regressing compilation correctness on real programs. I'd have advocated for blocking the LLVM integrate on that and not disabling tests. This is not a good state to be in.

allieculp commented 1 year ago

@rsuderman Let us know if you are looking into this? Marked as P1, can update the status as you see fit.

jpienaar commented 1 year ago

I believe there was another bug related here that Rob had marked as P0.

rsuderman commented 1 year ago

This should all be bucketed together with the work in https://github.com/openxla/iree/issues/12783

No one lowering there is that bad, we just need someone to dedicate a few weeks to implementing each lowering and doesn't mind learning floating point manipulations.

ScottTodd commented 1 year ago

Is there an acceptable workaround in the meantime? It doesn't look practical to ask for an upstream revert, considering how the break is indirect (introduces libm calls which most LLVM/MLIR users are okay with but we are not). Taking O(1 month) to address a regression in compilation coverage with currently unprioritized feature work is not a great look :/

Could we (for example) add a pattern that switches back to the old behavior? Or one that lowers roundeven() to round()?

ScottTodd commented 1 year ago

If we do move forward like this (no workaround, wait on the "right fix"), I'd at least want to see some notes show up in any affected stable releases that this is a known issue.

rsuderman commented 1 year ago

If you want the fastest "unbreak" approach your best approach is to add a lowering in iree that converts roundeven to round, then we would need to be certain we fix the issue properly. Reverting the upstream patch is not valid as this is an issue with IREE not having a lowering path for roundeven. This change corresponds to an actual change in the TOSA spec, so it would invert the expectations on supporting tosa.

allieculp commented 1 year ago

@ramiro050 is this in progress?

ramiro050 commented 1 year ago

@ramiro050 is this in progress?

Yes, this is in progress.

ramiro050 commented 1 year ago

The expand pattern for roundeven has landed: https://github.com/llvm/llvm-project/commit/8d2bae9abdc30e104bab00a4dd0f9d39f5bdda6e

jpienaar commented 1 year ago

Rollback upstream due to failure on M2, is it something you can reproduce or would need help finding place to run it?

ramiro050 commented 1 year ago

Rollback upstream due to failure on M2, is it something you can reproduce or would need help finding place to run it?

I can try reproducing on my M1 Macbook. Although the issue seems to be that it is perfectly valid to print a negative nan value as nan without the sign.

For reference, rollback of rollback: https://reviews.llvm.org/D148941

ramiro050 commented 1 year ago

Issues leading to rollback of roundeven patch have been fixed: https://github.com/llvm/llvm-project/commit/44baa65589683dd441594d74625bb0b3ac265dc8

jpienaar commented 1 year ago

This LLVM version has now been integrated, is next step adding that populate method in?

jpienaar commented 1 year ago

llvmcpu_posenet_i8.run fails with numerical difference.