google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

Different test results using Clang when enabling Debug or not on target RVV #2140

Closed wychlw closed 1 month ago

wychlw commented 5 months ago

When using clang@d70267fb with highway@e9a2799, running cmake with the -DCMAKE_BUILD_TYPE=Debug option gives different test results than running it without this option.

Test results without Debug option:

99% tests passed, 4 tests failed out of 684

Total Test time (real) =   7.99 sec

The following tests FAILED:
        249 - HwyDemoteTestGroup/HwyDemoteTest.TestAllDemoteToFloat/RVV  # GetParam() = 137438953472 (Failed)
        283 - HwyFloatTestGroup/HwyFloatTest.TestAllCeil/RVV  # GetParam() = 137438953472 (Failed)
        285 - HwyFloatTestGroup/HwyFloatTest.TestAllFloor/RVV  # GetParam() = 137438953472 (Failed)
        651 - SortTestGroup/SortTest.TestAllPartition/RVV  # GetParam() = 137438953472 (Failed)
Errors while running CTest

Test results with Debug option:

98% tests passed, 12 tests failed out of 684

Total Test time (real) =  25.01 sec

The following tests FAILED:
        249 - HwyDemoteTestGroup/HwyDemoteTest.TestAllDemoteToFloat/RVV  # GetParam() = 137438953472 (Failed)
        571 - MatVecTestGroup/MatVecTest.TestAllMatVecBF16/RVV  # GetParam() = 137438953472 (Failed)
        645 - SortTestGroup/SortTest.TestAllFloatInf/RVV  # GetParam() = 137438953472 (Failed)
        651 - SortTestGroup/SortTest.TestAllPartition/RVV  # GetParam() = 137438953472 (Failed)
        655 - SortTestGroup/SortTest.TestAllSort/RVV  # GetParam() = 137438953472 (Failed)
        656 - SortTestGroup/SortTest.TestAllSort/EMU128  # GetParam() = 2305843009213693952 (Failed)
        657 - SortTestGroup/SortTest.TestAllSelect/RVV  # GetParam() = 137438953472 (Failed)
        658 - SortTestGroup/SortTest.TestAllSelect/EMU128  # GetParam() = 2305843009213693952 (Failed)
        659 - SortTestGroup/SortTest.TestAllPartialSort/RVV  # GetParam() = 137438953472 (Failed)
        660 - SortTestGroup/SortTest.TestAllPartialSort/EMU128  # GetParam() = 2305843009213693952 (Failed)
        663 - BenchSortGroup/BenchSort.BenchAllSort/RVV  # GetParam() = 137438953472 (Failed)
        664 - BenchSortGroup/BenchSort.BenchAllSort/EMU128  # GetParam() = 2305843009213693952 (Failed)
Errors while running CTest

Digging into a more specific test, MatVecTest.TestAllMatVecBF16/RVV, at these lines: https://github.com/google/highway/blob/4852c6f356fb678a0e6af11151b25981278fa1c6/hwy/contrib/matvec/matvec_test.cc#L171-L174 With Debug, the actual value is -1.993652, resulting in a negative tolerance. Without Debug, all data is positive, so the test passes.

i16/f32 6 x 8, with add: mismatch at 4 -1.993652 -1.993652; tol -0.311508
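
The negative tolerance in the log above is consistent with a tolerance computed as a fixed fraction of a signed value (0.311508 / 1.993652 is roughly 0.15625). The following is a minimal sketch under that assumption, not the actual test code. It shows why such a check rejects even identical values once they are negative, and how scaling by the magnitude avoids that. The names `kRelTol`, `WithinSignedTol` and `WithinAbsTol` are hypothetical.

```cpp
#include <cmath>

// Assumed relative tolerance. 0.15625 matches the ratio seen in the log,
// but the real test may derive its tolerance differently.
constexpr double kRelTol = 0.15625;

// Tolerance scaled by the signed expected value. It becomes negative when
// expected < 0, so the comparison fails even for identical values.
bool WithinSignedTol(double expected, double actual) {
  const double tol = expected * kRelTol;
  return std::abs(actual - expected) <= tol;
}

// Same check, but scaled by the magnitude so the tolerance is never negative.
bool WithinAbsTol(double expected, double actual) {
  const double tol = std::abs(expected) * kRelTol;
  return std::abs(actual - expected) <= tol;
}
```

With these definitions, `WithinSignedTol(-1.993652, -1.993652)` reports a mismatch although both values are equal, whereas `WithinAbsTol` accepts them.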

In SortTest, num is 24 while Constants::SampleLanes<T>() is 32, so this assertion fails:

Abort at vqsort-inl.h:1208: Assert num >= Constants::SampleLanes<T>()

jan-wassenberg commented 4 months ago

Thanks for reporting. We have also seen issues with rounding mode on QEMU - is that how you are running the tests, or is it on real HW?

"With Debug, the actual would be -1.993652"

Interesting. GenerateMod does, or should, generate numbers 0..15. Can you help us understand where the negative numbers come from? Would be good to also add an assert that inputs and outputs are non-negative.
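
A check like the one suggested above could look roughly as follows. This is only a sketch: `GenerateMod16` is a hypothetical stand-in for the test's generator (its real signature is not shown in this thread), and `AllNonNegative` is an illustrative helper that a test could assert on for both inputs and outputs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the test's generator: values cycle through 0..15.
std::vector<int16_t> GenerateMod16(size_t count) {
  std::vector<int16_t> values(count);
  for (size_t i = 0; i < count; ++i) {
    values[i] = static_cast<int16_t>(i % 16);
  }
  return values;
}

// Returns true if every element is >= 0. A test could assert on this for
// the generated inputs and for the computed outputs.
template <typename T>
bool AllNonNegative(const std::vector<T>& values) {
  for (const T v : values) {
    if (v < T(0)) return false;
  }
  return true;
}
```

If the inputs pass this check but the outputs do not, the negative value is introduced by the computation rather than by the generator.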

For SortTest, the comment there says: "We have at least 2 chunks (x 64 bytes) because the base case handles anything up to 8 vectors (x 16 bytes)." It seems possible that this is breaking with LMUL<1. This is only 'breaking' in debug mode because it's a DASSERT which is only active in debug builds. Can you print N and d.Pow2() at the failing DASSERT?
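
To illustrate why the abort only shows up in Debug builds, here is a minimal sketch of a debug-only assertion. It assumes the common convention that release builds define `NDEBUG`. `MY_DASSERT`, `kSampleLanes`, `HasEnoughForSampling` and `SampleForPivot` are hypothetical names for illustration and not Highway's actual definitions.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical debug-only assertion: checks the condition in debug builds
// and compiles to nothing observable when NDEBUG is defined.
#ifdef NDEBUG
#define MY_DASSERT(cond) \
  do {                   \
    (void)sizeof(cond);  \
  } while (0)
#else
#define MY_DASSERT(cond)                                      \
  do {                                                        \
    if (!(cond)) {                                            \
      std::fprintf(stderr, "Abort at %s:%d: Assert %s\n",     \
                   __FILE__, __LINE__, #cond);                \
      std::abort();                                           \
    }                                                         \
  } while (0)
#endif

// Assumed minimum number of elements needed for sampling (32 in the report).
constexpr size_t kSampleLanes = 32;

bool HasEnoughForSampling(size_t num) { return num >= kSampleLanes; }

// In a debug build this aborts for num = 24. In a release build the same
// call continues silently with too few elements.
void SampleForPivot(size_t num) {
  MY_DASSERT(HasEnoughForSampling(num));
  // ... sampling would happen here ...
}
```

The precondition is violated in both build types. Only the debug build reports it, which would explain the additional SortTest failures seen with Debug.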

johnplatts commented 4 months ago

There were bugs in RVV F64->F32 and F32->F16 DemoteTo, which are fixed in pull request #2164.

RVV Ceil and Floor have also been reimplemented in pull request #2164 to avoid changing the floating point rounding mode using inline assembly, which fixes issues with Ceil and Floor on RVV on Clang 16 and later.
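
For readers unfamiliar with the idea, Ceil and Floor can be computed without touching the rounding mode by truncating toward zero and then adjusting. The scalar sketch below illustrates that general technique only. It is not the RVV code from the pull request, and the function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Values with magnitude >= 2^23 are already integers in float. NaN and
// infinity also fail the comparison below and are returned unchanged.
constexpr float kNoFraction = 8388608.0f;  // 2^23

float CeilNoModeChange(float v) {
  if (!(std::fabs(v) < kNoFraction)) return v;
  // Float-to-int conversion truncates toward zero regardless of rounding mode.
  float t = static_cast<float>(static_cast<int32_t>(v));
  if (t < v) t += 1.0f;        // truncation rounded down: step up by one
  return std::copysign(t, v);  // keep the sign so that -0.5 yields -0.0
}

float FloorNoModeChange(float v) {
  if (!(std::fabs(v) < kNoFraction)) return v;
  float t = static_cast<float>(static_cast<int32_t>(v));
  if (t > v) t -= 1.0f;        // truncation rounded up: step down by one
  return std::copysign(t, v);
}
```

Because the result depends only on the conversion and a comparison, it does not rely on the current floating-point rounding mode.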

jan-wassenberg commented 1 month ago

I think the issue is solved, thanks @johnplatts :)