Closed wychlw closed 1 month ago
Thanks for reporting. We have also seen issues with rounding mode on QEMU - is that how you are running the tests, or is it on real HW?
Regarding "With Debug, the actual would be -1.993652":
Interesting. GenerateMod does, or should, generate numbers 0..15. Can you help us understand where the negative numbers come from? Would be good to also add an assert that inputs and outputs are non-negative.
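A minimal sketch of the kind of non-negativity check suggested here, written as standalone C++ rather than against the Highway test code. `AllNonNegative` and `GenerateMod16` are hypothetical names used only for illustration; `GenerateMod16` is a stand-in for a generator that is expected to yield values in 0..15, not the actual `GenerateMod`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Highway: returns true if no element is
// negative. Written as !(v >= 0) so that NaN is rejected as well.
template <typename T>
bool AllNonNegative(const std::vector<T>& values) {
  for (const T v : values) {
    if (!(v >= T(0))) return false;
  }
  return true;
}

// Stand-in for a generator that is expected to produce values in 0..15.
std::vector<float> GenerateMod16(size_t count) {
  std::vector<float> out(count);
  for (size_t i = 0; i < count; ++i) {
    out[i] = static_cast<float>(i % 16);
  }
  return out;
}
```

In a test, such a check could be applied to both the generated inputs and the computed outputs, so that a value like -1.993652 is reported where it first appears rather than at the tolerance comparison.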
For SortTest, the comment there says: "We have at least 2 chunks (x 64 bytes) because the base case handles anything up to 8 vectors (x 16 bytes)." It seems possible that this is breaking with LMUL<1. This is only 'breaking' in debug mode because it's a DASSERT which is only active in debug builds. Can you print N and d.Pow2() at the failing DASSERT?
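One way to gather the requested values is to format them into a single line just before the failing check. The sketch below is standalone C++; `DescribeSortState` and its parameters are hypothetical, and in the real code the values would come from `num`, `Constants::SampleLanes<T>()`, the lane count N, and `d.Pow2()`. It assumes `Pow2()` is a small signed integer that is negative for fractional LMUL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical diagnostic helper, not part of Highway. It only formats the
// values that would be printed next to the failing DASSERT.
std::string DescribeSortState(size_t num, size_t sample_lanes, size_t lanes,
                              int pow2) {
  char buf[128];
  std::snprintf(buf, sizeof(buf), "num=%zu sample_lanes=%zu N=%zu pow2=%d",
                num, sample_lanes, lanes, pow2);
  return std::string(buf);
}
```

Writing the result to stderr immediately before the assertion keeps the output next to the failure message in the test log.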
There were bugs in RVV F64->F32 and F32->F16 DemoteTo, which are fixed in pull request #2164.
RVV Ceil and Floor have also been reimplemented in pull request #2164 to avoid changing the floating point rounding mode using inline assembly, which fixes issues with Ceil and Floor on RVV on Clang 16 and later.
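For illustration only, here is one common scalar technique for computing floor and ceil without touching the rounding mode: truncate toward zero via an integer conversion, then correct by one where needed. This is an assumption about the general approach, not a description of what pull request #2164 actually does, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Floats with magnitude >= 2^23 have no fractional bits, so they are already
// integers. The negated comparison also lets infinity and NaN pass through.
inline bool IsAlreadyIntegral(float x) { return !(std::fabs(x) < 8388608.0f); }

// Floor without changing the rounding mode. Note: the sign of a zero result
// is not preserved in this simplified sketch.
float FloorNoModeChange(float x) {
  if (IsAlreadyIntegral(x)) return x;
  // Float-to-int conversion in C++ always truncates toward zero.
  const float t = static_cast<float>(static_cast<int32_t>(x));
  return (t > x) ? t - 1.0f : t;  // truncation rounded up for negative x
}

// Ceil using the same truncate-then-correct idea.
float CeilNoModeChange(float x) {
  if (IsAlreadyIntegral(x)) return x;
  const float t = static_cast<float>(static_cast<int32_t>(x));
  return (t < x) ? t + 1.0f : t;  // truncation rounded down for positive x
}
```

Because the conversion truncates regardless of the current rounding mode, the result does not depend on the dynamic rounding-mode state, which is the property the reimplementation is described as needing.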
I think the issue is solved, thanks @johnplatts :)
When using clang@d70267fb with highway@e9a2799, running cmake with the -DCMAKE_BUILD_TYPE=Debug option gives different test results than building without it.
Test results without the Debug option:
Test results with the Debug option:
Digging into a more specific test, MatVecTest.TestAllMatVecBF16/RVV, at https://github.com/google/highway/blob/4852c6f356fb678a0e6af11151b25981278fa1c6/hwy/contrib/matvec/matvec_test.cc#L171-L174: with Debug, actual is -1.993652, resulting in a negative tolerance. Without Debug, all data is positive, so the test passes.
In SortTest, num is 24 and Constants::SampleLanes<T>() is 32.