Open mbanck opened 2 years ago
Hrm, I've found #240 now which is related - in that you mentioned that this test case is bound to fail, but it sounds like due to numerical noise, not dsyev/pdsyevd? Is this a separate issue?
The Ubuntu testsuite results in that other issue are not very helpful; they just say FAILED. I've changed the testsuite script to dump the last 50 lines of output for failed test cases now.
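For anyone who wants the same kind of diagnostics locally, here is a minimal sketch of dumping the tail of a failed test's log. It assumes each test case writes its output to a plain-text log file; the function name `tail_lines` and that file layout are hypothetical and not taken from the actual testsuite script.

```python
from collections import deque
from pathlib import Path


def tail_lines(log_path, n=50):
    """Return the last n lines of a log file, without trailing newlines."""
    with Path(log_path).open(errors="replace") as fh:
        # A bounded deque keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```

Printing `"\n".join(tail_lines(path))` for each failed case is usually enough to see the actual error instead of a bare FAILED.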
Hi Michael, sorry for the very late response. This test is supposed to converge to incorrect results and is not supposed to throw errors. I am not exactly sure what this is without reproducing myself. Thanks for reporting.
It seems to be flakey - the test ran fine again on the next upload.
Not sure whether this can be tracked down definitively - I downgraded the corresponding Debian bug, but that's not really an option for GitHub.
I'll run another test build overnight and see what the current status is on my personal box.
New information from the Debian bug:
it seems related to the host that runs the test. I.e. the test fails on our beefy amd64 host (ci-worker13) with 64 cores and 256GB RAM, but seems to pass on the others.
The error on s390x is the same by the way (that has 10 cores and 32GB RAM).
So two things seem to work around this:
BAGEL_NUM_THREAD=4 (it fails with 8 or 16)
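To check the thread-count dependence on another host, something like the following sketch could be used. It only relies on the `BAGEL_NUM_THREAD` environment variable mentioned above; the helper names `run_with_threads` and `sweep` are hypothetical, and `cmd` is a placeholder for whatever command launches the failing test case on your system.

```python
import os
import subprocess


def run_with_threads(cmd, num_threads):
    """Run cmd with BAGEL_NUM_THREAD set to num_threads; return the CompletedProcess."""
    env = dict(os.environ, BAGEL_NUM_THREAD=str(num_threads))
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


def sweep(cmd, thread_counts=(4, 8, 16)):
    """Map each thread count to the exit code of cmd run under that setting."""
    return {n: run_with_threads(cmd, n).returncode for n in thread_counts}
```

A non-zero exit code for 8 and 16 but zero for 4 would match the behaviour reported here. Since the failure appears to be flaky, repeating the sweep a few times is probably worthwhile.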
First reported here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006788
If I revert MPICH to 3.4.1, the testcase runs fine. If I use MPICH 4.0.1, it fails with