Open dlaugt opened 6 years ago
The log indicates the division code is failing. Given that the other tests pass, there's likely not a configuration issue. My guess would be faulty hardware or compiler bug. I don't think there is much we can do about it. We have no Solaris experts in the project any more.
I've just compiled mpir 2.7.2 on the same Solaris machine and on the same compiler (gcc 4.8.1). All tests have passed with this version. This seems to be caused by a change on the code of mpir 3.0.0...
One more indication. On Linux with the same compiler (gcc 4.8.1), all tests have passed.
CPU is a SPARC-T4:
$ psrinfo -pv
The physical processor has 64 virtual processors (0-63)
SPARC-T4 (chipid 0, clock 2848 MHz)
The physical processor has 64 virtual processors (64-127)
SPARC-T4 (chipid 1, clock 2848 MHz)
Sure, the division code changed in MPIR 3.0.0. I still think it's most likely to be a miscompilation of that new code (or a hardware fault triggered by that new code).
The regression is coming from the following change: #193
All tests are passing on Solaris when I replace mpn/generic/mulmid_basecase.c with the one in 2.7.2. I suspect a bug on this code when mpn_addmul_2, mpn_addmul_3, mpn_addmul_4, mpn_addmul_5 or mpn_addmul_6 are native.
Anyway, from my side, I will take mulmid_basecase.c of 2.7.2 which is in synch with the master branch of GMP.
The issue is probably in this commit then:
https://github.com/wbhart/mpir/pull/193/commits/b78c923dcb66773508cd98c120ec4744fc92abdb
But the code is directly from GMP. If the bug exists in MPIR it probably exists there too. Or, as I say, it's just a miscompilation.
I'm not sure there is any special mpn_addmul2/3/4/5 assembly code. So, short of a bug in the code in that commit, I can't really think of anything else. I suppose it is possible that the mpn_addmul2 generic code in MPIR makes different assumptions to the code in MPIR. But that's a stretch. Lots of other things would be expected to break, if so.
By the way, what is the result of ./config.guess on your machine, so we can rule out an issue with assembly code.
Which gmp repo are you referring to? In this repo, I don't see the optimization with mpn_addmul2/3/4/5/6 in mpn/generic/mulmid_basecase.c. The result of ./config.guess is ultrasparc-sun-solaris2.10.
Ok, so I didn't write this code, and don't know where that addmul2/3/4/5/6 comes from. I presume it has been moved from somewhere else in the GMP repository into this file as we have a slightly different arrangement of files.
This was all part of a speedup of the powm function, from memory, which had been done in GMP, so we moved over the improvements to MPIR. Alternatively, the changes might have been taken from one of our other files, e.g. one of the plain multiplication functions.
I don't know to what extent the addmul2 code is critical to get these performance improvements. We have an addmul2 in the ultrasparc directory, so it's understandable why this might be being picked up on your machine. Perhaps it is faulty.
We do have addmul2 on a number of other architectures, so it is understandable why the person who did this work might have wanted to make use of it. On the machines where it is available, it is faster than addmul1.
Simply removing that code might make it work, but it will also slow everything down on other arches. So unfortunately we need to get to the bottom of the failure before fixing it.
I've upgraded mpir from 2.1.1 to 3.0.0 on our compilation servers (Linux and Solaris). All tests passed on Linux. However, on Solaris, some tests are failing:
I don't know how to dig into those errors. Any suggestions? test-suite.log