Closed embray closed 4 years ago
Thanks for looking into this.
Sadly, I don't have access to any windows system, so I never tested Sirocco there.
As you say, Sirocco does numerical aproximations of the movements of roots of a polynomial, but carefully testing that the step is small enough to guarantee the correctness. So it is indeed very sensitive to all the small subtleties of floating point computing.
For example, we discovered when we moved to autotools that the optimization flag -O2 can actually go against performance (and maybe even correctness) , so we had to make sure that it was compiled with no optimization options. Maybe the problem you are detecting on windows is related to that?
I just tested under a linux box disabling the tweek we did to prevent the compiler optimization and I get the same error as you point, so it really looks that is the cuplrit.
In particular, the only line I had to modify is this one:
: ${CXXFLAGS=-O0 -g}
In the configure.ac
file [1]
Maybe that kind of autotools behaviour is platform-dependent?
[1] https://github.com/miguelmarco/SIROCCO2/blob/master/configure.ac#L4
P.S. I noticed that you forked the old sirocco repo. The one we are using here is the version 2 one: https://github.com/miguelmarco/SIROCCO2
That's for the note about -O0
. I have another ticket #29149 where I'm fixing building sirocco on Cygwin (though I haven't posted the patch yet). In particular I had to remove CXXFLAGS=-O0
to even get it to build, because otherwise I had problems during linking with some templates; I'm not exactly sure why but it seems like they were being overspecialized, in such a way that resulted in multiple definitions for some functions. Removing -O0
fixed it because then the compiler would optimize out unused specializations, but there is probably a better solution.
As you say, this is likely the culprit.
Replying to @miguelmarco:
P.S. I noticed that you forked the old sirocco repo. The one we are using here is the version 2 one: https://github.com/miguelmarco/SIROCCO2
I see. When I noticed that my copy was not 2.0 I updated my fork of the repo to 2.0 as well, so I can confirm that I am building the correct version at least.
Replying to @miguelmarco:
For example, we discovered when we moved to autotools that the optimization flag -O2 can actually go against performance (and maybe even correctness) , so we had to make sure that it was compiled with no optimization options. Maybe the problem you are detecting on windows is related to that?
Did you ever figure out what specific optimizations were hurting performance and/or correctness, and where those optimizations were occurring? Because disabling all optimizations is a blunt hammer and probably hurts performance in other cases. I know figuring that out can be tricky of course, but I have to ask.
I was able to reproduce this on Linux also by removing -O0
and setting -O2
instead.
Description changed:
---
+++
@@ -1,6 +1,8 @@
-#29149 fixes building Sage with the sirocco package installed on Cygwin.
+#29149 fixes building Sage with the sirocco package installed on Cygwin. This required, for linking purposes, removing the forced `-O0` flag (this could probably be worked around by other means--building with optimizations disabled *should* be possible).
-However, this then has some test failures in `sage.schemes.curves.zariski_vankampen` that I can't reproduce on Linux. First the following test fails:
+However, as mmarco remarked below, there are some optimizations that cause numerical errors in sirocco such that it fails to converge or takes a longer path than necessary. This *can* be reproduced on Linux (although it was previously found on Cygwin).
+
+In particular this causes test failures in `sage.schemes.curves.zariski_vankampen`. First the following test fails:
File "src/sage/schemes/curves/zariski_vankampen.py", line 268, in sage.schemes.curves.zariski_vankampen.followstrand @@ -45,3 +47,5 @@ which on Linux passes very quickly. However, the CPU is still very active during this hang so it must be some kind of busy loop.
I don't understand anything about this package except that it's using some kind of linear approximation techniques, so there might be some small numerical glitches being invoked that are causing some solutions to diverge or something. + +It might be possible to work around this by identifying exactly which optimizations result in the problem, rather than disabling all optimizations.
I can also reproduce the problem enabling basic optimizations with -O1
, so it could be any combination of
-fauto-inc-dec
-fbranch-count-reg
-fcombine-stack-adjustments
-fcompare-elim
-fcprop-registers
-fdce
-fdefer-pop
-fdelayed-branch
-fdse
-fforward-propagate
-fguess-branch-probability
-fif-conversion
-fif-conversion2
-finline-functions-called-once
-fipa-profile
-fipa-pure-const
-fipa-reference
-fipa-reference-addressable
-fmerge-constants
-fmove-loop-invariants
-fomit-frame-pointer
-freorder-blocks
-fshrink-wrap
-fshrink-wrap-separate
-fsplit-wide-types
-fssa-backprop
-fssa-phiopt
-ftree-bit-ccp
-ftree-ccp
-ftree-ch
-ftree-coalesce-vars
-ftree-copy-prop
-ftree-dce
-ftree-dominator-opts
-ftree-dse
-ftree-forwprop
-ftree-fre
-ftree-phiprop
-ftree-pta
-ftree-scev-cprop
-ftree-sink
-ftree-slsr
-ftree-sra
-ftree-ter
-funit-at-a-time
I didn't dig into the details of what specific optimization was the problem. Sorry.
One problem I found (if not the problem) is that although there is a template specialization for evaluation polynomials on complex intervals:
template <>
IComplex Polynomial<IComplex>::operator () (const IComplex &x, const IComplex &y) const{
IComplex rop, ropx, ropy;
#ifdef DEVELOPER
std::cerr << "using specialized polynomial eval for " << __PRETTY_FUNCTION__ << std::endl;
#endif
ropx = this->evalIPolHornerXY (x, y);
ropy = this->evalIPolHornerYX (x, y);
rop = this->evalPolClassic (x, y);
rop.r.a = MAX(MAX(ropx.r.a, ropy.r.a), rop.r.a);
rop.r.b = MIN(MIN(ropx.r.b, ropy.r.b), rop.r.b);
rop.i.a = MAX(MAX(ropx.i.a, ropy.i.a), rop.i.a);
rop.i.b = MIN(MIN(ropx.i.b, ropy.i.b), rop.i.b);
return rop;
}
when building with -O1
that specialization seems to be being ignored, and instead it's using the unspecialized template:
template <class T>
T Polynomial<T>::operator() (const T &x, const T &y) const {
#ifdef DEVELOPER
std::cerr << "using unspecialized polynomial eval for " << __PRETTY_FUNCTION__ << std::endl;
#endif
return this->evalPolClassic (x,y);
}
I don't know why that would be but I'll keep looking...
This seems to fix it for me:
diff --git a/include/polynomial.hpp b/include/polynomial.hpp
index 88d1299..4596579 100644
--- a/include/polynomial.hpp
+++ b/include/polynomial.hpp
@@ -230,6 +230,10 @@ T Polynomial<T>::operator() (const T &x, const T &y) const {
return this->evalPolClassic (x,y);
}
+// Defined in lib/polynomial.cpp
+template <> IComplex Polynomial<IComplex>::operator() (const IComplex &x, const IComplex &y) const;
+template <> MPIComplex Polynomial<MPIComplex>::operator () (const MPIComplex &x, const MPIComplex &y) const;
+
template <class T>
T Polynomial<T>::diffX (const T &x, const T &y) const {
// coef[(i*(i+1))/2 + j] is coeficient of monomial of degree 'i',
@@ -311,6 +315,10 @@ T Polynomial<T>::diffY (const T &x, const T &y) const {
return this->evalPolYClassic (x,y);
}
+// Defined in lib/polynomial.cpp
+template<> IComplex Polynomial<IComplex>::diffY (const IComplex &x, const IComplex &y) const;
+template<> MPIComplex Polynomial<MPIComplex>::diffY (const MPIComplex &x, const MPIComplex &y) const;
+
template <class T>
T Polynomial<T>::diffXX (const T &x, const T &y) const {
The problem is that sirocco.cpp includes "polynomial.hpp" which only has the unspecialized operator()
defined inline, but no declaration for the specialized version, so it just use the unspecialized one. Meanwhile, when compiling polynomial.cpp
the unused specializations are just optimized out.
This would possibly explain the link errors I was having on Cygwin as well--I don't know why the problem only occurs on Windows but it must be generating an explicit specialization for Polynomial<IComplex>::operator()
in sirocco.o
and then conflicting with the one in polynomial.o
. Going to test that theory out now.
Yep, this is exactly the link error that I got which motivated me to try removing the -O0
. This explains it exactly:
libtool: link: g++ -std=gnu++11 -std=gnu++11 -shared -nostdlib /usr/lib/gcc/x86_64-pc-cygwin/7.4.0/crtbeginS.o .libs/icomplex.o .libs/interval.o .libs/mp_complex.o .libs/mp_icomplex.o .libs/mp_interval.o .libs/polynomial.o .libs/sirocco.o -lmpfr -lgmp -L/home/embray/src/sagemath/sage/local/lib -L/home/embray/src/sagemath/sage/local/lib/../lib -L/usr/lib/gcc/x86_64-pc-cygwin/7.4.0 -L/usr/lib/gcc/x86_64-pc-cygwin/7.4.0/../../../../x86_64-pc-cygwin/lib/../lib -L/usr/lib/gcc/x86_64-pc-cygwin/7.4.0/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-cygwin/7.4.0/../../../../x86_64-pc-cygwin/lib -L/usr/lib/gcc/x86_64-pc-cygwin/7.4.0/../../.. -lstdc++ -lgcc_s -lgcc -lcygwin -ladvapi32 -lshell32 -luser32 -lkernel32 -lgcc_s -lgcc /usr/lib/gcc/x86_64-pc-cygwin/7.4.0/crtend.o -O0 -Wl,-rpath -Wl,/home/embray/src/sagemath/sage/local/lib -o .libs/cygsirocco-0.dll -Wl,--enable-auto-image-base -Xlinker --out-implib -Xlinker .libs/libsirocco.dll.a
.libs/sirocco.o:sirocco.cpp:(.text$_ZNK10PolynomialI8IComplexEclERKS0_S3_[_ZNK10PolynomialI8IComplexEclERKS0_S3_]+0x0): multiple definition of `Polynomial<IComplex>::operator()(IComplex const&, IComplex const&) const'
.libs/polynomial.o:polynomial.cpp:(.text+0x642): first defined here
.libs/sirocco.o:sirocco.cpp:(.text$_ZNK10PolynomialI8IComplexE5diffYERKS0_S3_[_ZNK10PolynomialI8IComplexE5diffYERKS0_S3_]+0x0): multiple definition of `Polynomial<IComplex>::diffY(IComplex const&, IComplex const&) const'
.libs/polynomial.o:polynomial.cpp:(.text+0xfbe): first defined here
.libs/sirocco.o:sirocco.cpp:(.text$_ZNK10PolynomialI10MPIComplexEclERKS0_S3_[_ZNK10PolynomialI10MPIComplexEclERKS0_S3_]+0x0): multiple definition of `Polynomial<MPIComplex>::operator()(MPIComplex const&, MPIComplex const&) const'
.libs/polynomial.o:polynomial.cpp:(.text+0x188a): first defined here
.libs/sirocco.o:sirocco.cpp:(.text$_ZNK10PolynomialI10MPIComplexE5diffYERKS0_S3_[_ZNK10PolynomialI10MPIComplexE5diffYERKS0_S3_]+0x0): multiple definition of `Polynomial<MPIComplex>::diffY(MPIComplex const&, MPIComplex const&) const'
.libs/polynomial.o:polynomial.cpp:(.text+0x22f8): first defined here
collect2: error: ld returned 1 exit status
Author: Erik Bray
New commits:
35d7703 | Trac #29167: Fix incorrect results that occur when sirocco is build with |
Description changed:
---
+++
@@ -49,3 +49,5 @@
I don't understand anything about this package except that it's using some kind of linear approximation techniques, so there might be some small numerical glitches being invoked that are causing some solutions to diverge or something.
It might be possible to work around this by identifying exactly which optimizations result in the problem, rather than disabling all optimizations.
+
+*Upstream PR:* https://github.com/miguelmarco/SIROCCO2/pull/3
Branch: u/embray/ticket-29167
Upstream: Reported upstream. Developers acknowledge bug.
Thanks a lot. Great work.
I might need some time to review and test it though.
Did you check that all test pass with this patch?
Also, what would you consider the best way to go: just patch sirocco at install time inside Sage, or release a new version of Sirocco and using that in Sage?
Replying to @miguelmarco:
Thanks a lot. Great work.
I might need some time to review and test it though.
Thank you for looking at it! There's no hurry.
Did you check that all test pass with this patch?
Per our discussion on GitHub, yes. I'm just reiterating here for the record.
Also, what would you consider the best way to go: just patch sirocco at install time inside Sage, or release a new version of Sirocco and using that in Sage?
It's up to you. There's no urgency on this either way, so whatever's easiest for you.
Changed upstream from Reported upstream. Developers acknowledge bug. to Fixed upstream, but not in a stable release.
I just created a new release in github. The source code is available at https://github.com/miguelmarco/SIROCCO2/releases/download/v2.0.1/libsirocco-2.0.1.tar.gz
Oh shoot, before you made the release I should have pointed out to you the fixes I made for Cygwin. I forgot to make a PR for them yet. I will do that now...
No problem, I can make another release (that is the good thing about numbers: we won't run out of them).
I made a new release on github.
The tarball is available at https://github.com/miguelmarco/SIROCCO2/releases/download/2.0.2/libsirocco-2.0.2.tar.gz
Erik, are you going to rebase this ticket based on the new release?
Yes, I've been busy with other things, but I plan to replace this ticket with one to update the sirocco spkg to the new release.
Superseded by #29284 which upgrades the SIROCCO SPKG to the new 2.0.2 release, rather than patching.
29149 fixes building Sage with the sirocco package installed on Cygwin. This required, for linking purposes, removing the forced
-O0
flag (this could probably be worked around by other means--building with optimizations disabled should be possible).However, as mmarco remarked below, there are some optimizations that cause numerical errors in sirocco such that it fails to converge or takes a longer path than necessary. This can be reproduced on Linux (although it was previously found on Cygwin).
In particular this causes test failures in
sage.schemes.curves.zariski_vankampen
. First the following test fails:The first an last lines are correct, but none of the stuff in the middle.
Then a bit later it hangs, seemingly indefinitely at
which on Linux passes very quickly. However, the CPU is still very active during this hang so it must be some kind of busy loop.
I don't understand anything about this package except that it's using some kind of linear approximation techniques, so there might be some small numerical glitches being invoked that are causing some solutions to diverge or something.
It might be possible to work around this by identifying exactly which optimizations result in the problem, rather than disabling all optimizations.
Upstream PR: https://github.com/miguelmarco/SIROCCO2/pull/3
Upstream: Fixed upstream, but not in a stable release.
CC: @miguelmarco
Component: algebraic topology
Keywords: cygwin sirocco
Author: Erik Bray
Branch/Commit: u/embray/ticket-29167 @
35d7703
Issue created by migration from https://trac.sagemath.org/ticket/29167