Thanks Julian.
I am adding Keir and Julian to the thread. They may know about this.
I tried the code myself and it did not generate any problems. I think you are
right in suspecting that there is something about how macports is building this
library that is the problem.
Original comment by sameerag...@google.com
on 13 Aug 2014 at 4:20
I should have mentioned that the test program is quite specific; even changing
the "magic data" loaded into v1s and v2s makes the segfault disappear. Also,
removing the lower bound on variable 2 of each block--removing "Bounds(2,
-0.75)"--suppresses the segfault, suggesting that it might occur when that
constraint is active. (I didn't try removing the other bounds, since they're
needed to prevent a division by zero.)
The segfault is usually, but not always, in cholmod_pack_factor called
indirectly from internal::SuiteSparse::Cholesky, though I suspect the bug
doesn't originate there, but rather with some other part of Ceres scribbling
over memory.
Original comment by julian.p...@gmail.com
on 13 Aug 2014 at 4:26
If you have MacPorts installed, you can see exactly how MacPorts builds
ceres-solver:
sudo port -dvs destroot ceres-solver
I have attached the output from my machine.
Original comment by Mark.M...@gmail.com
on 13 Aug 2014 at 5:02
1) The list of libraries you're linking when compiling test_ceres isn't
complete. On my machine, at least, CMake lists the following SuiteSparse
dependencies that will be pulled in when Ceres is built statically:
libspqr
libcholmod
libccolamd
libcamd
libcolamd
libamd
libsuitesparseconfig
libmetis
libcxsparse
Is your cholmod dynamically linked to everything else?
2) When you're compiling Ceres manually via CMake and it's working, is
SuiteSparse definitely enabled?
3) What options are you compiling SuiteSparse with on MacPorts? The ports file
is pretty lengthy and includes an option to use ATLAS rather than the
Accelerate framework (which Ceres would not respect unless you explicitly
told it to use a different BLAS library).
4) Is there any change if you remove the arguments: -g -O0 when compiling
test_ceres?
Original comment by alexs....@gmail.com
on 13 Aug 2014 at 5:16
1) Yes, my cholmod is linked against everything else:
$ otool -L broken
broken:
/opt/local/lib/libglog.0.dylib (compatibility version 1.0.0, current version 1.0.0)
/opt/local/lib/libgflags.2.dylib (compatibility version 4.0.0, current version 4.0.0)
/opt/local/lib/libcholmod.3.0.0.dylib (compatibility version 3.0.0, current version 3.0.0)
/opt/local/lib/libcxsparse.3.1.3.dylib (compatibility version 3.0.0, current version 3.1.3)
/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 120.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
$ otool -L /opt/local/lib/libcholmod.3.0.0.dylib
/opt/local/lib/libcholmod.3.0.0.dylib:
/opt/local/lib/libcholmod.3.0.0.dylib (compatibility version 3.0.0, current version 3.0.0)
/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
/opt/local/lib/libsuitesparseconfig.4.2.1.dylib (compatibility version 4.0.0, current version 4.2.1)
/opt/local/lib/libamd.2.4.0.dylib (compatibility version 2.0.0, current version 2.4.0)
/opt/local/lib/libcamd.2.4.0.dylib (compatibility version 2.0.0, current version 2.4.0)
/opt/local/lib/libcolamd.2.9.0.dylib (compatibility version 2.0.0, current version 2.9.0)
/opt/local/lib/libccolamd.2.9.0.dylib (compatibility version 2.0.0, current version 2.9.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 120.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
2) Yes, it's definitely enabled, and my manually compiled library refuses to
link if either -lcholmod or -lcxsparse is omitted.
3) I never specified any variants/options when installing SuiteSparse:
$ port provides /opt/local/lib/libcholmod.3.0.0.dylib
/opt/local/lib/libcholmod.3.0.0.dylib is provided by: SuiteSparse
$ port echo installed | grep SuiteSparse
SuiteSparse @4.2.1_3
and as you see in (1), libcholmod.3.0.0.dylib was linked against Accelerate's
BLAS instead of ATLAS's (which is not installed--I have not intentionally
installed any BLAS implementation on this machine).
4) No change if I omit -g -O0 or compile with optimizations. It just segfaults
faster :)
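The comparison in (1) can also be done mechanically. The Python sketch below is illustrative only: the helper name `linked_libs` and the regular expression are assumptions made for this example, and the two input strings are trimmed copies of the `otool -L` output above. It checks which of the libraries listed in question 1 appear neither among the direct dependencies of `broken` nor among those of libcholmod.

```python
import re

# Library names listed in question 1 as SuiteSparse dependencies.
EXPECTED = {"spqr", "cholmod", "ccolamd", "camd", "colamd", "amd",
            "suitesparseconfig", "metis", "cxsparse"}

# Trimmed copies of the `otool -L` listings quoted in this comment.
BROKEN = """\
/opt/local/lib/libglog.0.dylib
/opt/local/lib/libgflags.2.dylib
/opt/local/lib/libcholmod.3.0.0.dylib
/opt/local/lib/libcxsparse.3.1.3.dylib
/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate
/usr/lib/libc++.1.dylib
/usr/lib/libSystem.B.dylib
"""

CHOLMOD = """\
/opt/local/lib/libcholmod.3.0.0.dylib
/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate
/opt/local/lib/libsuitesparseconfig.4.2.1.dylib
/opt/local/lib/libamd.2.4.0.dylib
/opt/local/lib/libcamd.2.4.0.dylib
/opt/local/lib/libcolamd.2.9.0.dylib
/opt/local/lib/libccolamd.2.9.0.dylib
/usr/lib/libc++.1.dylib
/usr/lib/libSystem.B.dylib
"""


def linked_libs(otool_output):
    """Return dylib base names (no 'lib' prefix, no version) found in the text."""
    names = set()
    for line in otool_output.splitlines():
        # Matches e.g. ".../libcholmod.3.0.0.dylib" and captures "cholmod".
        match = re.search(r"/lib([^/.]+)\.[^/\s]*dylib", line)
        if match:
            names.add(match.group(1))
    return names


# Expected libraries that show up in neither listing.
missing = EXPECTED - (linked_libs(BROKEN) | linked_libs(CHOLMOD))
print(sorted(missing))  # ['metis', 'spqr']
```

A name missing from both listings is not necessarily a problem: a library only has to be linked if its symbols are actually used, and nothing in this thread establishes whether this build of Ceres uses SPQR or METIS.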
Thanks for looking into this!
Original comment by julian.p...@gmail.com
on 13 Aug 2014 at 5:58
Another possibly relevant detail: I was able to compile valgrind from the SVN
repo even though most places online claim it doesn't support OS X 10.9 yet.
When I run it on the broken executable, it reports the following invalid
reads/writes in ceres code. These two errors are not present in the working
executable (the one linked against my manually compiled ceres):
==13683== Invalid read of size 8
==13683== at 0x1000F9F3A: ceres::internal::FindPolynomialRoots(Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, Eigen::Matrix<double, -1, 1, 0, -1, 1>*, Eigen::Matrix<double, -1, 1, 0, -1, 1>*) (in ./broken)
==13683== by 0x1000E0AA6: ceres::internal::LineSearch::InterpolatingPolynomialMinimizingStepSize(ceres::LineSearchInterpolationType const&, ceres::internal::FunctionSample const&, ceres::internal::FunctionSample const&, ceres::internal::FunctionSample const&, double, double) const (in ./broken)
==13683== by 0x1000E13F5: ceres::internal::ArmijoLineSearch::Search(double, double, double, ceres::internal::LineSearch::Summary*) (in ./broken)
==13683== by 0x10003C9E7: ceres::internal::TrustRegionMinimizer::Minimize(ceres::internal::Minimizer::Options const&, double*, ceres::Solver::Summary*) (in ./broken)
==13683== by 0x10002E670: ceres::internal::SolverImpl::TrustRegionSolve(ceres::Solver::Options const&, ceres::internal::ProblemImpl*, ceres::Solver::Summary*) (in ./broken)
==13683== by 0x10002BC94: ceres::internal::SolverImpl::Solve(ceres::Solver::Options const&, ceres::internal::ProblemImpl*, ceres::Solver::Summary*) (in ./broken)
==13683== by 0x10002B746: ceres::Solve(ceres::Solver::Options const&, ceres::Problem*, ceres::Solver::Summary*) (in ./broken)
==13683== by 0x100001D80: run(unsigned long, std::__1::vector<Block, std::__1::allocator<Block> >&) (test_ceres.cc:58)
==13683== by 0x100001EB1: main (test_ceres.cc:66)
==13683== Address 0x1001f0458 is 24 bytes after a block of size 128 in arena "client"
==13683==
==13683== Invalid write of size 8
==13683== at 0x1000F9F44: ceres::internal::FindPolynomialRoots(Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, Eigen::Matrix<double, -1, 1, 0, -1, 1>*, Eigen::Matrix<double, -1, 1, 0, -1, 1>*) (in ./broken)
==13683== by 0x1000E0AA6: ceres::internal::LineSearch::InterpolatingPolynomialMinimizingStepSize(ceres::LineSearchInterpolationType const&, ceres::internal::FunctionSample const&, ceres::internal::FunctionSample const&, ceres::internal::FunctionSample const&, double, double) const (in ./broken)
==13683== by 0x1000E13F5: ceres::internal::ArmijoLineSearch::Search(double, double, double, ceres::internal::LineSearch::Summary*) (in ./broken)
==13683== by 0x10003C9E7: ceres::internal::TrustRegionMinimizer::Minimize(ceres::internal::Minimizer::Options const&, double*, ceres::Solver::Summary*) (in ./broken)
==13683== by 0x10002E670: ceres::internal::SolverImpl::TrustRegionSolve(ceres::Solver::Options const&, ceres::internal::ProblemImpl*, ceres::Solver::Summary*) (in ./broken)
==13683== by 0x10002BC94: ceres::internal::SolverImpl::Solve(ceres::Solver::Options const&, ceres::internal::ProblemImpl*, ceres::Solver::Summary*) (in ./broken)
==13683== by 0x10002B746: ceres::Solve(ceres::Solver::Options const&, ceres::Problem*, ceres::Solver::Summary*) (in ./broken)
==13683== by 0x100001D80: run(unsigned long, std::__1::vector<Block, std::__1::allocator<Block> >&) (test_ceres.cc:58)
==13683== by 0x100001EB1: main (test_ceres.cc:66)
==13683== Address 0x1001f0458 is 24 bytes after a block of size 128 in arena "client"
Original comment by julian.p...@gmail.com
on 13 Aug 2014 at 7:11
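For reading the "Address ... is 24 bytes after a block of size 128" line in the valgrind reports above, a small worked calculation may help. This is only a sketch under a stated assumption: that the 128-byte block is a contiguous array of 8-byte doubles starting at the beginning of the block, which the valgrind output does not confirm. The function name `locate_access` is made up for illustration.

```python
def locate_access(block_size, bytes_after_block, element_size=8):
    """Translate "N bytes after a block of size M" into element indices.

    Assumes (hypothetically) that the block holds only elements of
    `element_size` bytes, starting at offset 0 of the block.
    """
    capacity = block_size // element_size    # elements that fit in the block
    offset = block_size + bytes_after_block  # byte offset from the block start
    index = offset // element_size           # element index being accessed
    return capacity, index


capacity, index = locate_access(block_size=128, bytes_after_block=24)
print(f"capacity={capacity}, accessed index={index}, last valid index={capacity - 1}")
# capacity=16, accessed index=19, last valid index=15
```

Under that assumption the block holds 16 doubles and the 8-byte read and write both land on index 19, four elements past the last valid one.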
Original issue reported on code.google.com by
julian.p...@gmail.com
on 13 Aug 2014 at 4:15