techoe / ceres-solver

Automatically exported from code.google.com/p/ceres-solver

ceres-solver 1.9.0 from MacPorts segfaults on a particular problem #150

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Install ceres-solver with MacPorts (either from binary with "port install 
ceres-solver" or source with "port -s install ceres-solver").
2. Compile the program at http://julianpanetta.com/macports_bugs/test_ceres.cc 
with /usr/bin/clang++:
/usr/bin/clang++ -g -O0 -std=c++11 -I/opt/local/include \
  -I/opt/local/include/eigen3 test_ceres.cc -L/opt/local/lib -lceres -lglog \
  -lgflags -lcholmod -lcxsparse -framework accelerate -o broken
3. Run ./broken

What is the expected output? What do you see instead?

It should print 0..19, but it intermittently segfaults partway through or 
throws a std::length_error because Ceres has scribbled over the arrays v1s and 
v2s.
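
One way to confirm that v1s and v2s really are being overwritten is to fingerprint them before each solve and compare afterwards. The following is only a sketch: it assumes the arrays are std::vector<double> (the thread does not say), and Snapshot, TakeSnapshot and StillIntact are hypothetical helper names, not part of test_ceres.cc or Ceres.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a hash over a raw byte range.
std::uint64_t HashBytes(const void* data, std::size_t count) {
  const unsigned char* bytes = static_cast<const unsigned char*>(data);
  std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
  for (std::size_t i = 0; i < count; ++i) {
    hash ^= bytes[i];
    hash *= 1099511628211ULL;  // FNV prime
  }
  return hash;
}

// What the vector looked like at a known-good point (hypothetical helper).
struct Snapshot {
  const double* data;  // address of the element storage
  std::size_t size;    // number of elements
  std::uint64_t hash;  // fingerprint of the element bytes
};

Snapshot TakeSnapshot(const std::vector<double>& v) {
  return {v.data(), v.size(), HashBytes(v.data(), v.size() * sizeof(double))};
}

// True if neither the vector's bookkeeping nor its elements have changed.
// The header fields are compared first, and the hash is recomputed from the
// snapshot's own pointer and size, so a corrupted size is never used to walk
// memory.
bool StillIntact(const std::vector<double>& v, const Snapshot& s) {
  if (v.data() != s.data || v.size() != s.size) return false;
  return HashBytes(s.data, s.size * sizeof(double)) == s.hash;
}
```

Calling TakeSnapshot on each array before the solve and StillIntact right after it would narrow down which iteration does the damage. A std::length_error suggests that the vector's size field itself was hit, which is why the sketch checks the header as well as the contents.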

What version of the product are you using? On what operating system?

I've tried with 1.9.0 installed from MacPorts on both OS X 10.9.3 and 10.9.4.

Please provide any additional information below.

I've filed a ticket with MacPorts because this only seems to happen with a 
MacPorts-installed (statically linked) Ceres. The test program runs fine when 
statically linked against a Ceres version manually built with cmake (both the 
git head, and the stable 1.9.0 release). However, the MacPorts configuration is 
so minimal it's unclear where the problem could be. More details are at:

https://trac.macports.org/ticket/44627

Original issue reported on code.google.com by julian.p...@gmail.com on 13 Aug 2014 at 4:15

GoogleCodeExporter commented 9 years ago
Thanks Julian.

I am adding Keir and Julian to the thread. They may know about this.

I tried the code myself and it did not generate any problems. I think you are 
right in suspecting that there is something about how macports is building this 
library that is the problem.

Original comment by sameerag...@google.com on 13 Aug 2014 at 4:20

GoogleCodeExporter commented 9 years ago
I should have mentioned that the test program is quite specific; even changing 
the "magic data" loaded into v1s and v2s makes the segfault disappear. Also, 
removing the lower bound on variable 2 of each block--removing "Bounds(2, 
-0.75)"--suppresses the segfault, suggesting that it might occur when that 
constraint is active. (I didn't try removing the other bounds since they're 
needed to prevent a divide by zero).
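
To test the guess that the crash needs the lower bound to be active, one could check after each solve which variables ended up sitting on their bounds. This is a rough sketch only: ActiveLowerBounds is a hypothetical helper, it assumes "Bounds(2, -0.75)" means a lower bound of -0.75 on variable 2, and it takes the solution and the bounds as plain std::vector<double>.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the indices i where x[i] lies on its lower bound, within the given
// absolute tolerance. Only the common prefix of the two vectors is examined.
std::vector<std::size_t> ActiveLowerBounds(const std::vector<double>& x,
                                           const std::vector<double>& lower,
                                           double tolerance) {
  std::vector<std::size_t> active;
  const std::size_t n = x.size() < lower.size() ? x.size() : lower.size();
  for (std::size_t i = 0; i < n; ++i) {
    if (std::fabs(x[i] - lower[i]) <= tolerance) {
      active.push_back(i);
    }
  }
  return active;
}
```

If index 2 shows up in the result for the blocks that crash and not for the ones that survive, that would support the suspicion above.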

The segfault is usually, but not always, in cholmod_pack_factor called 
indirectly from internal::SuiteSparse::Cholesky, though I suspect the bug 
doesn't originate there, but rather with some other part of Ceres scribbling 
over memory.

Original comment by julian.p...@gmail.com on 13 Aug 2014 at 4:26

GoogleCodeExporter commented 9 years ago
If you have MacPorts installed, you can see exactly how MacPorts builds 
ceres-solver:

sudo port -dvs destroot ceres-solver

I have attached the output from my machine.

Original comment by Mark.M...@gmail.com on 13 Aug 2014 at 5:02

GoogleCodeExporter commented 9 years ago

Original comment by Mark.M...@gmail.com on 13 Aug 2014 at 5:03

Attachments:

GoogleCodeExporter commented 9 years ago
1) The list of libraries you're linking when compiling test_ceres isn't 
complete. At least on my machine, CMake lists the following as SuiteSparse 
dependencies that will be pulled in when Ceres is built statically:

libspqr
libcholmod
libccolamd
libcamd
libcolamd
libamd
libsuitesparseconfig
libmetis
libcxsparse

Is your cholmod dynamically linked to everything else?

2) When you're compiling Ceres manually via CMake and it's working, is 
SuiteSparse definitely enabled?

3) What options are you compiling SuiteSparse with on MacPorts? The ports file 
is pretty lengthy and includes an option to use ATLAS rather than the 
Accelerate framework (which Ceres would not respect unless you explicitly told 
it to use a different BLAS library).

4) Is there any change if you remove the arguments: -g -O0 when compiling 
test_ceres?

Original comment by alexs....@gmail.com on 13 Aug 2014 at 5:16

GoogleCodeExporter commented 9 years ago
1) Yes, my cholmod is linked against everything else:

 $ otool -L broken
broken:
        /opt/local/lib/libglog.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /opt/local/lib/libgflags.2.dylib (compatibility version 4.0.0, current version 4.0.0)
        /opt/local/lib/libcholmod.3.0.0.dylib (compatibility version 3.0.0, current version 3.0.0)
        /opt/local/lib/libcxsparse.3.1.3.dylib (compatibility version 3.0.0, current version 3.1.3)
        /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 120.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
 $ otool -L /opt/local/lib/libcholmod.3.0.0.dylib
/opt/local/lib/libcholmod.3.0.0.dylib:
        /opt/local/lib/libcholmod.3.0.0.dylib (compatibility version 3.0.0, current version 3.0.0)
        /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
        /opt/local/lib/libsuitesparseconfig.4.2.1.dylib (compatibility version 4.0.0, current version 4.2.1)
        /opt/local/lib/libamd.2.4.0.dylib (compatibility version 2.0.0, current version 2.4.0)
        /opt/local/lib/libcamd.2.4.0.dylib (compatibility version 2.0.0, current version 2.4.0)
        /opt/local/lib/libcolamd.2.9.0.dylib (compatibility version 2.0.0, current version 2.9.0)
        /opt/local/lib/libccolamd.2.9.0.dylib (compatibility version 2.0.0, current version 2.9.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 120.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)

2) Yes, it's definitely enabled, and my manually compiled library refuses to 
link if either -lcholmod or -lcxsparse is omitted.

3) I never specified any variants/options when installing SuiteSparse:

 $ port provides /opt/local/lib/libcholmod.3.0.0.dylib
/opt/local/lib/libcholmod.3.0.0.dylib is provided by: SuiteSparse
 $ port echo installed | grep SuiteSparse
SuiteSparse                    @4.2.1_3

and as you see in (1), libcholmod.3.0.0.dylib was linked against Accelerate's 
BLAS instead of ATLAS's (which is not installed--I have not intentionally 
installed any BLAS implementation on this machine).

4) No change if I omit -g -O0 or compile with optimizations. It just segfaults 
faster :)

Thanks for looking into this!

Original comment by julian.p...@gmail.com on 13 Aug 2014 at 5:58

GoogleCodeExporter commented 9 years ago
Another possibly relevant detail: I was able to compile valgrind from the SVN 
repo, even though most places online claim it doesn't support OS X 10.9 yet. 
When I run it on the broken executable, it reports the following invalid read 
and write in Ceres code. These two errors are not present in the working 
executable (the one linked against my manually compiled Ceres):

==13683== Invalid read of size 8
==13683==    at 0x1000F9F3A: ceres::internal::FindPolynomialRoots(Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, Eigen::Matrix<double, -1, 1, 0, -1, 1>*, Eigen::Matrix<double, -1, 1, 0, -1, 1>*) (in ./broken)
==13683==    by 0x1000E0AA6: ceres::internal::LineSearch::InterpolatingPolynomialMinimizingStepSize(ceres::LineSearchInterpolationType const&, ceres::internal::FunctionSample const&, ceres::internal::FunctionSample const&, ceres::internal::FunctionSample const&, double, double) const (in ./broken)
==13683==    by 0x1000E13F5: ceres::internal::ArmijoLineSearch::Search(double, double, double, ceres::internal::LineSearch::Summary*) (in ./broken)
==13683==    by 0x10003C9E7: ceres::internal::TrustRegionMinimizer::Minimize(ceres::internal::Minimizer::Options const&, double*, ceres::Solver::Summary*) (in ./broken)
==13683==    by 0x10002E670: ceres::internal::SolverImpl::TrustRegionSolve(ceres::Solver::Options const&, ceres::internal::ProblemImpl*, ceres::Solver::Summary*) (in ./broken)
==13683==    by 0x10002BC94: ceres::internal::SolverImpl::Solve(ceres::Solver::Options const&, ceres::internal::ProblemImpl*, ceres::Solver::Summary*) (in ./broken)
==13683==    by 0x10002B746: ceres::Solve(ceres::Solver::Options const&, ceres::Problem*, ceres::Solver::Summary*) (in ./broken)
==13683==    by 0x100001D80: run(unsigned long, std::__1::vector<Block, std::__1::allocator<Block> >&) (test_ceres.cc:58)
==13683==    by 0x100001EB1: main (test_ceres.cc:66)
==13683==  Address 0x1001f0458 is 24 bytes after a block of size 128 in arena "client"
==13683==
==13683== Invalid write of size 8
==13683==    at 0x1000F9F44: ceres::internal::FindPolynomialRoots(Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, Eigen::Matrix<double, -1, 1, 0, -1, 1>*, Eigen::Matrix<double, -1, 1, 0, -1, 1>*) (in ./broken)
==13683==    by 0x1000E0AA6: ceres::internal::LineSearch::InterpolatingPolynomialMinimizingStepSize(ceres::LineSearchInterpolationType const&, ceres::internal::FunctionSample const&, ceres::internal::FunctionSample const&, ceres::internal::FunctionSample const&, double, double) const (in ./broken)
==13683==    by 0x1000E13F5: ceres::internal::ArmijoLineSearch::Search(double, double, double, ceres::internal::LineSearch::Summary*) (in ./broken)
==13683==    by 0x10003C9E7: ceres::internal::TrustRegionMinimizer::Minimize(ceres::internal::Minimizer::Options const&, double*, ceres::Solver::Summary*) (in ./broken)
==13683==    by 0x10002E670: ceres::internal::SolverImpl::TrustRegionSolve(ceres::Solver::Options const&, ceres::internal::ProblemImpl*, ceres::Solver::Summary*) (in ./broken)
==13683==    by 0x10002BC94: ceres::internal::SolverImpl::Solve(ceres::Solver::Options const&, ceres::internal::ProblemImpl*, ceres::Solver::Summary*) (in ./broken)
==13683==    by 0x10002B746: ceres::Solve(ceres::Solver::Options const&, ceres::Problem*, ceres::Solver::Summary*) (in ./broken)
==13683==    by 0x100001D80: run(unsigned long, std::__1::vector<Block, std::__1::allocator<Block> >&) (test_ceres.cc:58)
==13683==    by 0x100001EB1: main (test_ceres.cc:66)
==13683==  Address 0x1001f0458 is 24 bytes after a block of size 128 in arena "client"

Original comment by julian.p...@gmail.com on 13 Aug 2014 at 7:11
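
The "Address ... is 24 bytes after a block of size 128" line in the valgrind report above can be turned into element terms with a little arithmetic. The sketch below assumes the block holds 8-byte doubles, which fits the size-8 accesses and the Eigen double matrices in the trace but is not stated anywhere in the thread. Locate and OutOfBoundsAccess are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// Where an out-of-bounds access landed, measured in elements.
struct OutOfBoundsAccess {
  std::size_t element_index;  // index of the touched element, from block start
  std::size_t element_count;  // number of whole elements the block can hold
};

// Converts valgrind's "bytes_after bytes after a block of size block_size"
// into an element index. The reported address is block_size + bytes_after
// bytes past the start of the block.
OutOfBoundsAccess Locate(std::size_t block_size, std::size_t bytes_after,
                         std::size_t element_size) {
  return {(block_size + bytes_after) / element_size,
          block_size / element_size};
}
```

With the numbers from the report, Locate(128, 24, sizeof(double)) gives element index 19 in a block that holds 16 doubles (valid indices 0 to 15), assuming sizeof(double) is 8. Under that assumption the read and the write both touch the fourth double past the end of the allocation.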