sofa-framework / sofa

Real-time multi-physics simulation with an emphasis on medical simulation.
https://www.sofa-framework.org
GNU Lesser General Public License v2.1
871 stars 297 forks source link

Segfault in SparseLDLSolver #3499

Closed ScheiklP closed 1 year ago

ScheiklP commented 1 year ago

Problem

Hi! There seems to be a check missing that prevents SparseLDLSolver trying to solve an empty system. When I create an object with n points, I can remove n-1 and get a segfault on the nth.

########## SIG 8 - SIGFPE: arithmetic exception, such as divide by zero ##########
  sofa::helper::BackTrace::sig(int)
  libmetis__FM_2WayCutRefine
  libmetis__RandomBisection
  libmetis__InitSeparator
  libmetis__MlevelNodeBisectionL1
  libmetis__MlevelNestedDissection
  METIS_NodeND
  void sofa::component::linearsolver::direct::SparseLDLSolverImpl<sofa::linearalgebra::CompressedRowSparseMatrix<double, sofa::type::vector<double, sofa::type::CPUMemoryManager<double> >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >, sofa::linearalgebra::FullVector<double>, sofa::component::linearsolver::NoThreadManager>::factorize<sofa::type::vector<int, sofa::type::CPUMemoryManager<int> >, sofa::type::vector<double, sofa::type::CPUMemoryManager<double> > >(int, int*, int*, double*, sofa::component::linearsolver::direct::SparseLDLImplInvertData<sofa::type::vector<int, sofa::type::CPUMemoryManager<int> >, sofa::type::vector<double, sofa::type::CPUMemoryManager<double> > >*)
  sofa::component::linearsolver::direct::SparseLDLSolver<sofa::linearalgebra::CompressedRowSparseMatrix<double, sofa::type::vector<double, sofa::type::CPUMemoryManager<double> >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >, sofa::linearalgebra::FullVector<double>, sofa::component::linearsolver::NoThreadManager>::factorize(sofa::linearalgebra::CompressedRowSparseMatrix<double, sofa::type::vector<double, sofa::type::CPUMemoryManager<double> >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >&, sofa::component::linearsolver::direct::SparseLDLImplInvertData<sofa::type::vector<int, sofa::type::CPUMemoryManager<int> >, sofa::type::vector<double, sofa::type::CPUMemoryManager<double> > >*)
  sofa::component::linearsolver::MatrixLinearSolver<sofa::linearalgebra::CompressedRowSparseMatrix<double, sofa::type::vector<double, sofa::type::CPUMemoryManager<double> >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >, sofa::linearalgebra::FullVector<double>, sofa::component::linearsolver::NoThreadManager>::solveSystem()
  sofa::component::odesolver::backward::EulerImplicitSolver::solve(sofa::core::ExecParams const*, double, sofa::core::TMultiVecId<(sofa::core::VecType)1, (sofa::core::VecAccess)1>, sofa::core::TMultiVecId<(sofa::core::VecType)2, (sofa::core::VecAccess)1>)
  sofa::simulation::SolveVisitor::processSolver(sofa::simulation::Node*, sofa::core::behavior::OdeSolver*)
  sofa::simulation::SolveVisitor::sequentialSolve(sofa::simulation::Node*)
  sofa::simulation::SolveVisitor::processNodeTopDown(sofa::simulation::Node*)

https://user-images.githubusercontent.com/29635054/203099833-f2da7dc9-a32c-45a3-8ef5-0e4497ac3034.mp4

That does not happen with SparseLUSolver.


Environment

Context

Cheers, Paul

alxbilger commented 1 year ago

It's not a very common situation, so thanks for the report

ScheiklP commented 1 year ago

It's not a very common situation, so thanks for the report

Many of the scenes I am working on right now kind of depend on that. :sweat_smile: Do you already have an idea where to look for the difference between the solvers that might cause the behavior? I would love to create a PR, but I have no idea how solvers work in SOFA. :D

Cheers, Paul

alxbilger commented 1 year ago

Hi Paul,

I made some tests on basic scenes and it seems empty systems are supported (in the sense it does not crash). See https://github.com/sofa-framework/sofa/pull/3500

The factorization is supposed to stop here: https://github.com/sofa-framework/sofa/blob/master/Sofa/Component/LinearSolver/Direct/src/sofa/component/linearsolver/direct/SparseLDLSolver.inl#L93

Can you debug your scene around this line to investigate why this condition is not reached?

Alex

epernod commented 1 year ago

I'm gonna set my debug env. Could you share a basic example or your scene so I can debug it.

alxbilger commented 1 year ago

I managed to reproduce the bug on a basic scene:

<Node name="root">
    <DefaultAnimationLoop/>
    <EulerImplicitSolver/>
    <SparseLDLSolver/>
    <PointSetTopologyContainer position="0 0 0"/>
    <PointSetTopologyModifier/>
    <MechanicalObject/>
    <UniformMass/>
    <TopologicalChangeProcessor useDataInputs="true" timeToRemove="0.1" pointsToRemove="0"/>
</Node>

It seems the topology change is essential to reproduce the crash.

I cannot go further this morning. I'll catch up this afternoon if you guys found something.

alxbilger commented 1 year ago

@ScheiklP you can try to add n == 0 to the condition:

    if(M_colptr==nullptr || M_rowind==nullptr || M_values==nullptr || Mfiltered.getRowBegin().size() < (size_t)n || n == 0)
    {
        msg_warning() << "Invalid Linear System to solve. Please insure that there is enough constraints (not rank deficient)." ;
        return true;
    }

This way, I don't have a crash with the basic example I shared in my previous comment.

Feel free to make a PR if the change works for you.

ScheiklP commented 1 year ago

Hi @alxbilger thanks for the tip! :) Adding this check does not fix the segfault in my scene. The call that segfaults is Mfiltered.getColsIndex() of the solver's sofa::linearalgebra::CompressedRowSparseMatrix.

I then tried moving the check up, just to make sure that's the issue

    int n = M.colSize();

    if(n == 0)
    {
        msg_warning() << "Invalid Linear System to solve. Please insure that there is enough constraints (not rank deficient)." ;
        return true;
    }

    int * M_colptr = (int *) &Mfiltered.getRowBegin()[0];
    int * M_rowind = (int *) &Mfiltered.getColsIndex()[0];
    Real * M_values = (Real *) &Mfiltered.getColsValue()[0];

    if(M_colptr==nullptr || M_rowind==nullptr || M_values==nullptr || Mfiltered.getRowBegin().size() < (size_t)n )
    {
        msg_warning() << "Invalid Linear System to solve. Please insure that there is enough constraints (not rank deficient)." ;
        return true;
    }

But ran into another error that follows

[WARNING] [SparseLDLSolver(SparseLDLSolver)] Invalid Linear System to solve. Please insure that there is enough constraints (not rank deficient).
[ERROR]   [FullVector] in vector<FullVector<double>> 0x5595582fbf00 size 0 : invalid index 0
  sofa::linearalgebra::FullVector<double>::checkIndex(int) const
  sofa::linearalgebra::FullVector<double>::operator[](int)
  sofa::component::linearsolver::direct::SparseLDLSolver<sofa::linearalgebra::CompressedRowSparseMatrix<sofa::type::Mat<3u, 3u, double>, sofa::type::vector<sofa::type::Mat<3u, 3u, double>, sofa::type::CPUMemoryManager<sofa::type::Mat<3u, 3u, double> > >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >, sofa::linearalgebra::FullVector<double>, sofa::component::linearsolver::NoThreadManager>::solve(sofa::linearalgebra::CompressedRowSparseMatrix<sofa::type::Mat<3u, 3u, double>, sofa::type::vector<sofa::type::Mat<3u, 3u, double>, sofa::type::CPUMemoryManager<sofa::type::Mat<3u, 3u, double> > >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >&, sofa::linearalgebra::FullVector<double>&, sofa::linearalgebra::FullVector<double>&)
  sofa::component::linearsolver::MatrixLinearSolver<sofa::linearalgebra::CompressedRowSparseMatrix<sofa::type::Mat<3u, 3u, double>, sofa::type::vector<sofa::type::Mat<3u, 3u, double>, sofa::type::CPUMemoryManager<sofa::type::Mat<3u, 3u, double> > >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >, sofa::linearalgebra::FullVector<double>, sofa::component::linearsolver::NoThreadManager>::solveSystem()
  sofa::simulation::common::MechanicalOperations::m_solveSystem()
  sofa::core::behavior::MultiMatrix<sofa::simulation::common::MechanicalOperations>::solve(sofa::core::TMultiVecId<(sofa::core::VecType)2, (sofa::core::VecAccess)1>, sofa::core::TMultiVecId<(sofa::core::VecType)2, (sofa::core::VecAccess)1>)
  sofa::component::odesolver::backward::EulerImplicitSolver::solve(sofa::core::ExecParams const*, double, sofa::core::TMultiVecId<(sofa::core::VecType)1, (sofa::core::VecAccess)1>, sofa::core::TMultiVecId<(sofa::core::VecType)2, (sofa::core::VecAccess)1>)
  sofa::simulation::SolveVisitor::processSolver(sofa::simulation::Node*, sofa::core::behavior::OdeSolver*)
  void sofa::simulation::Visitor::runVisitorTask<sofa::simulation::SolveVisitor, sofa::simulation::Node, sofa::core::behavior::OdeSolver>(sofa::simulation::SolveVisitor*, sofa::simulation::Node*, void (sofa::simulation::SolveVisitor::*)(sofa::simulation::Node*, sofa::core::behavior::OdeSolver*), sofa::core::behavior::OdeSolver*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  void sofa::simulation::Visitor::for_each<sofa::simulation::SolveVisitor, sofa::simulation::Node, sofa::simulation::NodeSequence<sofa::core::behavior::OdeSolver, false>, sofa::core::behavior::OdeSolver>(sofa::simulation::SolveVisitor*, sofa::simulation::Node*, sofa::simulation::NodeSequence<sofa::core::behavior::OdeSolver, false> const&, void (sofa::simulation::SolveVisitor::*)(sofa::simulation::Node*, sofa::core::behavior::OdeSolver*), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  sofa::simulation::SolveVisitor::sequentialSolve(sofa::simulation::Node*)

So the n == 0 check is correctly triggered, but solveSystem() fails. I cannot find the definition of solveSystem in SparseLDLSolver. Is that the same as in void AsyncSparseLDLSolver<TMatrix, TVector, TThreadManager>::solveSystem()?

Cheers, Paul

alxbilger commented 1 year ago

The new crash you have is in the solve function, trying to access an element of an empty vector (either the solution vector, or the right-hand side vector). The solve function is here, called from here.

The solveSystem function works in 2 steps: 1) factorize the matrix into LDL, 2) solve the system based on the factorization. As far as I can see, the solving process runs even if the factorization fails. It's probably why you have the crash. A modification to do could be to catch the error in the solveSystem function and not perform the solving.

In the meantime, as a quick fix, you can try to set invertData->n = n; before leaving the function:

int n = M.colSize();
invertData->n = n;

invertData->n would be =0, then the solve function may run safely.

ScheiklP commented 1 year ago

Hi @alxbilger , Ah, I see...

Setting invertData->n = n; sadly does not prevent the segfault.

[WARNING] [SparseLDLSolver(SparseLDLSolver)] Invalid Linear System to solve. Please insure that there is enough constraints (not rank deficient).
[ERROR]   [FullVector] in vector<FullVector<double>> 0x555d8bdcea90 size 0 : invalid index 0
  sofa::linearalgebra::FullVector<double>::checkIndex(int) const
  sofa::linearalgebra::FullVector<double>::operator[](int)
  sofa::component::linearsolver::direct::SparseLDLSolver<sofa::linearalgebra::CompressedRowSparseMatrix<sofa::type::Mat<3u, 3u, double>, sofa::type::vector<sofa::type::Mat<3u, 3u, double>, sofa::type::CPUMemoryManager<sofa::type::Mat<3u, 3u, double> > >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >, sofa::linearalgebra::FullVector<double>, sofa::component::linearsolver::NoThreadManager>::solve(sofa::linearalgebra::CompressedRowSparseMatrix<sofa::type::Mat<3u, 3u, double>, sofa::type::vector<sofa::type::Mat<3u, 3u, double>, sofa::type::CPUMemoryManager<sofa::type::Mat<3u, 3u, double> > >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >&, sofa::linearalgebra::FullVector<double>&, sofa::linearalgebra::FullVector<double>&)
  sofa::component::linearsolver::MatrixLinearSolver<sofa::linearalgebra::CompressedRowSparseMatrix<sofa::type::Mat<3u, 3u, double>, sofa::type::vector<sofa::type::Mat<3u, 3u, double>, sofa::type::CPUMemoryManager<sofa::type::Mat<3u, 3u, double> > >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >, sofa::linearalgebra::FullVector<double>, sofa::component::linearsolver::NoThreadManager>::solveSystem()

So the proper way would be changing invert from a void to bool return type and then checking that in solveSystem, right? Is there a way to do that, without having to change the return type for all the solvers in linearsolver.direct ? :D

alxbilger commented 1 year ago

Hi @ScheiklP

Can you check if this PR fixes your crash?

ScheiklP commented 1 year ago

Hi @alxbilger , no, it does not. :(

########## SIG 8 - SIGFPE: arithmetic exception, such as divide by zero ##########
  sofa::helper::BackTrace::sig(int)
  libmetis__FM_2WayCutRefine
  libmetis__RandomBisection
  libmetis__InitSeparator
  libmetis__MlevelNodeBisectionL1
  libmetis__MlevelNestedDissection
  METIS_NodeND
  void sofa::component::linearsolver::direct::SparseLDLSolverImpl<sofa::linearalgebra::CompressedRowSparseMatrix<sofa::type::Mat<3u, 3u, double>, sofa::type::vector<sofa::type::Mat<3u, 3u, double>, sofa::type::CPUMemoryManager<sofa::type::Mat<3u, 3u, double> > >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >, sofa::linearalgebra::FullVector<double>, sofa::component::linearsolver::NoThreadManager>::factorize<sofa::type::vector<int, sofa::type::CPUMemoryManager<int> >, sofa::type::vector<double, sofa::type::CPUMemoryManager<double> > >(int, int*, int*, double*, sofa::component::linearsolver::direct::SparseLDLImplInvertData<sofa::type::vector<int, sofa::type::CPUMemoryManager<int> >, sofa::type::vector<double, sofa::type::CPUMemoryManager<double> > >*)
  sofa::component::linearsolver::direct::SparseLDLSolver<sofa::linearalgebra::CompressedRowSparseMatrix<sofa::type::Mat<3u, 3u, double>, sofa::type::vector<sofa::type::Mat<3u, 3u, double>, sofa::type::CPUMemoryManager<sofa::type::Mat<3u, 3u, double> > >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >, sofa::linearalgebra::FullVector<double>, sofa::component::linearsolver::NoThreadManager>::factorize(sofa::linearalgebra::CompressedRowSparseMatrix<sofa::type::Mat<3u, 3u, double>, sofa::type::vector<sofa::type::Mat<3u, 3u, double>, sofa::type::CPUMemoryManager<sofa::type::Mat<3u, 3u, double> > >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >&, sofa::component::linearsolver::direct::SparseLDLImplInvertData<sofa::type::vector<int, sofa::type::CPUMemoryManager<int> >, sofa::type::vector<double, sofa::type::CPUMemoryManager<double> > >*)
  sofa::component::linearsolver::MatrixLinearSolver<sofa::linearalgebra::CompressedRowSparseMatrix<sofa::type::Mat<3u, 3u, double>, sofa::type::vector<sofa::type::Mat<3u, 3u, double>, sofa::type::CPUMemoryManager<sofa::type::Mat<3u, 3u, double> > >, sofa::type::vector<unsigned int, sofa::type::CPUMemoryManager<unsigned int> > >, sofa::linearalgebra::FullVector<double>, sofa::component::linearsolver::NoThreadManager>::solveSystem()
  sofa::component::odesolver::backward::EulerImplicitSolver::solve(sofa::core::ExecParams const*, double, sofa::core::TMultiVecId<(sofa::core::VecType)1, (sofa::core::VecAccess)1>, sofa::core::TMultiVecId<(sofa::core::VecType)2, (sofa::core::VecAccess)1>)
  sofa::simulation::SolveVisitor::processSolver(sofa::simulation::Node*, sofa::core::behavior::OdeSolver*)
  sofa::simulation::SolveVisitor::sequentialSolve(sofa::simulation::Node*)
  sofa::simulation::SolveVisitor::processNodeTopDown(sofa::simulation::Node*)
alxbilger commented 1 year ago

At least, it's not the same error 😆 Did you try to add the n==0 condition in addition to the code in the PR https://github.com/sofa-framework/sofa/pull/3512 ?

ScheiklP commented 1 year ago

Hi @alxbilger , oh, perfect, that works! :)

Should we add it to your PR, or to a separate one?

alxbilger commented 1 year ago

https://github.com/sofa-framework/sofa/pull/3512 is already merged. But you could try to add your changes on https://github.com/sofa-framework/sofa/pull/3501.

According to my unit test, a simple

    if (n == 0)
    {
        return true;
    }

makes the job

ScheiklP commented 1 year ago

As far as I can see, I cannot push commits to that PR or suggest changes in untouched files. Am I missing something, or could you add the change? :D

alxbilger commented 1 year ago

Indeed, you don't have the rights. I suggest you make your own PR then

alxbilger commented 1 year ago

@ScheiklP can you confirm we can close this issue?

ScheiklP commented 1 year ago

@ScheiklP can you confirm we can close this issue?

Ah, yes, sorry. Confirmed. :)