libMesh / libmesh

libMesh github repository
http://libmesh.github.io
GNU Lesser General Public License v2.1

Projection Matrix causing malloc on modified femsystem ex4 #1696

Open bboutkov opened 6 years ago

bboutkov commented 6 years ago

Hello, while doing some further testing for the upcoming GMG work in #1568, I came across a situation where I can reliably hit the PETSc error below while constructing projection matrices in a 3D scenario. I've managed to reduce it to an initial 2x2x2 mesh with 2 refinements, and I can trigger the error with -np 4.

Since it depends on ongoing work, I think the fastest way for me to demonstrate this is with the modified fem_system/ex4 I have been using, which you can find at: proj_mat_3d_badmalloc

As a reminder, triggering this section of the code requires some command line options; the minimal set should be something like SOLVER_OPTIONS="--use_petsc_dm -pc_mg_levels 3"

Any thoughts on the matter would be much appreciated. Thanks in advance!

Error Below:


[3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[3]PETSC ERROR: Argument out of range
[3]PETSC ERROR: New nonzero at (8,26) caused a malloc
Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check
[3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[3]PETSC ERROR: Petsc Release Version 3.8.4, Mar, 24, 2018
[3]PETSC ERROR: ./example-opt on a gcc-7.2.0-mpich-3.2-openblas-0.2.20-opt named bender.eng.buffalo.edu by borisbou Fri May 11 14:30:15 2018
[3]PETSC ERROR: Configure options --with-make-np=24 --prefix=/bender1/data/shared/software/libs/petsc/3.8.4/gcc/7.2.0/mpich/3.2/openblas/0.2.20/opt --with-debugging=false --COPTFLAGS="-O3 -mavx" --CXXOPTFLAGS="-O3 -mavx" --FOPTFLAGS=-O3 -$
[3]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 610 in /bender1/data/shared/software/builddir/petsc-47E0sD/petsc-3.8.4/src/mat/impls/aij/mpi/mpiaij.c
[3]PETSC ERROR: #2 MatSetValues() line 1306 in /bender1/data/shared/software/builddir/petsc-47E0sD/petsc-3.8.4/src/mat/interface/matrix.c

#0  0x0000003788eac82e in waitpid () from /lib64/libc.so.6
#1  0x0000003788e3e479 in do_system () from /lib64/libc.so.6
#2  0x00007f2462066494 in libMesh::print_trace(std::ostream&) () from /bender1/data/users/borisbou/software/mg_pet384/libmesh/install/lib/libmesh_opt.so.0
#3  0x00007f246206755f in libMesh::write_traceout() () from /bender1/data/users/borisbou/software/mg_pet384/libmesh/install/lib/libmesh_opt.so.0
#4  0x00007f246204cc3a in libMesh::libmesh_terminate_handler() () from /bender1/data/users/borisbou/software/mg_pet384/libmesh/install/lib/libmesh_opt.so.0
#5  0x00007f245a17e216 in __cxxabiv1::__terminate (handler=<optimized out>) at /bender1/data/shared/software/sourcesdir/gcc/gcc-7.2.0-src/libstdc++-v3/libsupc++/eh_terminate.cc:47
#6  0x00007f245a17e261 in std::terminate () at /bender1/data/shared/software/sourcesdir/gcc/gcc-7.2.0-src/libstdc++-v3/libsupc++/eh_terminate.cc:57
#7  0x00007f245a17e4a3 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x7f2462fbf840 <typeinfo for libMesh::PetscSolverException>, dest=0x7f24625c8720 <libMesh::PetscSolverException::~PetscSolverException()>) at /bender1/data/shared/software/sourcesdir/gcc/gcc-7.2.0-src/libstdc++-v3/libsupc++/eh_throw.cc:93
#8  0x00007f24625ce9c5 in libMesh::PetscMatrix<double>::set(unsigned int, unsigned int, double) () from /bender1/data/users/borisbou/software/mg_pet384/libmesh/install/lib/libmesh_opt.so.0
#9  0x00007f246280bd94 in libMesh::GenericProjector<libMesh::OldSolutionCoefs<double, &(void libMesh::FEMContext::point_value<double>(unsigned int, libMesh::Point const&, double&, double) const)>, libMesh::OldSolutionCoefs<libMesh::VectorValue<double>, &(void libMesh::FEMContext::point_gradient<libMesh::VectorValue<double> >(unsigned int, libMesh::Point const&, double&, double) const)>, MetaPhysicL::DynamicSparseNumberArray<double, unsigned int>, libMesh::MatrixFillAction<double, double> >::operator()(libMesh::StoredRange<libMesh::MeshBase::const_element_iterator, libMesh::Elem const*> const&) const () from /bender1/data/users/borisbou/software/mg_pet384/libmesh/install/lib/libmesh_opt.so.0
#10 0x00007f24628181ad in void libMesh::Threads::parallel_for<libMesh::StoredRange<libMesh::MeshBase::const_element_iterator, libMesh::Elem const*>, libMesh::GenericProjector<libMesh::OldSolutionCoefs<double, &(void libMesh::FEMContext::point_value<double>(unsigned int, libMesh::Point const&, double&, double) const)>, libMesh::OldSolutionCoefs<libMesh::VectorValue<double>, &(void libMesh::FEMContext::point_gradient<libMesh::VectorValue<double> >(unsigned int, libMesh::Point const&, double&, double) const)>, MetaPhysicL::DynamicSparseNumberArray<double, unsigned int>, libMesh::MatrixFillAction<double, double> > >(libMesh::StoredRange<libMesh::MeshBase::const_element_iterator, libMesh::Elem const*> const&, libMesh::GenericProjector<libMesh::OldSolutionCoefs<double, &(void libMesh::FEMContext::point_value<double>(unsigned int, libMesh::Point const&, double&, double) const)>, libMesh::OldSolutionCoefs<libMesh::VectorValue<double>, &(void libMesh::FEMContext::point_gradient<libMesh::VectorValue<double> >(unsigned int, libMesh::Point const&, double&, double) const)>, MetaPhysicL::DynamicSparseNumberArray<double, unsigned int>, libMesh::MatrixFillAction<double, double> > const&) () from /bender1/data/users/borisbou/software/mg_pet384/libmesh/install/lib/libmesh_opt.so.0
#11 0x00007f24627f1b34 in libMesh::System::projection_matrix(libMesh::SparseMatrix<double>&) const () from /bender1/data/users/borisbou/software/mg_pet384/libmesh/install/lib/libmesh_opt.so.0
#12 0x00007f2462735559 in libMesh::PetscDMWrapper::init_and_attach_petscdm(libMesh::System&, _p_SNES*&) () from /bender1/data/users/borisbou/software/mg_pet384/libmesh/install/lib/libmesh_opt.so.0
#13 0x00007f246272b2d3 in libMesh::PetscDiffSolver::setup_petsc_data() () from /bender1/data/users/borisbou/software/mg_pet384/libmesh/install/lib/libmesh_opt.so.0
#14 0x00007f246272bbd2 in libMesh::PetscDiffSolver::init() () from /bender1/data/users/borisbou/software/mg_pet384/libmesh/install/lib/libmesh_opt.so.0
#15 0x000000000042785e in main ()
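
For readers unfamiliar with this class of error: the message above says that setting entry (8,26) needed storage that had not been reserved for that row, and that the check can be switched off with MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE). The following is only a toy sketch of that idea in plain C++. The class PreallocatedSparseMatrix and the helper insertion_fails are hypothetical names invented for illustration; they are not PETSc or libMesh API and do not reflect how either library stores matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy model (NOT PETSc or libMesh code): each row reserves a
// fixed number of nonzero slots up front, like a preallocated sparsity pattern.
class PreallocatedSparseMatrix
{
public:
  // nnz_per_row[i] is the number of nonzero slots reserved for row i.
  explicit PreallocatedSparseMatrix(const std::vector<std::size_t> & nnz_per_row)
    : _capacity(nnz_per_row), _rows(nnz_per_row.size()) {}

  // Loosely mirrors the idea of MAT_NEW_NONZERO_ALLOCATION_ERR: when true,
  // an insertion that exceeds the reserved slots is treated as an error.
  void set_new_nonzero_allocation_err(bool flag) { _error_on_new_alloc = flag; }

  void set(std::size_t i, std::size_t j, double value)
  {
    auto & row = _rows.at(i);
    const bool is_new_entry = (row.find(j) == row.end());
    if (is_new_entry && row.size() >= _capacity[i])
    {
      if (_error_on_new_alloc)
        throw std::out_of_range("New nonzero at (" + std::to_string(i) + "," +
                                std::to_string(j) + ") caused a malloc");
      // Check disabled: grow the row and count the extra allocation.
      ++_capacity[i];
      ++_extra_allocations;
    }
    row[j] = value;
  }

  std::size_t extra_allocations() const { return _extra_allocations; }
  std::size_t n_nonzeros(std::size_t i) const { return _rows.at(i).size(); }

private:
  std::vector<std::size_t> _capacity;
  std::vector<std::map<std::size_t, double>> _rows;
  bool _error_on_new_alloc = true;
  std::size_t _extra_allocations = 0;
};

// Returns true if inserting at (i,j) is rejected by the allocation check.
inline bool insertion_fails(PreallocatedSparseMatrix & m, std::size_t i, std::size_t j)
{
  try { m.set(i, j, 1.0); }
  catch (const std::out_of_range &) { return true; }
  return false;
}

// Small usage example: row 0 has one reserved slot, so a second column fails
// until the check is turned off.
inline void example_usage()
{
  PreallocatedSparseMatrix m(std::vector<std::size_t>{1, 2});
  m.set(0, 0, 1.0);                        // fits in the reserved slot
  assert(insertion_fails(m, 0, 1));        // exceeds the reservation
  m.set_new_nonzero_allocation_err(false); // analogous to disabling the check
  m.set(0, 1, 2.0);                        // now succeeds, at the cost of a reallocation
  assert(m.extra_allocations() == 1);
}
```

In this toy model, disabling the check only hides the fact that the reserved pattern was too small. Whether the real fix here belongs in the sparsity pattern computed for the projection matrix is exactly the open question of this issue.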

roystgnr commented 6 years ago

Can you try a git revert a555464 and see if the same problem occurs? It looks like we do indeed probably have a (hard-to-replicate) bug there.

roystgnr commented 6 years ago

If that's the case though then you might be able to get the same error on introduction_ex4, as in #1697. If so then please chime in there; at the moment I have no idea why this bug is affecting some users but not others!

bboutkov commented 6 years ago

The problem in this issue is independent of a555464 and it still occurs if I rebase the branch mentioned in the original post on master.

Regarding the problems in #1697, I have not managed to reproduce them with intro/ex4 using the command mentioned there: -np 2 ./example-opt -n 10 -d 3.