libMesh / libmesh

libMesh github repository
http://libmesh.github.io
GNU Lesser General Public License v2.1

Segfault in libMesh::Elem::level #2044

Open dschwen opened 5 years ago

dschwen commented 5 years ago

I'm running a MOOSE phase field simulation with a pre-split mesh that is made from a uniformly refined generated mesh (2D). After a few hundred steps the simulation segfaults in libMesh::Elem::level() (see below).

@roystgnr, I'm trying to find a smaller test case for this, but in case I fail, I have a simulation with a restart file that gets you to the error in about 5 min.

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1748004d8)
    frame #0: 0x000000010e3e1ae9 libmoose-dbg.0.dylib`libMesh::Elem::level(this=0x000060f000222df0) const at elem.h:2664
   2661   // dimensionality we are at the same level as
   2662   // the parent (e.g. we are the 2D side of a
   2663   // 3D element)
-> 2664   if (this->dim() != this->parent()->dim())
   2665     return this->parent()->level();
   2666 
   2667   // otherwise we are at a level one

(lldb) print this->dim()
error: Execution was interrupted, reason: Attempted to dereference an invalid pointer..
The process has been returned to the state before expression evaluation.
(lldb) print this
(libMesh::Elem *) $0 = 0x000060f000222df0
(lldb) 

backtrace:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1558004b1)
  * frame #0: 0x000000010e372ae9 libmoose-dbg.0.dylib`libMesh::Elem::level(this=0x000060f00026b690) const at elem.h:2664
    frame #1: 0x00000001127d1988 libmesh_dbg.0.dylib`libMesh::CompareElemIdsByLevel::operator(this=0x00007ffee4ae3200, a=0x000060f00026b690, b=0x000060f00026c4a0)(libMesh::Elem const*, libMesh::Elem const*) const at compare_elems_by_level.h:50
    frame #2: 0x00000001127d1b30 libmesh_dbg.0.dylib`std::__1::__tree_node_base<void*>*& std::__1::__tree<libMesh::Elem const*, libMesh::CompareElemIdsByLevel, std::__1::allocator<libMesh::Elem const*> >::__find_equal<libMesh::Elem const*>(this=0x00007ffee4ae31f0, __parent=0x00007ffee4ae2400, __v=0x00006020128598d0) at __tree:2007
    frame #3: 0x00000001129d1858 libmesh_dbg.0.dylib`std::__1::pair<std::__1::__tree_iterator<libMesh::Elem const*, std::__1::__tree_node<libMesh::Elem const*, void*>*, long>, bool> std::__1::__tree<libMesh::Elem const*, libMesh::CompareElemIdsByLevel, std::__1::allocator<libMesh::Elem const*> >::__emplace_unique_key_args<libMesh::Elem const*, libMesh::Elem const* const&>(this=0x00007ffee4ae31f0, __k=0x00006020128598d0, __args=0x00006020128598d0) at __tree:2131
    frame #4: 0x00000001129cd865 libmesh_dbg.0.dylib`libMesh::MeshCommunication::delete_remote_elements(libMesh::DistributedMesh&, std::__1::set<libMesh::Elem*, std::__1::less<libMesh::Elem*>, std::__1::allocator<libMesh::Elem*> > const&) const [inlined] std::__1::__tree<libMesh::Elem const*, libMesh::CompareElemIdsByLevel, std::__1::allocator<libMesh::Elem const*> >::__insert_unique(this=0x00007ffee4ae31f0, __v=0x00006020128598d0) at __tree:1273
    frame #5: 0x00000001129cd844 libmesh_dbg.0.dylib`libMesh::MeshCommunication::delete_remote_elements(libMesh::DistributedMesh&, std::__1::set<libMesh::Elem*, std::__1::less<libMesh::Elem*>, std::__1::allocator<libMesh::Elem*> > const&) const [inlined] std::__1::set<libMesh::Elem const*, libMesh::CompareElemIdsByLevel, std::__1::allocator<libMesh::Elem const*> >::insert(this=0x00007ffee4ae31f0 size=1618, __v=0x00006020128598d0) at set:599
    frame #6: 0x00000001129cd828 libmesh_dbg.0.dylib`libMesh::MeshCommunication::delete_remote_elements(this=0x00007ffee4ae4880, mesh=0x0000612000005bc0, extra_ghost_elem_ids=size=3696) const at mesh_communication.C:1875
    frame #7: 0x00000001128289d4 libmesh_dbg.0.dylib`libMesh::DistributedMesh::delete_remote_elements(this=0x0000612000005bc0) at distributed_mesh.C:1362
    frame #8: 0x000000011297186a libmesh_dbg.0.dylib`libMesh::MeshBase::prepare_for_use(this=0x0000612000005bc0, skip_renumber_nodes_and_elements=false, skip_find_neighbors=false) at mesh_base.C:244
    frame #9: 0x0000000112c555ee libmesh_dbg.0.dylib`libMesh::MeshRefinement::refine_and_coarsen_elements(this=0x000060d002b70730) at mesh_refinement.C:573
    frame #10: 0x000000010fcd259a libmoose-dbg.0.dylib`Adaptivity::adaptMesh(this=0x0000625000013548, marker_name="range") at Adaptivity.C:177
    frame #11: 0x000000010eb8f530 libmoose-dbg.0.dylib`FEProblemBase::adaptMesh(this=0x0000625000011918) at FEProblemBase.C:5113
    frame #12: 0x000000010e580c4d libmoose-dbg.0.dylib`Transient::incrementStepOrReject(this=0x0000619000bb3598) at Transient.C:359
    frame #13: 0x000000010e5809d3 libmoose-dbg.0.dylib`Transient::execute(this=0x0000619000bb3598) at Transient.C:322
    frame #14: 0x000000010fc3fe3b libmoose-dbg.0.dylib`MooseApp::executeExecutioner(this=0x0000620000001098) at MooseApp.C:846
    frame #15: 0x000000010fc413bf libmoose-dbg.0.dylib`MooseApp::run(this=0x0000620000001098) at MooseApp.C:956
    frame #16: 0x000000010b1179d6 marmot-dbg`main(argc=10, argv=0x00007ffee4ae95c0) at main.C:39
    frame #17: 0x00007fff58785ed9 libdyld.dylib`start + 1

dschwen commented 5 years ago

Some more debug output while I still have the session open:

(lldb) up
frame #1: 0x00000001127d1988 libmesh_dbg.0.dylib`libMesh::CompareElemIdsByLevel::operator(this=0x00007ffee4ae3200, a=0x000060f00026b690, b=0x000060f00026c4a0)(libMesh::Elem const*, libMesh::Elem const*) const at compare_elems_by_level.h:50
   47       libmesh_assert (a);
   48       libmesh_assert (b);
   49       const unsigned int
-> 50         al = a->level(), bl = b->level();
   51       const dof_id_type
   52         aid = a->id(),   bid = b->id();
   53   
(lldb) print a
(const libMesh::Elem *) $0 = 0x000060f00026b690
(lldb) print *a
(const libMesh::Elem) $1 = {
  libMesh::DofObject = {
    old_dof_object = 0x0000000000000000
    _unique_id = 65199
    _id = 18260
    _processor_id = 3
    _idx_buf = size=0 {}
  }
  _nodes = 0x000060f00026b718
  _elemlinks = 0x000060f00026b6e8
  _children = 0x0000000000000000
  _sbd_id = 0
  _rflag = '\x05'
  _pflag = '\x05'
  _p_level = '\0'
}
(lldb) print a->dim()
error: Execution was interrupted, reason: Attempted to dereference an invalid pointer..
The process has been returned to the state before expression evaluation.
(lldb) print *b
(const libMesh::Elem) $2 = {
  libMesh::DofObject = {
    old_dof_object = 0x000060400114fb90
    _unique_id = 66459
    _id = 18587
    _processor_id = 3
    _idx_buf = size=6 {
      [0] = 2
      [1] = 4
      [2] = 1024
      [3] = 4294967294
      [4] = 257
      [5] = 13575
    }
  }
  _nodes = 0x000060f00026c528
  _elemlinks = 0x000060f00026c4f8
  _children = 0x0000000000000000
  _sbd_id = 0
  _rflag = '\x01'
  _pflag = '\x01'
  _p_level = '\0'
}

roystgnr commented 5 years ago

Looks like you have ~4k extra_ghost_elem_ids (there's an atavistically named variable...) and one of them is now a dangling pointer? That should never happen, but if I'm right then we can at least start trying to figure out how it did. A loop over DistributedMesh::_extra_ghost_elems that tries to call level() on each element might trigger the same bug, and if it does then you can paste that loop around as a pre- and postcondition to figure out where the corruption is occurring.
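
A rough sketch of that kind of checkpoint loop, written against a stand-in type so that it compiles on its own. `MockElem` and `touch_all_levels` are hypothetical names, not libMesh API, and the mock `level()` only walks the parent chain (it skips the `dim()` comparison visible in elem.h above). In a real build the loop would presumably run over the mesh's extra ghost element set instead. The check does not detect a stale pointer by itself; the idea is only that dereferencing every element at a labeled point should make the crash (or a memory checker's report) show up at the first checkpoint after the corruption rather than deep inside delete_remote_elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <set>

// Hypothetical stand-in for libMesh::Elem: only what the check needs.
struct MockElem
{
  const MockElem * parent = nullptr; // null for a level-0 element
  unsigned int id = 0;

  // Walks the parent chain, so it reads this element and every ancestor.
  unsigned int level() const
  {
    return parent ? parent->level() + 1 : 0;
  }
};

// Call level() on every element in the set and report at a labeled
// checkpoint. If the set held a pointer to freed memory, the invalid
// access would be expected here rather than at some later insert.
// Returns the number of elements visited.
std::size_t touch_all_levels(const std::set<const MockElem *> & elems,
                             const char * label)
{
  std::size_t visited = 0;
  unsigned int max_level = 0;

  for (const MockElem * elem : elems)
  {
    assert(elem); // same null check the comparator does before level()
    const unsigned int l = elem->level();
    if (l > max_level)
      max_level = l;
    ++visited;
  }

  std::fprintf(stderr, "[%s] touched %zu ghost elems, max level %u\n",
               label, visited, max_level);
  return visited;
}
```

Calling it once with a label such as "pre" before a suspect step and once with "post" after it, then moving the pair around, narrows down which step leaves the bad pointer behind.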

I'm AFK with family issues right now but if you can set up a small test case I'll find time to try it sooner or later.