dschwen opened this issue 5 years ago
Some more debugging output while I still have the session open:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1558004b1)
* frame #0: 0x000000010e372ae9 libmoose-dbg.0.dylib`libMesh::Elem::level(this=0x000060f00026b690) const at elem.h:2664
frame #1: 0x00000001127d1988 libmesh_dbg.0.dylib`libMesh::CompareElemIdsByLevel::operator(this=0x00007ffee4ae3200, a=0x000060f00026b690, b=0x000060f00026c4a0)(libMesh::Elem const*, libMesh::Elem const*) const at compare_elems_by_level.h:50
frame #2: 0x00000001127d1b30 libmesh_dbg.0.dylib`std::__1::__tree_node_base<void*>*& std::__1::__tree<libMesh::Elem const*, libMesh::CompareElemIdsByLevel, std::__1::allocator<libMesh::Elem const*> >::__find_equal<libMesh::Elem const*>(this=0x00007ffee4ae31f0, __parent=0x00007ffee4ae2400, __v=0x00006020128598d0) at __tree:2007
frame #3: 0x00000001129d1858 libmesh_dbg.0.dylib`std::__1::pair<std::__1::__tree_iterator<libMesh::Elem const*, std::__1::__tree_node<libMesh::Elem const*, void*>*, long>, bool> std::__1::__tree<libMesh::Elem const*, libMesh::CompareElemIdsByLevel, std::__1::allocator<libMesh::Elem const*> >::__emplace_unique_key_args<libMesh::Elem const*, libMesh::Elem const* const&>(this=0x00007ffee4ae31f0, __k=0x00006020128598d0, __args=0x00006020128598d0) at __tree:2131
frame #4: 0x00000001129cd865 libmesh_dbg.0.dylib`libMesh::MeshCommunication::delete_remote_elements(libMesh::DistributedMesh&, std::__1::set<libMesh::Elem*, std::__1::less<libMesh::Elem*>, std::__1::allocator<libMesh::Elem*> > const&) const [inlined] std::__1::__tree<libMesh::Elem const*, libMesh::CompareElemIdsByLevel, std::__1::allocator<libMesh::Elem const*> >::__insert_unique(this=0x00007ffee4ae31f0, __v=0x00006020128598d0) at __tree:1273
frame #5: 0x00000001129cd844 libmesh_dbg.0.dylib`libMesh::MeshCommunication::delete_remote_elements(libMesh::DistributedMesh&, std::__1::set<libMesh::Elem*, std::__1::less<libMesh::Elem*>, std::__1::allocator<libMesh::Elem*> > const&) const [inlined] std::__1::set<libMesh::Elem const*, libMesh::CompareElemIdsByLevel, std::__1::allocator<libMesh::Elem const*> >::insert(this=0x00007ffee4ae31f0 size=1618, __v=0x00006020128598d0) at set:599
frame #6: 0x00000001129cd828 libmesh_dbg.0.dylib`libMesh::MeshCommunication::delete_remote_elements(this=0x00007ffee4ae4880, mesh=0x0000612000005bc0, extra_ghost_elem_ids=size=3696) const at mesh_communication.C:1875
frame #7: 0x00000001128289d4 libmesh_dbg.0.dylib`libMesh::DistributedMesh::delete_remote_elements(this=0x0000612000005bc0) at distributed_mesh.C:1362
frame #8: 0x000000011297186a libmesh_dbg.0.dylib`libMesh::MeshBase::prepare_for_use(this=0x0000612000005bc0, skip_renumber_nodes_and_elements=false, skip_find_neighbors=false) at mesh_base.C:244
frame #9: 0x0000000112c555ee libmesh_dbg.0.dylib`libMesh::MeshRefinement::refine_and_coarsen_elements(this=0x000060d002b70730) at mesh_refinement.C:573
frame #10: 0x000000010fcd259a libmoose-dbg.0.dylib`Adaptivity::adaptMesh(this=0x0000625000013548, marker_name="range") at Adaptivity.C:177
frame #11: 0x000000010eb8f530 libmoose-dbg.0.dylib`FEProblemBase::adaptMesh(this=0x0000625000011918) at FEProblemBase.C:5113
frame #12: 0x000000010e580c4d libmoose-dbg.0.dylib`Transient::incrementStepOrReject(this=0x0000619000bb3598) at Transient.C:359
frame #13: 0x000000010e5809d3 libmoose-dbg.0.dylib`Transient::execute(this=0x0000619000bb3598) at Transient.C:322
frame #14: 0x000000010fc3fe3b libmoose-dbg.0.dylib`MooseApp::executeExecutioner(this=0x0000620000001098) at MooseApp.C:846
frame #15: 0x000000010fc413bf libmoose-dbg.0.dylib`MooseApp::run(this=0x0000620000001098) at MooseApp.C:956
frame #16: 0x000000010b1179d6 marmot-dbg`main(argc=10, argv=0x00007ffee4ae95c0) at main.C:39
frame #17: 0x00007fff58785ed9 libdyld.dylib`start + 1
(lldb) up
frame #1: 0x00000001127d1988 libmesh_dbg.0.dylib`libMesh::CompareElemIdsByLevel::operator(this=0x00007ffee4ae3200, a=0x000060f00026b690, b=0x000060f00026c4a0)(libMesh::Elem const*, libMesh::Elem const*) const at compare_elems_by_level.h:50
47 libmesh_assert (a);
48 libmesh_assert (b);
49 const unsigned int
-> 50 al = a->level(), bl = b->level();
51 const dof_id_type
52 aid = a->id(), bid = b->id();
53
(lldb) print a
(const libMesh::Elem *) $0 = 0x000060f00026b690
(lldb) print *a
(const libMesh::Elem) $1 = {
  libMesh::DofObject = {
    old_dof_object = 0x0000000000000000
    _unique_id = 65199
    _id = 18260
    _processor_id = 3
    _idx_buf = size=0 {}
  }
  _nodes = 0x000060f00026b718
  _elemlinks = 0x000060f00026b6e8
  _children = 0x0000000000000000
  _sbd_id = 0
  _rflag = '\x05'
  _pflag = '\x05'
  _p_level = '\0'
}
(lldb) print a->dim()
error: Execution was interrupted, reason: Attempted to dereference an invalid pointer..
The process has been returned to the state before expression evaluation.
(lldb) print *b
(const libMesh::Elem) $2 = {
  libMesh::DofObject = {
    old_dof_object = 0x000060400114fb90
    _unique_id = 66459
    _id = 18587
    _processor_id = 3
    _idx_buf = size=6 {
      [0] = 2
      [1] = 4
      [2] = 1024
      [3] = 4294967294
      [4] = 257
      [5] = 13575
    }
  }
  _nodes = 0x000060f00026c528
  _elemlinks = 0x000060f00026c4f8
  _children = 0x0000000000000000
  _sbd_id = 0
  _rflag = '\x01'
  _pflag = '\x01'
  _p_level = '\0'
}
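The crash happens while inserting into a `std::set<const Elem *, CompareElemIdsByLevel>` (frames #1 to #5): every insertion runs the comparator, and the comparator dereferences both elements to read `level()` and `id()`. The listing above only shows lines 47 to 52 of compare_elems_by_level.h, so the following is a minimal self-contained sketch of a comparator of that shape, assuming it orders by level and then by id. `Elem`, `CompareByLevelThenId`, `ElemSetByLevel` and `ids_in_order` are hypothetical stand-ins, not libMesh classes.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for libMesh::Elem exposing only level() and id().
// std::uint64_t stands in for libMesh's dof_id_type.
struct Elem
{
  unsigned int lvl;
  std::uint64_t ident;
  unsigned int level() const { return lvl; }
  std::uint64_t id() const { return ident; }
};

// Sketch of a CompareElemIdsByLevel-style ordering: by level, then by id.
// Both operands are dereferenced on every comparison, so a freed Elem in
// the set is touched whenever an insertion's search path reaches it.
struct CompareByLevelThenId
{
  bool operator()(const Elem * a, const Elem * b) const
  {
    assert(a); // mirrors libmesh_assert(a)
    assert(b); // mirrors libmesh_assert(b)
    const unsigned int al = a->level(), bl = b->level();
    const std::uint64_t aid = a->id(), bid = b->id();
    return (al == bl) ? aid < bid : al < bl;
  }
};

using ElemSetByLevel = std::set<const Elem *, CompareByLevelThenId>;

// Returns the ids in set order: coarser levels first, ties broken by id.
std::vector<std::uint64_t> ids_in_order(const ElemSetByLevel & s)
{
  std::vector<std::uint64_t> out;
  for (const Elem * e : s)
    out.push_back(e->id());
  return out;
}
```

Because the set stores only pointers, a dangling entry need not fault on the insertion that added it; it can fault on a later insertion whose comparisons pass through it.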
It looks like you have ~4k extra_ghost_elem_ids (an atavistically named variable, since it is a set of Elem pointers rather than ids) and one of them is now a dangling pointer? That should never happen, but if I'm right then we can at least start trying to figure out how it did. A loop over DistributedMesh::_extra_ghost_elems that calls level() on each element might trigger the same crash, and if it does, you can paste that loop around as a pre- and postcondition check to narrow down where the corruption is occurring.
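A minimal sketch of the suggested check loop, assuming it only needs to dereference each element the same way the comparator does. The backtrace shows the set reaching `delete_remote_elements()` as a `std::set<libMesh::Elem *>`; here `Elem` and `check_extra_ghost_elems` are hypothetical stand-ins so the sketch compiles on its own. In a real build you would pass `DistributedMesh::_extra_ghost_elems` (or however your code can reach it) instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <set>

// Hypothetical stand-in for libMesh::Elem; only level() matters here.
struct Elem
{
  unsigned int lvl;
  unsigned int level() const { return lvl; }
};

// Hypothetical debugging helper: dereference every extra ghost element.
// A dangling pointer should fault here (or be reported by a memory checker)
// at the call site of the check, instead of deep inside
// delete_remote_elements(). Returns the maximum level seen so the loads
// cannot be optimized away.
unsigned int check_extra_ghost_elems(const std::set<Elem *> & ghosts,
                                     const char * where)
{
  unsigned int max_level = 0;
  for (const Elem * e : ghosts)
  {
    assert(e != nullptr);              // a null entry would be a bug too
    const unsigned int l = e->level(); // the dereference that crashed in frame #0
    if (l > max_level)
      max_level = l;
  }
  std::printf("%s: %zu extra ghost elems dereferenced OK (max level %u)\n",
              where, ghosts.size(), max_level);
  return max_level;
}
```

Calling it with a label before and after each suspect step (for example around each adaptivity step) bisects the region in which an element gets freed while still listed as an extra ghost.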
I'm AFK with family issues right now but if you can set up a small test case I'll find time to try it sooner or later.
I'm running a 2D MOOSE phase field simulation with a pre-split mesh built from a uniformly refined generated mesh. After a few hundred steps the simulation segfaults in libMesh::Elem::level() (see below). @roystgnr, I'm trying to find a smaller test case for this, but in case I fail, I have a simulation with a restart file that reaches the error in about 5 minutes.
Backtrace: