libMesh / libmesh

libMesh github repository
http://libmesh.github.io
GNU Lesser General Public License v2.1
649 stars 284 forks source link

find_neighbors finds a different neighbor? #2026

Open lindsayad opened 5 years ago

lindsayad commented 5 years ago

This may have something to do with #1678. We had a user come to us with a parallel DisplacedProblem simulation running with --distributed-mesh that fails with a PETSc error like:

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Arguments are incompatible
[0]PETSC ERROR: Incompatible vector local lengths parameter # 1 local size 4644 != parameter # 2 local size 4632

The problem occurs when we try to sync the displaced_system.current_local_solution with the undisplaced_system.current_local_solution. The global and local sizes of these vectors are the same; however, the _send_list is different. The _send_list is larger for the undisplaced_system because the number of elements to ghost is larger by two elements (for both processes: 67 vs 65 for processor 0; 65 vs 63 for processor 1). I latched onto one of the elements (element 1448) that is ghosted for the undisplaced_system on processor 0 but not for the displaced_system. 1448 is a neighbor of element 664 in the undisplaced system, but not in the displaced system!

An lldb summary:

(lldb) b EquationSystems::init
Breakpoint 1: where = libmesh_dbg.so.0`libMesh::EquationSystems::init() + 19 at equation_systems.C:98, address = 0x00007f55759b9843
(lldb) c
Process 30667 resuming
Process 30667 stopped
* thread #1, name = 'golem-dbg', stop reason = breakpoint 1.1
    frame #0: 0x00007f55759b9843 libmesh_dbg.so.0`libMesh::EquationSystems::init(this=0x0000000001f5a3d8) at equation_systems.C:98
   95  
   96   void EquationSystems::init ()
   97   {
-> 98     const unsigned int n_sys = this->n_systems();
   99  
   100    libmesh_assert_not_equal_to (n_sys, 0);
   101 

This corresponds to the the undisplaced equation systems:

(lldb) p _mesh.elem_ptr(664)
(libMesh::Tri3 *) $0 = 0x0000000001dbb330
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)
(libMesh::Tri3 *) $1 = 0x0000000001e2aa60
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)
(libMesh::Tri3 *) $2 = 0x0000000001e29800
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)
(libMesh::Tri3 *) $3 = 0x0000000001e2a520
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->id()
(libMesh::dof_id_type) $5 = 1448
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->id()
(libMesh::dof_id_type) $6 = 678
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->id()
(libMesh::dof_id_type) $7 = 683
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->centroid()
(libMesh::Point) $8 = {
  libMesh::TypeVector = {
    _coords = ([0] = -7.607472007162869, [1] = -2.5389266380419335, [2] = -23.063092092672985)
  }
}
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->centroid()
(libMesh::Point) $9 = {
  libMesh::TypeVector = {
    _coords = ([0] = -14.014609877951443, [1] = -13.641004916901389, [2] = -5.1979944109916687)
  }
}
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->centroid()
(libMesh::Point) $10 = {
  libMesh::TypeVector = {
    _coords = ([0] = -24.067560513814289, [1] = -17.952677090962727, [2] = -31.695480346679688)
  }
}
(lldb) c
Process 30667 resuming
Process 30667 stopped
* thread #1, name = 'golem-dbg', stop reason = breakpoint 1.1
    frame #0: 0x00007f55759b9843 libmesh_dbg.so.0`libMesh::EquationSystems::init(this=0x0000000001f451a8) at equation_systems.C:98
   95  
   96   void EquationSystems::init ()
   97   {
-> 98     const unsigned int n_sys = this->n_systems();
   99  
   100    libmesh_assert_not_equal_to (n_sys, 0);
   101 

This corresponds to the displaced systems:

(lldb) p _mesh.elem_ptr(664)
(libMesh::Tri3 *) $11 = 0x0000000001e2b960
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)
(libMesh::Tri3 *) $12 = 0x0000000001edbbf0
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)
(libMesh::Tri3 *) $13 = 0x0000000001eda530
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)
(libMesh::Tri3 *) $14 = 0x0000000001eda990
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->id()
(libMesh::dof_id_type) $15 = 704
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->id()
(libMesh::dof_id_type) $16 = 678
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->id()
(libMesh::dof_id_type) $17 = 683
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->centroid()
(libMesh::Point) $18 = {
  libMesh::TypeVector = {
    _coords = ([0] = -14.503511334148547, [1] = -0.97677584178745746, [2] = -20.124373932679493)
  }
}
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->centroid()
(libMesh::Point) $19 = {
  libMesh::TypeVector = {
    _coords = ([0] = -14.014609877951443, [1] = -13.641004916901389, [2] = -5.1979944109916687)
  }
}
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->centroid()
(libMesh::Point) $20 = {
  libMesh::TypeVector = {
    _coords = ([0] = -24.067560513814289, [1] = -17.952677090962727, [2] = -31.695480346679688)
  }
}
(lldb) 

Summary

The zeroth neighbor of element 664 for undisplaced has a totally different id and centroid than the zeroth neighbor for the displaced systems.

Important notes

  1. The displaced mesh is copy constructed from the undisplaced mesh after the latter is partitioned; if I pass skip_find_neighbors = true to UnstructuredMesh::copy_nodes_and_elements, then there is no issue whatsoever, so for me this points to something happening in UnstructuredMesh::find_neighbors.
  2. This is a mesh whose highest dimensional elements are TET4 but that (as shown by the lldb session) contains lower dimensional TRI3 elements. The issue here is happening for a TRI3 element and neighbors, but perhaps the presence of the higher dimensionality elements is playing a role?
roystgnr commented 5 years ago

How big is the mesh here? Can you set up a failure case I can use to reproduce the problem?

lindsayad commented 5 years ago

This is 951 elements. This is from a pretty involved example with a moose application that is relatively unknown and one that we don't even test. I'll see if I can produce a more minimal example using just MOOSE object, but it might take be a while to get around to it.

On Fri, Feb 8, 2019 at 9:42 AM roystgnr notifications@github.com wrote:

How big is the mesh here? Can you set up a failure case I can use to reproduce the problem?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/libMesh/libmesh/issues/2026#issuecomment-461865767, or mute the thread https://github.com/notifications/unsubscribe-auth/AJxgcLaMEiQ48in_h_SXZiFRQSKcmFzgks5vLakKgaJpZM4aqnal .