idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

multiApp max_procs_per_app < # of mpi proc seg faults #11045

Open veeshy opened 6 years ago

veeshy commented 6 years ago

Rationale

Running a simulation with MultiApps, I want to limit my sub app to 1 processor, but when the transfer happens after the master app solve, I get a segfault. The simulation works when I don't limit the number of processors for the sub app. I want the behavior where the master app gets N processors and the sub app is forced to 1.

Description

Running the moose_test executable on one of the transfer tests:

mpiexec -n 2 ./moose_test-dbg -i tests/transfers/multiapp_copy_transfer/linear_lagrange_to_sub/master.i MultiApps/sub/max_procs_per_app=1
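For reference, the same limit can be written directly in the master input file instead of as a command-line override. This is only a minimal sketch; the sub-app type, input file name, and execute_on setting are assumptions, not the actual contents of the test's master.i:

[MultiApps]
  [./sub]
    type = TransientMultiApp       # assumed MultiApp type
    input_files = sub.i            # assumed sub-app input file
    execute_on = timestep_end      # the transfers in the log below run on TIMESTEP_END
    max_procs_per_app = 1          # restrict the sub app to a single MPI rank
  [../]
[]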

segfaults with:

Starting Transfers on TIMESTEP_END
To MultiApps
Beginning MultiAppCopyTransfer to_sub

No index 67 in ghosted vector.
Vector contains [0,67)
And empty ghost array.

[0] /Users/veeshy/projects/moose/scripts/../libmesh/installed/include/libmesh/petsc_vector.h, line 1053, compiled Mar 16 2018 at 12:38:33
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

The sub app does use only 1 processor:

sub0: Running App: sub0
sub0: Parallelism:
sub0:   Num Processors: 1
sub0:   Num Threads: 1

permcody commented 6 years ago

The MultiAppCopyTransfer is designed for use when the master and sub meshes are identical. It's for the common case where you actually use the same mesh for both apps, meaning we can copy information directly from DOF to DOF without interpolation, nearest node searches, etc.
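For context, a MultiAppCopyTransfer is typically set up along these lines. This is only a sketch; the block name follows the to_sub transfer in the log above, while the variable names are assumptions rather than the contents of the failing test:

[Transfers]
  [./to_sub]
    type = MultiAppCopyTransfer
    direction = to_multiapp        # copy from the master app into the sub app
    multi_app = sub
    source_variable = u            # assumed master-app variable
    variable = u                   # assumed sub-app variable
  [../]
[]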

We DO need to add better documentation explaining the purpose of this transfer, though, so I'll leave this ticket open. We should probably also error when someone tries to use the proc-limiting parameter with this transfer. However, we aren't going to attempt to fix the segfault itself, because we would end up with all the extra complication of figuring out which nodes are owned by which processors and effectively re-implement the existing transfers.

veeshy commented 6 years ago

Hmm, I guess this test case doesn’t use the same mesh, but I was just looking for a simple example to show my problem. I’ve got a case that uses the same exact mesh that breaks when trying to limit processors, though it does sound like it might be the same issue with mesh communications.

If I use an interpolation transfer, can I get around this?

permcody commented 6 years ago

MultiAppCopyTransfer is for the very special case where you have identical meshes and identical parallel partitioning. In that case, we can simply copy solution values directly.

Anything that deviates from those conditions requires a different transfer, so yes, you will need a different transfer type for your case.
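As a hedged sketch of that suggestion, swapping the copy transfer for an interpolation-based one removes the identical-partitioning requirement. The variable names, and the choice of MultiAppInterpolationTransfer over, say, a nearest-node transfer, are assumptions for your actual case:

[Transfers]
  [./to_sub]
    type = MultiAppInterpolationTransfer   # does not require identical DOF layout
    direction = to_multiapp
    multi_app = sub
    source_variable = u                    # assumed variable names
    variable = u
  [../]
[]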

I'm leaving this issue open until we can provide better error messages to the user when those conditions are violated.