trixi-framework / Trixi.jl

Trixi.jl: Adaptive high-order numerical simulations of conservation laws in Julia
https://trixi-framework.github.io/Trixi.jl
MIT License

P4est MPI simulation gets stuck when not all ranks have BC faces #1878

Open benegee opened 3 months ago

benegee commented 3 months ago

I first encountered this issue for the baroclinic instability test case. Then I found a reduced example based on elixir_advection_basic but with nonperiodic boundary conditions:

(Screenshot of the modified elixir setup; see the full MWE below.)

Running this with system MPI on 3 ranks makes the simulation hang. Running with tmpi and looking at the backtraces when aborting shows that two ranks have called init_boundaries and eventually p4est_iterate, while the third (the middle one) is already somewhere in rhs!.

It seems to be caused by the check at https://github.com/trixi-framework/Trixi.jl/blob/2dfde7faf3cc74f066d86148ae6c99ed9e58fa79/src/solvers/dgsem_p4est/containers.jl#L266-L268: because of it, only two ranks eventually call p4est_iterate. On the p4est side, p4est_iterate first calls p4est_is_valid, which calls MPI_Allreduce.

This would explain the blocking. What I do not understand is why it works when not using system MPI.
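For illustration, the failure mode can be reduced to a collective operation that is guarded by a rank-dependent condition. Below is a minimal, hedged sketch using plain MPI.jl, not Trixi.jl code; the n_boundaries variable and the choice of rank 1 merely mock the middle rank that owns no boundary faces:

```julia
# Minimal sketch of the suspected deadlock pattern (plain MPI.jl, not Trixi.jl code).
# Run with e.g.: mpiexec -n 3 julia --project deadlock_sketch.jl
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Stand-in for the per-rank boundary count: the middle rank owns no boundary faces.
n_boundaries = rank == 1 ? 0 : 2

if n_boundaries > 0
    # In Trixi.jl this would correspond to init_boundaries! calling p4est_iterate,
    # which first calls p4est_is_valid and thereby MPI_Allreduce.
    total = MPI.Allreduce(n_boundaries, +, comm)
    println(stderr, "rank $rank entered the collective, total = $total")
else
    println(stderr, "rank $rank skipped the collective")
end

# With a strict MPI implementation the run never gets past this point on all ranks:
# two ranks wait inside the Allreduce for the third, which has already skipped the check.
MPI.Barrier(comm)
println(stderr, "rank $rank passed the final barrier")
```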

MWE:

```julia
using OrdinaryDiffEq
using Trixi

###############################################################################
# semidiscretization of the linear advection equation

advection_velocity = (1.0, 0.0)
equations = LinearScalarAdvectionEquation2D(advection_velocity)

initial_condition = initial_condition_gauss

boundary_conditions = Dict(:y_neg => BoundaryConditionDirichlet(initial_condition),
                           :y_pos => BoundaryConditionDirichlet(initial_condition))

# Create DG solver with polynomial degree = 3 and (local) Lax-Friedrichs/Rusanov flux as surface flux
solver = DGSEM(polydeg = 3, surface_flux = flux_lax_friedrichs)

coordinates_min = (-5.0, -5.0)
coordinates_max = (5.0, 5.0)

trees_per_dimension = (1, 3)

mesh = P4estMesh(trees_per_dimension, polydeg = 3,
                 coordinates_min = coordinates_min, coordinates_max = coordinates_max,
                 initial_refinement_level = 0,
                 periodicity = (true, false))

# A semidiscretization collects data structures and functions for the spatial discretization
semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver,
                                    boundary_conditions = boundary_conditions)

###############################################################################
# ODE solvers, callbacks etc.

# Create ODE problem with time span `tspan`
tspan = (0.0, 1.0)
ode = semidiscretize(semi, tspan);

# At the beginning of the main loop, the SummaryCallback prints a summary of the simulation setup
# and resets the timers
summary_callback = SummaryCallback()

callbacks = CallbackSet(summary_callback)

###############################################################################
# run the simulation

# OrdinaryDiffEq's `solve` method evolves the solution in time and executes the passed callbacks
sol = solve(ode, CarpenterKennedy2N54(williamson_condition = false),
            dt = 0.01, save_everystep = false, callback = callbacks);

# Print the timer summary
summary_callback()
```
sloede commented 3 months ago

> This would explain the blocking. What I do not understand is why it works when not using system MPI.

I share your conclusion. It seems to be a good explanation for the blocking, but it does not explain why it works with the out-of-the-box MPI. Maybe you can verify (using a "wololo" statement or just plain output to stderr) that the init_boundaries! function is indeed not called on the middle rank with the Julia-provided MPI but does get called with the system MPI?

benegee commented 3 months ago

I double-checked this: init_boundaries! is not called on the middle rank, neither with Julia-provided MPI nor with system MPI. For both variants there are 26 calls to p4est_iterate in total, whereas the number should be a multiple of 3. Nevertheless, the simulation just continues with Julia-provided MPI but hangs with system MPI.

sloede commented 3 months ago

This becomes more and more baffling... Alas, the joys of MPI-based parallelism and debugging 😱

What happens if you put an MPI_Barrier after the if n_boundaries > 0 ... end check, then print something from all ranks, then add another MPI_Barrier? Will all ranks reach that statement?
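A hedged sketch of what that instrumentation could look like as a standalone script (the n_boundaries assignment only mocks the middle rank; in practice the barriers and printlns would go around the actual check in containers.jl):

```julia
# Sketch of the suggested instrumentation: barriers and prints placed
# after (not inside) the n_boundaries > 0 check. Standalone mock, not Trixi.jl code.
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

n_boundaries = rank == 1 ? 0 : 2  # mock: the middle rank owns no boundary faces

if n_boundaries > 0
    println(stderr, "rank $rank: would call init_boundaries! / p4est_iterate here")
end

MPI.Barrier(comm)
println(stderr, "rank $rank reached the statement after the check")
MPI.Barrier(comm)
```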

I do not see how a simulation could continue if a collective communication is not issued from all ranks, unless it gets mixed with later collective calls that somehow "match" in one MPI implementation but not in others.

Incidentally, are system MPI and Julia MPI the same MPI implementation? If not, can you check what happens if you use the same one? That is, maybe it's not a Julia thing but really an MPI thing...
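One way to check which implementation is actually in use would be something along these lines (a sketch; it assumes MPI.jl ≥ 0.20 together with the MPIPreferences mechanism, which is what current Trixi.jl setups rely on):

```julia
# Sketch: report which MPI binary/implementation the current Julia environment uses.
# Assumes MPI.jl >= 0.20 together with MPIPreferences.
using MPI
using MPIPreferences

MPI.Init()

if MPI.Comm_rank(MPI.COMM_WORLD) == 0
    # "system" for a system-provided MPI, or a JLL name such as "MPICH_jll"
    println("MPIPreferences.binary = ", MPIPreferences.binary)
    # Full version string of the underlying MPI library
    println(MPI.Get_library_version())
end
```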

benegee commented 3 months ago

Thanks for the suggestions! Here is a small update:

When I put MPI_Barriers after if n_boundaries > 0 ... end, they are passed by all ranks. I see the corresponding printlns from all ranks and the program does not hang.

When I put the barriers within if n_boundaries > 0 ... end, only two ranks show the corresponding messages and the program does hang, now also with Julia-provided MPI.

I also tried changing the MPI implementation: MPItrampoline_jll showed the same behavior. Unfortunately, I could not get OpenMPI_jll to work so far; it led to errors while precompiling P4est.jl and T8code.jl.
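For reference, switching between the backends is done per project environment with MPIPreferences; a sketch (Julia needs to be restarted afterwards so that MPI.jl and downstream packages such as P4est.jl and T8code.jl are re-precompiled against the new binary):

```julia
# Sketch: select the MPI backend used by MPI.jl via MPIPreferences.
using MPIPreferences

# Use the MPI library installed on the system (what "system MPI" refers to above):
MPIPreferences.use_system_binary()

# ...or one of the Julia-provided binaries, e.g.:
# MPIPreferences.use_jll_binary("MPItrampoline_jll")
# MPIPreferences.use_jll_binary("OpenMPI_jll")
# MPIPreferences.use_jll_binary("MPICH_jll")
```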