Open benegee opened 3 months ago
> This would explain the blocking. What I do not understand is why it works when not using system MPI.
I share your conclusion. It seems to be a good explanation for the blocking, but it does not explain why it works with the out-of-the-box MPI. Maybe you can verify (using a "wololo" statement or just plain output to stderr) that the `init_boundaries!` function is indeed not called on the middle rank with the Julia-provided MPI but does get called with the system MPI?
I double-checked this: `init_boundaries!` is not called on the middle rank, neither with the Julia-provided MPI nor with the system MPI. In both variants there are 26 calls to `p4est_iterate` in total, whereas the number should be a multiple of 3. However, the simulation just continues with the Julia-provided MPI but hangs with the system MPI.
This is getting more and more baffling... Alas, the joys of MPI-based parallelism and debugging 😱
What happens if you put an `MPI_Barrier` after the `if n_boundaries > 0 ... end` check, then print something from all ranks, then again an `MPI_Barrier`? Will all ranks reach that statement?
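For concreteness, a minimal sketch of that barrier-and-print pattern, assuming it is placed in the Trixi.jl code where `n_boundaries` is in scope and MPI.jl is already initialized:

```julia
# Hypothetical debugging sketch, not Trixi.jl source: barriers and a print around
# the suspected spot. If the program hangs at the first barrier, some rank never
# got here; if all ranks print, the hang must occur somewhere later.
comm = MPI.COMM_WORLD

if n_boundaries > 0
    # ... original boundary initialization, eventually calling p4est_iterate ...
end

MPI.Barrier(comm)   # every rank has to arrive here, otherwise we hang at this line
println(stderr, "rank $(MPI.Comm_rank(comm)): passed the n_boundaries check")
MPI.Barrier(comm)   # keeps the output together before the ranks continue
```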
I do not see how a simulation could continue if a collective communication is not issued from all ranks, unless it is mixed with later collective calls that somehow "match" in one MPI implementation and do not match in others.
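To illustrate the point, here is a small self-contained MPI.jl example (not Trixi.jl code) where one rank skips a collective call and the remaining ranks block inside `MPI.Allreduce`, which is the kind of mismatch described above:

```julia
# Self-contained illustration (not Trixi.jl code) of a skipped collective call.
# Run with, e.g., `mpiexecjl -n 3 julia skipped_collective.jl`.
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

if rank != 1
    # ranks 0 and 2 issue the collective and block here, waiting for rank 1
    total = MPI.Allreduce(1, +, comm)
    println("rank $rank: total = $total")
else
    # the "middle" rank skips the collective, mimicking the missing p4est_iterate
    println("rank $rank: skipped the collective, moving on")
end

MPI.Barrier(comm)   # rank 1 waits here; ranks 0 and 2 never arrive => deadlock
```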
Incidentally, are system MPI and Julia MPI the same MPI implementations? If not, can you check what happens if you use the same? That is, maybe it's not a Julia thing but really an MPI thing...
Thanks for the suggestions! Here is a small update:
- When I put `MPI_Barrier`s after `if n_boundaries > 0 ... end`, they are passed by all ranks. I see the corresponding `println`s from all ranks and the program does not hang.
- When I put the barriers within `if n_boundaries > 0 ... end`, only two ranks show the corresponding messages and the program does hang, also with the Julia-provided MPI.
- I also tried changing the MPI implementation: MPItrampoline_jll shows the same behavior. Unfortunately, I could not get OpenMPI_jll to work so far; it led to errors while precompiling P4est.jll and T8code.jll.
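In case it helps with reproducing, switching the MPI backend is typically done via MPIPreferences; this is a sketch of the commands I would expect to be relevant (package names as mentioned above):

```julia
# Sketch: select the MPI backend via MPIPreferences (run once per project), then
# restart Julia so that MPI.jl, P4est.jll, and T8code.jll precompile against the
# chosen binary.
using MPIPreferences

# one of the Julia-provided binaries, e.g. "MPICH_jll", "OpenMPI_jll", "MPItrampoline_jll":
MPIPreferences.use_jll_binary("MPItrampoline_jll")

# ... or the MPI library installed on the system:
# MPIPreferences.use_system_binary()
```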
I first encountered this issue for the baroclinic instability test case. Then I found a reduced example based on elixir_advection_basic but with nonperiodic boundary conditions:
Running this on 3 ranks with the system MPI makes the simulation hang. Running with `tmpi` and looking at the backtraces when aborting shows that two ranks have called `init_boundaries!` and eventually `p4est_iterate`, while the third (the middle one) is already somewhere in `rhs!`.

It seems to be caused by the check https://github.com/trixi-framework/Trixi.jl/blob/2dfde7faf3cc74f066d86148ae6c99ed9e58fa79/src/solvers/dgsem_p4est/containers.jl#L266-L268. Consequently, only two ranks eventually call `p4est_iterate`. On the p4est side, `p4est_iterate` first calls `p4est_is_valid`, which calls `MPI_Allreduce`.

This would explain the blocking. What I do not understand is why it works when not using system MPI.
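Schematically, the suspected pattern looks like this (a sketch with made-up helper names, not the literal code behind the link above):

```julia
# Sketch of the suspected pattern (helper names are hypothetical, not Trixi.jl API):
# ranks without local boundaries skip the block entirely and thus never enter
# p4est_iterate, which is collective because p4est_is_valid calls MPI_Allreduce.
function init_boundaries!(boundaries, mesh)
    n_boundaries = count_local_boundaries(mesh)    # hypothetical helper
    if n_boundaries > 0
        # only ranks that own boundaries reach this collective call chain:
        # p4est_iterate -> p4est_is_valid -> MPI_Allreduce
        iterate_over_boundaries!(boundaries, mesh) # hypothetical wrapper around p4est_iterate
    end
    return nothing
end
```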
MWE
```julia
using OrdinaryDiffEq
using Trixi

###############################################################################
# semidiscretization of the linear advection equation

advection_velocity = (1.0, 0.0)
equations = LinearScalarAdvectionEquation2D(advection_velocity)

initial_condition = initial_condition_gauss
boundary_conditions = Dict(:y_neg => BoundaryConditionDirichlet(initial_condition),
                           :y_pos => BoundaryConditionDirichlet(initial_condition))

# Create DG solver with polynomial degree = 3 and (local) Lax-Friedrichs/Rusanov flux as surface flux
solver = DGSEM(polydeg = 3, surface_flux = flux_lax_friedrichs)

coordinates_min = (-5.0, -5.0)
coordinates_max = (5.0, 5.0)
trees_per_dimension = (1, 3)

mesh = P4estMesh(trees_per_dimension, polydeg = 3,
                 coordinates_min = coordinates_min, coordinates_max = coordinates_max,
                 initial_refinement_level = 0, periodicity = (true, false))

# A semidiscretization collects data structures and functions for the spatial discretization
semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver,
                                    boundary_conditions = boundary_conditions)

###############################################################################
# ODE solvers, callbacks etc.

# Create ODE problem with time span `tspan`
tspan = (0.0, 1.0)
ode = semidiscretize(semi, tspan);

# At the beginning of the main loop, the SummaryCallback prints a summary of the simulation setup
# and resets the timers
summary_callback = SummaryCallback()

callbacks = CallbackSet(summary_callback)

###############################################################################
# run the simulation

# OrdinaryDiffEq's `solve` method evolves the solution in time and executes the passed callbacks
sol = solve(ode, CarpenterKennedy2N54(williamson_condition = false),
            dt = 0.01, save_everystep = false, callback = callbacks);

# Print the timer summary
summary_callback()
```