jerett-cc opened this issue 7 months ago
The executable `high-order-euler` is built by entering the directory `/ryujin` and typing `make debug`, which contrasts with how @bangerth built the debug executable on his machine.
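In other words:

```sh
cd ryujin
make debug
```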
This issue may be related to the number of time bricks. For example, suppose the spatial communicator has 5 processes and the global communicator also has only 5 processes, but `Num_Time = 4`. Then we are asking each of the 4 time bricks to use the same communicator.
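For contrast, here is a minimal sketch of what disjoint per-brick communicators would look like with plain `MPI_Comm_split` (the function name, block layout, and all identifiers are illustrative assumptions, not ryujin's or XBraid's actual code):

```cpp
#include <mpi.h>

// Illustrative sketch only: give each time brick its own disjoint
// spatial communicator by coloring ranks in contiguous blocks. This
// requires world_size to be a multiple of num_time_bricks; the 5-rank /
// 4-brick configuration above cannot satisfy that, which is why the
// bricks end up sharing a single communicator instead.
MPI_Comm make_brick_comm(const int num_time_bricks)
{
  int world_rank = 0, world_size = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  const int ranks_per_brick = world_size / num_time_bricks;
  const int brick           = world_rank / ranks_per_brick;

  // Ranks with the same color (brick index) form one communicator.
  MPI_Comm comm_x = MPI_COMM_NULL;
  MPI_Comm_split(MPI_COMM_WORLD, /*color=*/brick, /*key=*/world_rank, &comm_x);
  return comm_x;
}
```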
I suspect that because all bricks use the same communicator, their communication clashes: one brick posts messages assuming it has exclusive use of the communicator, another brick does the same, and the exchanges get mixed up. As evidence for this hunch, consider the following run:
```sh
mpirun -n 5 high-order-euler test.prm 5 1 3 5
```

with this `test.prm`:
```
subsection App
  set print_solution = false
  set Time Bricks = 1
  set Start Time = 0.0
  set Stop Time = 5.0
  set cfactor = 2 # 2 is Xbraid default
  set max_iter = 1
end
subsection OfflineData
end
subsection TimeLoop
  set basename = cylinder
  set enable checkpointing = false
  set enable compute error = false
  set enable compute quantities = false
  set enable output full = true
  set enable output levelsets = false
  set error normalize = false
  set error quantities = rho, m_1, m_2, E
  set output checkpoint multiplier = 1
  set output full multiplier = 1
  set output granularity = 1
  set output levelsets multiplier = 1
  set output quantities multiplier = 1
  set refinement timepoints =
  set resume = false
  set terminal show rank throughput = false
  set terminal update interval = 5
end
subsection Equation
  set gamma = 1.4
  set reference density = 1
  set vacuum state relaxation = 10000
end
subsection Discretization
  set geometry = cylinder
  set mesh distortion = 0
  set mesh repartitioning = false
end
subsection InitialValues
  set configuration = uniform
  set direction = 1, 0
  set perturbation = 0
  set position = 1, 0
  subsection astro jet
    set jet width = 0.05
    set primitive ambient right = 5, 0, 0.4127
    set primitive jet state = 5, 30, 0.4127
  end
  subsection uniform
    set primitive state = 1.4, 3, 1
  end
end
subsection HyperbolicModule
  set cfl with boundary dofs = false
  set limiter iterations = 2
  set limiter newton max iterations = 2
  set limiter newton tolerance = 1e-10
  set limiter relaxation factor = 1
end
subsection TimeIntegrator
  set cfl max = 0.9
  set cfl min = 0.45
  set cfl recovery strategy = bang bang control
  set time stepping scheme = erk 33
end
subsection VTUOutput
  set manifolds =
  set schlieren beta = 10
  set schlieren quantities = rho
  set schlieren recompute bounds = true
  set use mpi io = true
  set vorticity quantities =
  set vtu output quantities = rho, m_1, m_2, E
end
subsection Quantities
  set boundary manifolds =
  set clear statistics on writeout = true
  set interior manifolds =
end
```
This run produces no MPI error. The only change relative to the broken run reported in this issue is that `test.prm` now sets `Time Bricks = 1`.
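Side by side, the relevant difference is just this one parameter (the broken value is an assumption inferred from the `Num_Time = 4` example above; the original run's prm is not reproduced here):

```
subsection App
  set Time Bricks = 1   # this run: one brick, no communicator sharing -- works
  # set Time Bricks = 4 # presumed broken: 4 bricks share one 5-process communicator
end
```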
Adding the `MPI_Barrier(comm_x)` does not work. In `app` we wrote:
```cpp
void prepare_mg_objects()
{
  for (unsigned int lvl = 0; lvl < refinement_levels.size(); lvl++)
    {
      // Only rank 0 of the time communicator reports progress.
      if (dealii::Utilities::MPI::this_mpi_process(comm_t) == 0)
        {
          std::cout << "[INFO] Preparing Structures in App at level "
                    << refinement_levels[lvl] << std::endl;
        }
      levels[lvl]->prepare();
      std::cout << "Level " + std::to_string(refinement_levels[lvl])
                     + " prepared."
                << std::endl;
      // Attempted fix: synchronize the spatial communicator after each level.
      MPI_Barrier(comm_x);
    }
  // Set the last variables in app.
  n_fine_dofs          = levels[0]->offline_data->dof_handler().n_dofs();
  n_locally_owned_dofs = levels[0]->offline_data->n_locally_owned();
}
```
The barrier seems to NOT solve the issue here.
Similarly, an `MPI_Barrier(MPI_COMM_WORLD)` in the same spot does NOT fix the issue.
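That is consistent with MPI semantics: a barrier only synchronizes ranks in time, it does not separate message traffic. If the clash really is two bricks exchanging messages on one shared communicator, the standard isolation mechanism would be a duplicated communicator per brick, since `MPI_Comm_dup` creates a new communication context whose messages can never match those of the parent communicator. A minimal sketch of that idea (hypothetical, not what ryujin currently does):

```cpp
#include <mpi.h>

// Hypothetical sketch: instead of a barrier, give each time brick a
// private duplicate of the spatial communicator. Messages sent on
// brick_comm_x can only be matched by receives posted on brick_comm_x,
// even though it contains exactly the same ranks as comm_x.
MPI_Comm brick_comm_x = MPI_COMM_NULL;
MPI_Comm_dup(comm_x, &brick_comm_x);

// ... perform all of this brick's spatial communication on brick_comm_x ...

MPI_Comm_free(&brick_comm_x);
```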
That's a bummer. I would leave it in anyway, though. It does not hurt, and might help.
NEED to test this with the current (Aug. 22nd 2024) state of the code.
I get the following output when I run test cases that we have already tested on @bangerth's machine, which happens when run with the following prm file: