Currently, after a BSP step, we iterate over all MPI routines tracked by critter and find the max over five different metrics, one routine at a time. This results in a factor of NUM_CRITTERS more synchronizations than necessary.
_critter::compute_max_crit(...) should instead just write its local costs into its own window of a shared array. Once the loop over routines finishes, we can perform a single MPI_Allreduce over the whole array, and then write each reduced entry back to the member variables of the corresponding MPI routine.
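A rough sketch of the pack / reduce / unpack flow, in case it helps the discussion. Everything named here (`RoutineCosts`, `kNumMetrics`, `pack`, `elementwise_max`, `unpack`) is hypothetical and not taken from the critter source. The reduction is simulated in plain C++ so the snippet compiles without MPI; in the real code that step would presumably be the one `MPI_Allreduce` call shown in the comment.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; the real class layout and NUM_CRITTERS live in critter.
constexpr std::size_t kNumMetrics = 5;  // the "five different metrics" per routine

struct RoutineCosts {                   // hypothetical per-routine cost record
    std::array<double, kNumMetrics> metric{};
};

// Step 1: each routine copies its local costs into its own window of one buffer.
std::vector<double> pack(const std::vector<RoutineCosts>& routines) {
    std::vector<double> buf(routines.size() * kNumMetrics);
    for (std::size_t r = 0; r < routines.size(); ++r) {
        const auto offset = static_cast<std::ptrdiff_t>(r * kNumMetrics);
        std::copy(routines[r].metric.begin(), routines[r].metric.end(),
                  buf.begin() + offset);
    }
    return buf;
}

// Step 2 (simulated): element-wise max over the buffers of all ranks.
// With MPI this would be a single collective on each rank's buffer, e.g.
//   MPI_Allreduce(MPI_IN_PLACE, buf.data(), static_cast<int>(buf.size()),
//                 MPI_DOUBLE, MPI_MAX, comm);
std::vector<double> elementwise_max(const std::vector<std::vector<double>>& per_rank) {
    std::vector<double> out = per_rank.at(0);
    for (std::size_t k = 1; k < per_rank.size(); ++k) {
        assert(per_rank[k].size() == out.size());
        for (std::size_t i = 0; i < out.size(); ++i)
            out[i] = std::max(out[i], per_rank[k][i]);
    }
    return out;
}

// Step 3: write each reduced entry back to the routine that owns that window.
void unpack(const std::vector<double>& buf, std::vector<RoutineCosts>& routines) {
    assert(buf.size() == routines.size() * kNumMetrics);
    for (std::size_t r = 0; r < routines.size(); ++r) {
        const auto offset = static_cast<std::ptrdiff_t>(r * kNumMetrics);
        std::copy_n(buf.begin() + offset, kNumMetrics, routines[r].metric.begin());
    }
}
```

With this layout the number of collectives per BSP step no longer depends on how many routines are tracked; only the message size grows, to `NUM_CRITTERS * 5` doubles.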