In my opinion, volume measures are largely useless. Determining the average/volume spent in each symbol is also difficult: to be correct in all possible cases it requires an expensive Gather operation, because each process could have a unique symbol not seen by the others (very unlikely, but possible).
In such a case, I will simply not print it out.
We could issue an MPI_Allreduce with MPI_Op=MPI_MAX and another with MPI_Op=MPI_MIN on each process's symbol count, followed by one more MPI_Allreduce with MPI_Op=MPI_SUM. Before the final collective is issued, each process records an error bit if the results of the first two reductions don't exactly match its own number of symbols. No, this won't work! Each process could have k symbols that all differ from those of every other process, giving k*p total symbols even though the counts agree (admittedly highly unlikely, but I don't want to be responsible for faulty analysis).
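The failure case can be checked without MPI at all. The following is a minimal sketch in plain C that stands in for the reductions with ordinary loops; the helper names reduce_counts and distinct_symbols are hypothetical and exist only for this illustration. It shows that the MIN, MAX and SUM of the per-process symbol counts depend only on the counts, so they come out the same whether the processes share one symbol set or hold completely disjoint ones.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for three MPI_Allreduce calls (MPI_MIN, MPI_MAX, MPI_SUM) over
   the per-rank symbol counts. No MPI is used in this sketch. */
void reduce_counts(const int counts[], int p, int *lo, int *hi, int *sum) {
    assert(p > 0);
    *lo = *hi = *sum = counts[0];
    for (int r = 1; r < p; r++) {
        if (counts[r] < *lo) *lo = counts[r];
        if (counts[r] > *hi) *hi = counts[r];
        *sum += counts[r];
    }
}

/* Number of distinct strings among the n symbols collected from all ranks.
   Quadratic on purpose: this is an illustration, not a fast path. */
int distinct_symbols(const char *syms[], int n) {
    int distinct = 0;
    for (int i = 0; i < n; i++) {
        int seen = 0;
        for (int j = 0; j < i && !seen; j++)
            seen = (strcmp(syms[i], syms[j]) == 0);
        if (!seen) distinct++;
    }
    return distinct;
}
```

With p = 2 and k = 3, the counts {3, 3} reduce to min 3, max 3, sum 6 in both scenarios, yet distinct_symbols reports 3 for the shared set and 6 for the disjoint one. Only the symbols themselves can tell the two cases apart.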
A different approach would be to have each process sort its symbol string (so that the ordering is well defined), then reduce the total size of the symbol string so that each process knows how much memory to post (although its own size would have to match everyone else's to be exact). Still, even if the lengths match up, there is no way to know that the symbols are the same without explicit verification, which requires some Gather operation onto the root process.
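As a rough sketch of that idea, again in plain C with no MPI calls, each process could build a sorted, newline-delimited key from its symbols, and the root could compare the gathered keys. The names build_sorted_key and all_keys_match, the MAX_SYMS limit and the newline delimiter are all assumptions made for this illustration; the gather itself is left out and the keys are simply passed in as an array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 64  /* assumed upper bound on symbols per process */

static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort a copy of the n symbols and join them as "sym\n" records into out.
   Returns the key length, or -1 if the input or buffer is unusable. */
int build_sorted_key(const char *syms[], int n, char *out, size_t cap) {
    const char *sorted[MAX_SYMS];
    size_t used = 0;
    assert(out != NULL);
    if (n < 0 || n > MAX_SYMS || cap == 0) return -1;
    for (int i = 0; i < n; i++) sorted[i] = syms[i];
    qsort(sorted, (size_t)n, sizeof sorted[0], cmp_str);
    for (int i = 0; i < n; i++) {
        size_t len = strlen(sorted[i]);
        if (used + len + 2 > cap) return -1;  /* record + '\n' + final '\0' */
        memcpy(out + used, sorted[i], len);
        used += len;
        out[used++] = '\n';
    }
    out[used] = '\0';
    return (int)used;
}

/* What the root would do after gathering one key per process:
   compare every key against the first one, byte for byte. */
int all_keys_match(const char *keys[], int p) {
    for (int r = 1; r < p; r++)
        if (strcmp(keys[0], keys[r]) != 0) return 0;
    return 1;
}
```

Sorting makes {"foo", "bar"} and {"bar", "foo"} produce the same key, so ordering differences between processes disappear. A process holding {"bar", "fop"} produces a key of the same length with different contents, which is why matching lengths alone prove nothing and the explicit comparison is still needed.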