Closed · huttered40 closed this issue 5 years ago
No longer tracking MPI_Comm_split. The communicator isn't even ready until after we perform the PMPI routine anyway. So, this will now count as computational overhead; we may want to rethink this later.
Note that all the _critter::my_... members need to be updated.
In the critter output for the std::cout overload, we need to add the average metrics for each tracked routine.
Made major changes to tracking of per-process data. Needs to be checked for correctness.
Launched a cacqr2 job on Stampede2 that will help identify any obvious errors.
I have generated the plots. Need to inspect them to see if they make sense (both the critical paths, and the per-process paths).
I'm going to assume the overlap is correct for now, but will be performing much more rigorous tests in the future on real overlapping algorithms.
Although the critical paths of individual routines are being tracked correctly, the routine-independent metrics (number of bytes, communication cost, estimated costs, etc.) are not. This is because we are simply summing the costs of each collective.
We need each process to contribute its existing routine-specific counts for each of these metrics, use those to determine the current critical path, and then propagate that path to the rest of the processes in the (sub)communicator.
MPI_Comm_split is sketchy: we add to the total timers here, and those timers no longer truly exist.