Fixing issue 1 (above) is easy: the ppn and tpr counts will exist in the string that is fed into `MPI_Init` via `argv`. We can just print out a column for each underscore-separated entry in the string.
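A minimal sketch of what I mean by splitting that string into columns; the delimiter and the assumption that the string arrives as a single argv entry are just the conventions discussed here:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split an underscore-separated parameter string (e.g. "bench_64_2_100")
// into one column entry per token.
std::vector<std::string> split_params(const std::string& s, char delim = '_') {
  std::vector<std::string> cols;
  std::stringstream ss(s);
  std::string tok;
  while (std::getline(ss, tok, delim)) cols.push_back(tok);
  return cols;
}
```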
Issue 2 (above) is more concerning, particularly because I could only get an average count (presumably over n iterations), and the user would not be allowed to call any other MPI routine (because it would get intercepted and contribute erroneously to the overall counts).
If I instead wait until `MPI_Finalize` and print to file only then, with a known number of iterations (it must be specified as part of the inputs; I think that is fair), I can divide out the saved data by that much to separate the metrics into logical bins corresponding to each iteration.
The user would then be forced not to call any other MPI routine in the `test.cpp`, such as `MPI_Barrier` or `MPI_Allreduce` to get the max entry, etc.

For issue 3 (above), it seems like the user would just have to specify this on his/her own with a `CRITTER_OFF` flag.
Would this be a compile-time option (`#ifdef CRITTER_FLAG`)? Could the user change the `CRITTER_FLAG` value in the middle of the job?

For issue 4 (above), opening files should be done in `MPI_Init` (after `PMPI_Init`, so that we get to branch on rank 0). Closing files should be done in `MPI_Finalize`.
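Roughly what I have in mind for that interception, as a minimal sketch (the file name and global names are placeholders, not the actual critter internals):

```cpp
#include <mpi.h>
#include <fstream>

// Hypothetical global output file; the real critter internals may differ.
static std::ofstream critter_file;

int MPI_Init(int* argc, char*** argv) {
  int ret = PMPI_Init(argc, argv);        // let MPI come up first
  int rank;
  PMPI_Comm_rank(MPI_COMM_WORLD, &rank);  // PMPI so the call isn't intercepted
  if (rank == 0)
    critter_file.open("critter_output.txt");  // placeholder file name
  return ret;
}

int MPI_Finalize() {
  // Rank 0 would print the accumulated data here, then close the file.
  if (critter_file.is_open()) critter_file.close();
  return PMPI_Finalize();
}
```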
We can provide a function to print out to file correctly with just a single call, where the user would pass in something like perf, residual, etc. However, this is complicated by the fact that we don't want to use PMPI calls for this. Separate binaries are one solution, each linked with a different version of critter (built with different values for the `-D` Critter flag). This seems excessive, though. But perhaps it's the best way? Remember, the whole motivation for doing this in one swing was to avoid launching critter twice.
Issue 5 (above) can be fixed at the ScaPlot level as we are rewriting files! Just save a list of the MPI calls for a single test, and then print them all out!
Of the above issues, (3) is the most problematic.
For now, I will just not worry about non-critter data.
Another thing that worries me: if we print to and close out the output files by intercepting `MPI_Finalize`, then the ComputationTimer might be completely screwed up.
Literally no extra (non-MPI) statements can go into the user's benchmark file, or else they will count towards the computation timer (mentioned in #1). By the same token, `MPI_Init` will initialize the counters, and any work done between that invocation and the first iteration of the algorithm itself will count (motivation to increase the iteration count, since the first iteration will carry overhead in the computation timer, depending on how the user writes his benchmark file).
Note that all measured communication time is stored in MPI routine-specific buckets anyways, so communication time does not suffer the same issue here as computation time.
Decent solution: introduce an array for storing the computation time registered between any two collectives. Then, capture the final time in `MPI_Finalize` and divide the array by the number of iterations to get the computation time per iteration, which gets written to the file.
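A rough sketch of that bookkeeping, with placeholder names; it assumes every iteration issues the same sequence of MPI calls and that some per-iteration hook (e.g. `critter::reset`) resets the bucket index:

```cpp
#include <mpi.h>
#include <vector>

// Illustrative globals, not the real critter internals.
// comp_time[i] accumulates the computation time observed just before the
// i-th intercepted MPI call of each iteration.
static std::vector<double> comp_time;
static std::size_t call_index = 0;
static double last_timestamp = 0.0;  // the MPI_Init wrapper would set this initially

// At the start of every intercepted MPI routine, attribute the elapsed
// time since the previous interception to computation.
static void record_computation() {
  double elapsed = MPI_Wtime() - last_timestamp;
  if (call_index >= comp_time.size()) comp_time.resize(call_index + 1, 0.0);
  comp_time[call_index++] += elapsed;
}

// When the routine returns, restart the computation clock so time spent
// inside the routine counts as communication, not computation.
static void resume_computation() { last_timestamp = MPI_Wtime(); }

// Assumed per-iteration hook (e.g. critter::reset).
static void end_of_iteration() { call_index = 0; }

// In the MPI_Finalize wrapper: divide each bucket by the iteration count
// (parsed from the inputs) to get per-iteration computation time.
static void finalize_buckets(int num_iterations) {
  for (double& t : comp_time) t /= num_iterations;
}
```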
We would have to pick this number out of the string generated in `critter::bench` based on the parameters specified in the instructions.py file, but to do this, critter needs to know at which position the parameter appears. Let's specify that numIterations is the last parameter.
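A tiny sketch of recovering that last entry (the underscore delimiter and last-position convention are just the ones proposed above):

```cpp
#include <string>

// If numIterations is fixed as the last underscore-separated entry, critter
// can recover it without knowing the meaning of the other parameters.
int parse_num_iterations(const std::string& params) {
  std::size_t pos = params.rfind('_');
  return std::stoi(params.substr(pos == std::string::npos ? 0 : pos + 1));
}
```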
For (3), the user will have the option to call `critter::print(...)` with an extra array for non-critter data. This will not be necessary if only critter data is wanted.
For (3), I think I will want to configure twice: once with the `CRITTER_FLAG` turned off, and once with it turned on (if they specify that they want non-critter output). This motivates calling `configure` and `build` inside `instructions.py`.
After discussion, calling `critter::start` and `critter::stop` between iterations is OK. This removes much of the effort that would be necessary to allow no source-code modification.
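So the user's benchmark would look roughly like this (the exact signatures of `critter::start`/`critter::stop` and the routine under test are assumptions here):

```cpp
#include <mpi.h>

// Assumed critter API (declarations only; the real header is critter.h).
namespace critter { void start(); void stop(); }
void run_algorithm();  // placeholder for the routine being benchmarked

// Hypothetical benchmark loop with per-iteration start/stop calls.
void benchmark(int numIterations) {
  for (int i = 0; i < numIterations; i++) {
    critter::start();   // begin tracking this iteration
    run_algorithm();    // placeholder
    critter::stop();    // work outside this window is not tracked
  }
}
```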
Ready to start testing on Porter.
Another question:
- How to handle cases where the user just wants to run without using critter's generation code? There won't be that string parameter as the last argument. In this case, we need `critter` to output to standard out (which will be that .out file).
Here, we might want to simply have the critter python code set an environment variable and then, in `MPI_Init`, read it in via `std::getenv`. If the environment variable was set, then use the argv string. If not, then simply write to standard out and set `cout` as the Stream.
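A minimal sketch of that branch; the environment-variable name and file name are made up for illustration, and rank-0 branching is omitted:

```cpp
#include <mpi.h>
#include <cstdlib>
#include <fstream>
#include <iostream>

// Illustrative globals; names are placeholders, not the real critter ones.
static std::ofstream critter_file;
static std::ostream* Stream = &std::cout;

int MPI_Init(int* argc, char*** argv) {
  int ret = PMPI_Init(argc, argv);
  // "CRITTER_VIA_BENCH" is a made-up variable name for this sketch; the real
  // name would be whatever critter/bench.py exports.
  const char* from_bench = std::getenv("CRITTER_VIA_BENCH");
  if (from_bench != nullptr) {
    // Launched through bench.py: parse the argv string and write to file.
    critter_file.open("critter_output.txt");  // placeholder file name
    Stream = &critter_file;
  } else {
    // Standalone run: no parameter string; just write to standard out.
    Stream = &std::cout;
  }
  return ret;
}
```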
I have a version working. Note that it assumes that user arguments have no '+' characters, as that is how `critter/bench.py` separates the arguments so that they are easy to parse in `src/critter.cxx`.
Why does the environment variable set in `critter/bench.py` not seem to take effect? It appears to be going away, or just not being set at all.
There is another large bug: `critter::flag` should not determine whether or not to track the critter call in `src/critter.h`, as it currently does. That capability was for using critter infrastructure to get raw output. These are separate things, and this needs to be changed.
Rethink whether or not I even need raw performance or other metrics without critter, especially now that critter will have less overhead.
Also, consider gating whether or not critter tracks a call on a global bool that is set in `critter::start` and cleared in `critter::stop`, so that outside of a specified part of the code, critter is not doing anything.
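A minimal sketch of that gating, with illustrative names (not the actual `src/critter.h` internals):

```cpp
#include <mpi.h>

// Global switch: only track inside a critter::start/critter::stop window.
static bool critter_active = false;

namespace critter {
  void start() { critter_active = true;  /* also reset counters here */ }
  void stop()  { critter_active = false; /* accumulate or print here  */ }
}

// Example wrapper: fall straight through to PMPI when tracking is off.
int MPI_Barrier(MPI_Comm comm) {
  if (!critter_active) return PMPI_Barrier(comm);
  // ...start critical-path timers here...
  int ret = PMPI_Barrier(comm);
  // ...record per-routine metrics here...
  return ret;
}
```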
Or consider writing everything, even the residual for example, to one single file. That might simplify things on critter's side, but what about on scaplot's side?
Next to-do before next round of testing:
- [ ] Print to standard output more nicely than to file (since a human will be trying to parse the former)
- [x] Change the FileList in the python code; since we are now dealing with just a single file, some of it can be simplified.
Well actually, in this case the size of the Input array will be 0, so there will be no input parameters to fill out the number of columns. The first section for any variant should be the 6 total critical-path metrics. After that, on new lines, we can write out the tracked-routine-specific critical-path metrics, where the rows correspond to the tracked routines and the columns represent the 7 metrics.
Everything has been addressed on the critter side.
We want to get rid of everything that is currently necessary to get scaplot to run, including initialization and printing.
Various approaches don't work or are clumsy, e.g. requiring an explicit `critter::init` call.

Idea: build the string using argc/argv by intercepting the `MPI_Init` call! This way, we can build the string inside `MPI_Init` and then save it as a global variable. Note that binaries get copied to the scheduler, so there is no concern about overwriting global variables of a single binary that all tests launched across all batch scripts share.
In this fashion, the user doesn't have to specify a `critter::init`, nor provide a bunch of arguments to `critter::print`. Also, because we now build the prefixed columns (i.e. "c=256") in Scaplot, Critter no longer needs the names of each input variable that parameterizes the variant (this is good; it's not a critter-side thing anyway).
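A minimal sketch of what that interception could look like; the assumption that the string is the last argv entry is just the convention discussed above:

```cpp
#include <mpi.h>
#include <string>

// Global copy of the parameter string; each test runs as its own process
// with its own copy of the binary, so there is no aliasing concern.
static std::string critter_arg_string;

int MPI_Init(int* argc, char*** argv) {
  int ret = PMPI_Init(argc, argv);
  // Assume the separator-joined string from bench.py is the last argument.
  if (argc != nullptr && *argc > 1) {
    critter_arg_string = (*argv)[*argc - 1];
  }
  return ret;
}
```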
However, a few issues remain (for example, when should `critter::reset` and `critter::print` be called)?