huttered40 / critter

Critical path analysis of MPI parallel programs
BSD 2-Clause "Simplified" License

Remove need for any changes to existing user source code #3

Closed — huttered40 closed this issue 5 years ago

huttered40 commented 5 years ago

We want to get rid of everything in user source code that is currently necessary to get scaplot to run, including the initialization and printing calls.

Various approaches don't work or are clumsy:

  1. Export environment variables within the job scripts containing the file to write to. This will not work because different job scripts (with different node, ppn, and tpr counts) might be launched simultaneously and overwrite each other's output files.
  2. Build the output string from the input parameters (which define the variant). The problem is that this still requires a critter::init call.

Idea: build the string using argc/argv by intercepting the MPI_Init call! This way, we can build the string inside MPI_Init and then save it as a global variable. Note that binaries get copied to the scheduler, so there is no concern about tests launched across all batch scripts overwriting the global variables of a single shared binary.

In this fashion, the user doesn't have to specify a critter::init, nor provide a bunch of arguments into critter::print.

Also, because we build the prefixed-columns (i.e. "c=256") now in Scaplot, Critter no longer needs the names of each input variable that parameterizes the variant (this is good, not a critter-side thing anyway).
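The interception idea can be sketched with a small helper that the wrapped MPI_Init would call; the helper name and the ".txt" suffix are assumptions for illustration, not critter's actual API:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: derive the per-variant output filename inside the
// intercepted MPI_Init, using the variant string that the launcher appended
// as the last command-line argument.
std::string build_output_name(int argc, char** argv) {
    if (argc < 2) return "critter.out";           // no variant string supplied
    return std::string(argv[argc - 1]) + ".txt";  // e.g. "bench_64_16_2.txt"
}
```

Because each binary copied to the scheduler carries its own globals, the name built here can safely live in a global variable for the rest of the run.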

However, a few issues remain:

  1. How do we get columns for ppn and tpr?
  2. How do we restart timers (which we originally did with critter::reset and critter::print)?
  3. How do we run both with and without critter, if the user wants to?
  4. How do we open and close files?
  5. For tests with different MPI calls, it is necessary for all calls to be tracked, even for variants that don't use them. Previously, we handled this by having the user specify a string tag that matched a map holding a user-specified list of collectives to track (on the C++ side).
huttered40 commented 5 years ago

Fixing issue 1 (above) is easy: the ppn and tpr counts will exist in the string that is fed into MPI_Init via argv. We can just print out a column for each underscore-separated entry in the string.
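A minimal split along these lines (assumed helper name, not the actual src/critter.cxx code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split the underscore-separated variant string; each entry (including the
// ppn and tpr counts) becomes one output column.
std::vector<std::string> split_entries(const std::string& s) {
    std::vector<std::string> out;
    std::size_t start = 0, pos;
    while ((pos = s.find('_', start)) != std::string::npos) {
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    out.push_back(s.substr(start));  // final entry after the last underscore
    return out;
}
```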

huttered40 commented 5 years ago

Issue 2 (above) is more concerning, particularly because I could only get an average count (presumably over n iterations), and the user would not be allowed to call any other MPI routine (because it would get intercepted and contribute erroneously to the overall counts).

  1. One trick that could work here: save the metrics for all MPI calls until MPI_Finalize and only print to file there. With a known number of iterations (which must be specified as part of the inputs; I think that is fair), I can divide the saved data by that count to separate the metrics into logical bins corresponding to each iteration. The user would then be forced not to call any other MPI routine in test.cpp, such as MPI_Barrier or MPI_Allreduce to get a max entry, etc.
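The divide-out step is just an element-wise division of the accumulated totals; a sketch with assumed names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average the totals accumulated up to MPI_Finalize over the user-supplied
// iteration count, binning the saved metrics per logical iteration.
std::vector<double> per_iteration(const std::vector<double>& totals, int num_iters) {
    std::vector<double> avg(totals.size());
    for (std::size_t i = 0; i < totals.size(); ++i)
        avg[i] = totals[i] / num_iters;
    return avg;
}
```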
huttered40 commented 5 years ago

For issue 3 (above), it seems like the user would just have to specify this on their own with a CRITTER_OFF flag.

huttered40 commented 5 years ago

For issue 4 (above), opening files should be done in MPI_Init (after the PMPI_Init call, so that we can branch on rank 0). Closing files should be done in MPI_Finalize.

We can provide a function that prints to file correctly with just a single call, into which the user would pass something like perf, residual, etc. However, this is complicated by the fact that we don't want to use PMPI calls for it. Separate binaries are one solution, each linked with a different version of critter (built with different values for the -D Critter flag). This seems excessive, though. But perhaps it's the best way? Remember, the whole motivation for doing this in one swing was to avoid launching critter twice.
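The separate-binaries option amounts to a compile-time switch; a minimal sketch, where the exact flag spelling (CRITTER) is an assumption:

```cpp
#include <cassert>
#include <string>

// One user-facing print path; the variant is chosen at configure time by the
// preprocessor flag each binary was built with.
std::string output_mode() {
#ifdef CRITTER
    return "critter";  // built with -DCRITTER: emit critical-path metrics
#else
    return "raw";      // built without it: emit raw (non-critter) output
#endif
}
```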

huttered40 commented 5 years ago

Issue 5 (above) can be fixed at the ScaPlot level as we are rewriting files! Just save a list of the MPI calls for a single test, and then print them all out!

huttered40 commented 5 years ago

Of the above issues, (3) is the most problematic.

For now, I will just not worry about non-critter data.

huttered40 commented 5 years ago

Another thing that worries me is if we print to and close out the output files via intercepting MPI_Finalize, then the ComputationTimer might be completely screwed up.

Literally no extra (non-MPI) statements can go into the user's benchmark file, or else they will count toward the computation timer (mentioned in #1). By the same token, MPI_Init will initialize the counters, and any work done between that invocation and the first iteration of the algorithm itself will be counted. This is motivation to increase the iteration count, as the first iteration will carry overhead in the computation timer, depending on how the user writes the benchmark file.

Note that all measured communication time is stored in MPI-routine-specific buckets anyway, so communication time does not suffer the same issue as computation time.

huttered40 commented 5 years ago

Decent solution: introduce an array for storing the computation time registered between any two collectives. Then, capture the final time in MPI_Finalize and divide the array by the number of iterations to get the computation time per iteration, which gets written to the file.

We would have to pick this number out of the string generated in critter::bench based on the parameters specified in the instructions.py file, but in order to do this, critter needs to know in which position the parameter appears. Let's specify that numIterations is the last parameter.
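Pulling numIterations off the end of the variant string could look like this (helper name is assumed; it also assumes the string is underscore-separated with a numeric final entry):

```cpp
#include <cassert>
#include <string>

// numIterations is, by convention, the last underscore-separated parameter
// of the variant string; pull it off the end and convert it.
int num_iterations(const std::string& variant) {
    return std::stoi(variant.substr(variant.rfind('_') + 1));
}
```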

huttered40 commented 5 years ago

For (3), the user will have the option to call critter::print(...) with an extra array for non-critter data. This will not be necessary if only critter data is wanted.

huttered40 commented 5 years ago

For (3), I think I will want to configure twice: once with the CRITTER_FLAG turned off, and once with it turned on (if the user specifies that they want non-critter output).

This motivates calling the configure and build inside the instructions.py.

huttered40 commented 5 years ago

After discussion, calling critter::start and critter::stop between iterations is OK. This removes much of the effort that was necessary to allow running with no source code modification.

huttered40 commented 5 years ago

Ready to start testing on Porter.

huttered40 commented 5 years ago

Another question:

  • How do we handle cases where the user just wants to run without using critter's generation code? There won't be that string parameter as the last argument. In this case, we need critter to output to standard out (which will be that .out file).

Here, we might want to simply have the critter python code set an environment variable and then, in MPI_Init, read it in via std::getenv. If the environment variable was set, use the argv string; if not, simply write to standard out and use std::cout as the stream.
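A sketch of the environment-variable check; the variable name CRITTER_VIA_BENCH is a placeholder, not necessarily what bench.py exports:

```cpp
#include <cassert>
#include <cstdlib>

// If critter's python driver exported a marker variable, take the output
// filename from argv; otherwise fall back to writing on standard out.
bool use_argv_filename() {
    return std::getenv("CRITTER_VIA_BENCH") != nullptr;
}
```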

huttered40 commented 5 years ago

I have a version working. Note that it assumes that user arguments contain no '+' characters, as that is how critter/bench.py separates the arguments so that it's easy to parse in src/critter.cxx.
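The '+'-separated parse on the C++ side could be as simple as the following (assumed helper name, not the actual src/critter.cxx code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// bench.py joins the user's arguments with '+' so the C++ side can recover
// them with a single-character split; this is why user arguments must not
// themselves contain '+'.
std::vector<std::string> split_args(const std::string& joined) {
    std::vector<std::string> out;
    std::size_t start = 0, pos;
    while ((pos = joined.find('+', start)) != std::string::npos) {
        out.push_back(joined.substr(start, pos - start));
        start = pos + 1;
    }
    out.push_back(joined.substr(start));
    return out;
}
```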

huttered40 commented 5 years ago

Why does the environment variable set in critter/bench.py not seem to be taking effect? It is either going away or just not being set at all.

huttered40 commented 5 years ago

There is another large bug: critter::flag should not determine whether or not to track the critter call in src/critter.h, as it currently does. That capability was for using critter's infrastructure to get raw output. These are separate things, and this needs to be changed.

huttered40 commented 5 years ago

Rethink whether or not I even need raw performance or other metrics without critter, especially now that critter will have less overhead.

Also, consider gating whether or not to track on a global bool that is set in critter::start and cleared in critter::stop, so that outside of a specified part of the code, critter is not doing anything.
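The global-bool gate might look like this; names are placeholders, and the real interposer would always forward to PMPI regardless of the flag:

```cpp
#include <cassert>

// Gate tracking on a global bool toggled by critter::start/stop; an
// intercepted MPI call outside the region becomes a pure pass-through.
namespace critter {
    bool tracking = false;
    void start() { tracking = true; }
    void stop()  { tracking = false; }
}

int tracked_calls = 0;  // stands in for the real per-routine bookkeeping

void intercepted_mpi_call() {
    if (critter::tracking) ++tracked_calls;
    // ... always forward to the corresponding PMPI_* routine here ...
}
```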

Or consider writing everything, even residual for example, to one single file. That might simplify things on critter's side, but what about on scaplot's side?

huttered40 commented 5 years ago

Next to-do before next round of testing:

  • [ ] Print to standard output more nicely than to file (since a human will be trying to parse the former).
  • [x] Change the FileList in the python code; since we are now dealing with just a single file, some of it can be simplified.

Well actually, in this case the size of the Input array will be 0, so there will be no input parameters to make up the columns. The first section for any variant should be the 6 total critical-path metrics. After that, on new lines, we can write out the tracked-routine-specific critical-path metrics, where the rows correspond to the tracked routines and the columns represent the 7 metrics.
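The routine-by-metric layout described above could be emitted like this (assumed helper name; tab-separated columns are an assumption):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Emit one row per tracked MPI routine, one column per metric, matching the
// layout described above.
std::string format_routine_rows(
    const std::vector<std::pair<std::string, std::vector<double>>>& rows) {
    std::ostringstream out;
    for (const auto& r : rows) {
        out << r.first;                              // routine name
        for (double v : r.second) out << '\t' << v;  // its metrics
        out << '\n';
    }
    return out.str();
}
```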

huttered40 commented 5 years ago

Everything has been addressed on the critter side.