lanl / Pavilion

HPC testing harness
BSD 3-Clause "New" or "Revised" License

2.0 - Scheduler Plugins #64

Open pflarr opened 6 years ago

pflarr commented 6 years ago

Scheduler Plugins

The process of running and scheduling jobs is as follows. Steps that actually involve the scheduler are in bold:

  1. Test configs are resolved into almost-final individual test configurations.
  2. Tests are grouped by scheduler.
  3. The scheduler-relevant section of each test config has its variables resolved.
  4. **The scheduler-relevant test sections are given to the scheduler, which returns a set of minimum resource requirements across all tests and the 'sched' variable values.**
  5. The 'sched' variables are used to completely resolve the variables in each test config.
  6. A 'PavTest' object is created for each test, and the test directories (and IDs) are created.
  7. **The scheduler is given the requirements set and the list of tests to schedule.**
  8. **The scheduler writes a kickoff script that will individually run each test via Pavilion.**
  9. **The scheduler schedules the kickoff script.**
  10. **The scheduler verifies that the kickoff script should actually run given the resources still available.**
  11. Pavilion gives the user the list of test IDs and confirms that the tests have been kicked off.
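For reference, the responsibilities this flow implies for a scheduler plugin could be sketched as an abstract interface along these lines. The class and method names are illustrative stand-ins, not Pavilion's actual plugin API.

```python
# Rough sketch of the responsibilities the flow above implies for a scheduler
# plugin (abstract interface only; names are illustrative, not Pavilion's).
import abc

class SchedulerPlugin(abc.ABC):

    @abc.abstractmethod
    def get_vars(self, sched_sections):
        """Step 4: given the scheduler-relevant sections of the test configs,
        return (min_requirements, sched_vars) covering all of the tests."""

    @abc.abstractmethod
    def schedule_tests(self, requirements, tests):
        """Steps 7-10: write a kickoff script that runs each test via
        Pavilion, verify the resources are still available, and schedule it."""
```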

For schedulers that actually schedule jobs on a cluster, the kickoff script is expected to run on an allocation sized to the largest test it expects to run. The tests themselves should run on pieces of that allocation, scheduled within it. This may not be possible on all schedulers, but it is for Slurm (and probably Moab). The kickoff script does the following:

  1. Sets up the environment necessary to run pavilion.
  2. Loops through each test serially:
     a. Issues the pav do_build command for the test.
     b. Issues the pav do_run command to run the test.
  3. Updates the test status (using the pav status command) before and after each step.
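As a rough illustration of those steps, a Slurm-style scheduler plugin might generate the kickoff script contents along these lines. This is a minimal sketch: the environment-setup line and the status sub-command are placeholders, and only `pav do_build` and `pav do_run` come from the description above.

```python
# Rough sketch of kickoff-script generation. The environment-setup line and
# the status update command are placeholders, not Pavilion's actual interface.

def write_kickoff_script(path, test_ids, max_nodes, partition):
    """Write a batch script that builds and runs each test serially."""
    lines = [
        '#!/bin/bash',
        # Size the allocation for the largest test it will run.
        '#SBATCH -N {}'.format(max_nodes),
        '#SBATCH -p {}'.format(partition),
        '',
        # 1. Set up the environment necessary to run pavilion (placeholder).
        'source /path/to/pav_env.sh',
        '',
    ]

    # 2. Loop through each test serially: build, then run, updating the
    #    status before and after each step (status command is a placeholder).
    for test_id in test_ids:
        lines.extend([
            'pav _update_status {} building   # placeholder'.format(test_id),
            'pav do_build {}'.format(test_id),
            'pav _update_status {} running    # placeholder'.format(test_id),
            'pav do_run {}'.format(test_id),
            'pav _update_status {} complete   # placeholder'.format(test_id),
        ])

    with open(path, 'w') as script:
        script.write('\n'.join(lines) + '\n')
```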

The pav do_run command does the following:

  1. Finds the test based on test id.
  2. Writes the test run script.
  3. Schedules and runs the test run script.
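For reference, a bare-bones version of that flow might look roughly like this; the directory layout and the run-script contents are assumptions for illustration, not Pavilion's actual code.

```python
# Bare-bones sketch of the do_run flow; the working-directory layout and the
# run-script contents are illustrative assumptions, not Pavilion internals.
import subprocess
from pathlib import Path

def do_run(working_dir, test_id):
    # 1. Find the test based on its test id (assume per-id test directories).
    test_dir = Path(working_dir) / 'tests' / str(test_id)

    # 2. Write the test run script into the test's directory.
    run_script = test_dir / 'run.sh'
    run_script.write_text('#!/bin/bash\n# test commands would go here\n')

    # 3. Schedule and run the test run script.
    result = subprocess.run(['bash', str(run_script)], cwd=str(test_dir))
    return result.returncode
```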
nicholas-sly commented 5 years ago

I'm looking at three main points of interaction with the scheduler plugins.

  1. After the inputs have been collected and all of the other variables populated, provide a desired partition, state, minimum number of nodes, maximum number of nodes, minimum number of processors per node, maximum number of processors per node, and whether the job needs to run immediately or can wait. An exception is thrown if not enough nodes are available. The maximum number of nodes requested can be 'all'. The checks are performed by collecting the data provided by scontrol show node for all nodes and iterating through that information. If all of the checks pass, a tuple of the number of nodes and the number of processors per node that should be used in the batch script is returned.

  2. When the main program has decided what resources it wants to request, the partition, reservation, qos, account, number of nodes, processors per node, and time limit should be passed to this function, and it will return a list of strings that should follow immediately after the shebang to specify the resources. The main program can then use that list to compose the submission script.

  3. Finally, when the main program has written the script and is ready to submit it, the class can return the submission invocation (e.g., sbatch for Slurm). The call is returned as a single string to put in the subprocess call.

These are the main points on which schedulers differ, and therefore the parts each scheduler plugin is responsible for. If further functions are required for querying the queue and the status of a job, those should be fleshed out. A rough sketch of this three-part interface follows.
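To make the three interaction points concrete, here is a minimal sketch of what such a plugin could look like. The class and method names, and the exact `scontrol`/`sbatch` handling, are illustrative assumptions rather than the actual Pavilion plugin API.

```python
# Illustrative sketch of the three interaction points; names and parsing are
# assumptions, not the actual Pavilion scheduler plugin API.
import subprocess

class SlurmLikeScheduler:

    def check_resources(self, partition, min_nodes, max_nodes, min_ppn, max_ppn):
        """Point 1: verify enough nodes are available and pick a node/ppn count.

        Raises RuntimeError if the request cannot be satisfied; otherwise
        returns a (nodes, procs_per_node) tuple for the batch script.
        """
        out = subprocess.check_output(['scontrol', 'show', 'node'],
                                      universal_newlines=True)
        # Each node's info comes back as a blank-line-separated block.
        usable = [blk for blk in out.split('\n\n')
                  if 'State=IDLE' in blk and 'Partitions=' + partition in blk]
        if len(usable) < min_nodes:
            raise RuntimeError('Not enough nodes available.')
        nodes = len(usable) if max_nodes == 'all' else min(max_nodes, len(usable))
        return nodes, min_ppn

    def batch_header(self, partition, reservation, qos, account,
                     nodes, ppn, time_limit):
        """Point 2: lines that go right after the shebang in the submission script."""
        return [
            '#SBATCH -p {}'.format(partition),
            '#SBATCH --reservation={}'.format(reservation),
            '#SBATCH --qos={}'.format(qos),
            '#SBATCH --account={}'.format(account),
            '#SBATCH -N {}'.format(nodes),
            '#SBATCH --ntasks-per-node={}'.format(ppn),
            '#SBATCH -t {}'.format(time_limit),
        ]

    def submit_call(self):
        """Point 3: the submission invocation, as a single string."""
        return 'sbatch'
```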

nicholas-sly commented 5 years ago

Another commit has changed this slightly.

  3. The scheduler class and subclasses now have a 'submit_job' function that takes a path to the submission script and submits the job to the scheduler. It also returns the job ID.

  4. The scheduler class and subclasses now have a 'check_job' function that takes a job ID and an optional key. Without the key, the function returns a generic state of the job by interpreting the states returned by the scheduler. The generic states are 'pending', 'running', 'finished', and 'failed'. The rest of the characteristics are accessible generally by providing a key; these are, as yet, uninterpreted.
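For concreteness, here is a hedged sketch of what those two functions might look like for a Slurm-type scheduler, assuming sbatch for submission and scontrol show job for status queries; the state mapping and field parsing are illustrative, not the committed implementation.

```python
# Illustrative Slurm-flavored versions of submit_job and check_job; the state
# mapping and field parsing are assumptions, not the committed implementation.
import subprocess

# Map raw Slurm job states onto the generic states described above.
GENERIC_STATES = {
    'PENDING':   'pending',
    'RUNNING':   'running',
    'COMPLETED': 'finished',
    'FAILED':    'failed',
    'TIMEOUT':   'failed',
    'CANCELLED': 'failed',
}

def submit_job(script_path):
    """Submit the script and return the scheduler's job ID."""
    out = subprocess.check_output(['sbatch', script_path],
                                  universal_newlines=True)
    # sbatch normally prints "Submitted batch job <id>".
    return out.strip().split()[-1]

def check_job(job_id, key=None):
    """Return the generic job state, or an uninterpreted raw field for a key."""
    out = subprocess.check_output(['scontrol', 'show', 'job', str(job_id)],
                                  universal_newlines=True)
    # scontrol output is a series of Key=Value tokens.
    fields = dict(item.split('=', 1) for item in out.split() if '=' in item)
    if key is not None:
        # Other characteristics come back as raw, uninterpreted scontrol fields.
        return fields.get(key)
    return GENERIC_STATES.get(fields.get('JobState', ''), 'pending')
```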