mlcommons / algorithmic-efficiency

MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
https://mlcommons.org/en/groups/research-algorithms/
Apache License 2.0

Introduce prepare for eval, fix evaluation bug #789

Open Niccolo-Ajroldi opened 2 months ago

Niccolo-Ajroldi commented 2 months ago

Description

This pull request introduces a prepare_for_eval function and updates the code to support it.

The implementation follows the blueprint proposed by @fsschneider in https://github.com/mlcommons/algorithmic-efficiency/issues/719#issuecomment-2328797610 and fixes the bug where a submission that exceeds max_runtime still receives a free final evaluation (see the same comment: https://github.com/mlcommons/algorithmic-efficiency/issues/719#issuecomment-2328797610).

Function signature

The arguments of prepare_for_eval are the same as those of update_params, except that batch is omitted: prepare_for_eval should be agnostic to the last batch used during training. The return type is the same as that of update_params.

from typing import List, Tuple

from algorithmic_efficiency import spec


def prepare_for_eval(workload: spec.Workload,
                     current_param_container: spec.ParameterContainer,
                     current_params_types: spec.ParameterTypeTree,
                     model_state: spec.ModelAuxiliaryState,
                     hyperparameters: spec.Hyperparameters,
                     loss_type: spec.LossType,
                     optimizer_state: spec.OptimizerState,
                     eval_results: List[Tuple[int, float]],
                     global_step: int,
                     rng: spec.RandomState) -> spec.UpdateReturn:
  # Default implementation: a no-op that returns the current state unchanged.
  return (optimizer_state, current_param_container, model_state)
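To illustrate the intended call pattern, here is a minimal, self-contained sketch (not the actual AlgoPerf API) of a runner that invokes prepare_for_eval right before each evaluation. All names here (run, the toy EMA, the scalar "loss") are hypothetical; the example shows one motivating use case, a submission that trains on raw weights but evaluates on an exponential moving average kept in the optimizer state.

```python
from typing import Dict, List, Tuple


def prepare_for_eval(optimizer_state: Dict,
                     params: List[float],
                     model_state: Dict,
                     eval_results: List[Tuple[int, float]],
                     global_step: int) -> Tuple[Dict, List[float], Dict]:
  # If the submission maintains an EMA of the parameters, swap it in
  # for evaluation; otherwise this is a no-op.
  ema = optimizer_state.get('ema_params')
  if ema is not None:
    params = list(ema)
  return optimizer_state, params, model_state


def run(num_steps: int, eval_every: int) -> List[Tuple[int, float]]:
  params = [0.0]
  optimizer_state = {'ema_params': [0.0]}
  model_state: Dict = {}
  eval_results: List[Tuple[int, float]] = []
  for step in range(1, num_steps + 1):
    params[0] += 1.0  # toy "update_params": increment the single weight
    # Toy EMA with decay 0.5, tracked inside optimizer_state.
    optimizer_state['ema_params'][0] = (
        0.5 * optimizer_state['ema_params'][0] + 0.5 * params[0])
    if step % eval_every == 0:
      # prepare_for_eval runs once per eval, before the measurement,
      # and can rewrite params without touching the training copy.
      optimizer_state, eval_params, model_state = prepare_for_eval(
          optimizer_state, params, model_state, eval_results, step)
      eval_results.append((step, eval_params[0]))  # record toy "loss"
  return eval_results
```

Note that the training loop keeps updating the raw params; only the evaluated copy is replaced, which is exactly the separation the hook is meant to enable.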

List of changes

In submission_runner.py:

Minor changes:

Fixes #719 and #758.
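The runtime-accounting side of the fix can be sketched as follows. This is a hypothetical, simplified loop (training_loop, train_step, should_eval, do_eval are all made-up names, not the submission_runner.py implementation): it accumulates pure training time and checks the budget before evaluating, so a submission that has already exceeded max_runtime no longer gets a free evaluation.

```python
import time
from typing import Callable


def training_loop(train_step: Callable[[], bool],
                  should_eval: Callable[[], bool],
                  do_eval: Callable[[], None],
                  max_runtime: float) -> int:
  """Toy runner: accumulate training time, deny evals past the budget.

  Returns the number of evaluations actually performed.
  """
  accumulated = 0.0
  evals = 0
  while True:
    start = time.monotonic()
    done = train_step()
    accumulated += time.monotonic() - start
    # Check the budget BEFORE evaluating: previously a submission that
    # overran max_runtime could still receive one last "free" eval.
    if accumulated > max_runtime:
      break
    if should_eval():
      do_eval()
      evals += 1
    if done:
      break
  return evals
```

With a zero budget the loop performs no evaluations at all, while a generous budget evaluates after every step that requests it.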

github-actions[bot] commented 2 months ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅