mlcommons / algorithmic-efficiency

MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
https://mlcommons.org/en/groups/research-algorithms/
Apache License 2.0

Introduce prepare for eval, fix evaluation bug #789

Open Niccolo-Ajroldi opened 2 months ago

Niccolo-Ajroldi commented 2 months ago

Description

This pull request introduces a prepare_for_eval function and updates the code to support it.

The implementation follows the blueprint proposed by @fsschneider in https://github.com/mlcommons/algorithmic-efficiency/issues/719#issuecomment-2328797610 and fixes the bug where a submission that exceeds max_runtime still receives a free final evaluation (see the same comment: https://github.com/mlcommons/algorithmic-efficiency/issues/719#issuecomment-2328797610).

Function signature

The arguments of prepare_for_eval are the same as those of update_params, except that batch is omitted: prepare_for_eval should be agnostic to the last batch used during training. The return type is the same as that of update_params.

from typing import List, Tuple

from algorithmic_efficiency import spec


def prepare_for_eval(workload: spec.Workload,
                     current_param_container: spec.ParameterContainer,
                     current_params_types: spec.ParameterTypeTree,
                     model_state: spec.ModelAuxiliaryState,
                     hyperparameters: spec.Hyperparameters,
                     loss_type: spec.LossType,
                     optimizer_state: spec.OptimizerState,
                     eval_results: List[Tuple[int, float]],
                     global_step: int,
                     rng: spec.RandomState) -> spec.UpdateReturn:
  # Default implementation: a no-op that returns the current state unchanged.
  return (optimizer_state, current_param_container, model_state)
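To illustrate the intended call pattern, here is a minimal, self-contained sketch (not the actual AlgoPerf API) of a runner that invokes prepare_for_eval right before each evaluation. All names here (run, the toy EMA, the scalar "loss") are hypothetical; the example shows one motivating use case, a submission that trains on raw weights but evaluates on an exponential moving average kept in the optimizer state.

```python
from typing import Dict, List, Tuple


def prepare_for_eval(optimizer_state: Dict,
                     params: List[float],
                     model_state: Dict,
                     eval_results: List[Tuple[int, float]],
                     global_step: int) -> Tuple[Dict, List[float], Dict]:
  # If the submission maintains an EMA of the parameters, swap it in
  # for evaluation; otherwise this is a no-op.
  ema = optimizer_state.get('ema_params')
  if ema is not None:
    params = list(ema)
  return optimizer_state, params, model_state


def run(num_steps: int, eval_every: int) -> List[Tuple[int, float]]:
  params = [0.0]
  optimizer_state = {'ema_params': [0.0]}
  model_state: Dict = {}
  eval_results: List[Tuple[int, float]] = []
  for step in range(1, num_steps + 1):
    params[0] += 1.0  # toy "update_params": increment the single weight
    # Toy EMA with decay 0.5, tracked inside optimizer_state.
    optimizer_state['ema_params'][0] = (
        0.5 * optimizer_state['ema_params'][0] + 0.5 * params[0])
    if step % eval_every == 0:
      # prepare_for_eval runs once per eval, before the measurement,
      # and can rewrite params without touching the training copy.
      optimizer_state, eval_params, model_state = prepare_for_eval(
          optimizer_state, params, model_state, eval_results, step)
      eval_results.append((step, eval_params[0]))  # record toy "loss"
  return eval_results
```

Note that the training loop keeps updating the raw params; only the evaluated copy is replaced, which is exactly the separation the hook is meant to enable.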

List of changes

In submission_runner.py:

Minor changes:

Fixes #719 and #758.
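The runtime-accounting side of the fix can be sketched as follows. This is a hypothetical, simplified loop (training_loop, train_step, should_eval, do_eval are all made-up names, not the submission_runner.py implementation): it accumulates pure training time and checks the budget before evaluating, so a submission that has already exceeded max_runtime no longer gets a free evaluation.

```python
import time
from typing import Callable


def training_loop(train_step: Callable[[], bool],
                  should_eval: Callable[[], bool],
                  do_eval: Callable[[], None],
                  max_runtime: float) -> int:
  """Toy runner: accumulate training time, deny evals past the budget.

  Returns the number of evaluations actually performed.
  """
  accumulated = 0.0
  evals = 0
  while True:
    start = time.monotonic()
    done = train_step()
    accumulated += time.monotonic() - start
    # Check the budget BEFORE evaluating: previously a submission that
    # overran max_runtime could still receive one last "free" eval.
    if accumulated > max_runtime:
      break
    if should_eval():
      do_eval()
      evals += 1
    if done:
      break
  return evals
```

With a zero budget the loop performs no evaluations at all, while a generous budget evaluates after every step that requests it.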

github-actions[bot] commented 2 months ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅