Inform submission about evaluation step

Niccolo-Ajroldi commented 5 months ago

tl;dr: We should let the submission know if an evaluation is going to happen at the current step or not.

Description

Currently, there is no easy way for the submission to know if the model returned by update_params is going to be evaluated on the train/valid/test set.

This limits the space of possible submission, or at least force them to apply some workarounds to infer whether the current step is going to be an evaluation step or not. (A possible workaround is to keep track of time from last evaluation inside the submission, but this adds a non-negligible overhead to the submission itself and deviates from the original goal of the submission.).

An example of a submission where it is crucial to know when evaluation is going to occur is Stochastic Weight Average.

Possible solutions

A straightforward solution is to decide if the submission is eligible for an untimed eval before calling update_params, and add an argument to update_params that passes this information to the submission.

The only drawback with this approach is that we don't evaluate every workload.eval_period_time_sec, but a little less frequently (we evaluate every workload.eval_period_time_sec + _submission_time_perstep). Assuming that workload.eval_period_time_sec >> _submission_time_perstep, this is hopefully not a big difference.

I think this is an important feature, and it would be nice to implement it.

priyakasimbeg commented 5 months ago

Hi Niccolo, Can you explain a little more why this feature is required for Stochastic Weight Average? If it is more efficient you can also join the Thursday WG meeting.

Niccolo-Ajroldi commented 5 months ago

Hi @priyakasimbeg, thanks for looking into this! I'll explain better.

In classical weight averaging (Polyak Averaging, SWA, LAWA...), we collect a copy of model_parameters once in a while, aggregate those checkpoints, and use them for evaluation only, while keep progressing with training.

Under the current code infrastructure, we do not know when an eval is going to occur, so, if we want to implement weight averaging, we need to always return the average in update_params.

This is problematic because we also have to keep in memory the current model state to resume training from it at the next iteration, requiring a deepcopy of the model + storing it to cpu at each iteration (since CUDA memory is limited), hence introducing a big overhead to this kind of submission.

Pseudocode of weight avg under current codebase:

def update_params(current_model, ...):

  # load previous model (current_model contains avg_model, discard it)
  current_model.load_state_dict(prev_model)

  loss = ...
  loss.backward()
  optim.step()

  # save prev model
  prev_model = deepcopy(current_model)

  # update weight avg
  swa.update(current_model)

  # get avg
  avg_model = swa.get_avg()

  return avg_model

My suggestion is to decide weather an evaluation is gonna occur before calling update_params, and passing this information to `update_params'.

This expands the actions that a submission is allowed to do, enlarging the space of possible submissions, and allowing for a situation that is common in practice (in a common ML task, we do know when eval is going to occur, and we can exploit this information).

Niccolo-Ajroldi commented 5 months ago

Notice that the pull request modifies current submissions adding the extra argument is_eval_step, but this is not really necessary.

We can make the code backward compatible by putting the call to update_params in a try-except block, allowing for both type of submissions.

fsschneider commented 5 months ago

Hi Niccolo,

we believe that figuring out before the call of update_params whether an eval is coming up is a bit dangerous. E.g. a submission could then in the last step take an arbitrarily long time in this last step and not run out of the submission budget. Since the time spent within the update_params step can vary wildly between submissions, we don't think this is a viable strategy.

But since you are free to "re-interpret" what a step means. You could also make your step longer (e.g. by just doing 10 iterations of updating the parameters as an inner-loop). In that way, you can reduce some of the model-transfer costs, right?

We are happy to discuss this topic more in our WG meeting. Could you join this week Thursday (19:35 – 20:30 German time)?

Niccolo-Ajroldi commented 5 months ago

Sure, happy to join the WG meeting on Thursday.

Making a step longer is indeed a good solution, thanks!

Regarding the problem of a too long final step, we could solve it by checking again after update_params if there is time remaining, and proceed to eval only in that case. The only drawback is that we might inform the submission of an imminent eval step that is not gonna occur (this would happen one time at most). Something like:

if time - last_eval_time < eval_sec:
  is_eval_step = True

update_params(...)

if submission_time < max_runtime:
 eval(...)

priyakasimbeg commented 5 months ago

From offline WG group discussion we have decided to leave it as is for now, but in future iterations, we could have a prepare_for_eval function or similar to account for this. Reordering the check for whether an eval is due is difficult (before the update_params and let the submission know if an eval is coming up). The benchmark code doesn’t “know” how long the update_params function will take. A submission might have exceeded the wall-clock budget.

Some suggestions for workarounds:

design your step to contain, e.g. 20 steps, to ammortize the costs?
keep the copy in device memory. Note this may be difficult for Criteo due to the embedding tables (could not do the moving averaging for layers of this type)

fsschneider commented 5 days ago

We plan to discuss feature requests like these in the benchmark code during the WG meeting on Thursday, 9/5.

fsschneider commented 5 days ago

While thinking about this issue and the possible changes it required, I created this skeleton version of our current submission_runner's logic. This is a simplified version of the current version:

# Bookkeeping Train State
train_state['is_time_remaining'] = True
train_state['validation_goal_reached'], train_state['test_goal_reached'] = False, False
train_state['last_eval_time'], train_state['accumulated_submission_time'] = 0, 0
train_state['training_complete'] = False  # Can be set to true by the submission via the spec.TrainingCompleteError
goals_reached = (
      train_state['validation_goal_reached'] and
      train_state['test_goal_reached'])

# [...]

# Training loop
# Only start training if time remaining and training is not complete
while train_state['is_time_remaining'] and \
      not goals_reached and \
      not train_state['training_complete']:

  # [...]

  update_params()

  # [...]

  # Update submission time and compute (but not check) if time is remaining
  train_state['accumulated_submission_time'] += # [...]
  train_state['is_time_remaining'] = train_state['accumulated_submission_time'] < max_allowed_runtime_sec

  # [...]

  # Check if the submission is eligible for an untimed eval.
  if (time_since_last_eval >= workload.eval_period_time_sec or train_state['training_complete']):
    eval_model()
    # Check if targets are reached.
    train_state["validation_goal_reached"], train_state["test_goal_reached"] = # [...]

I believe the issue is that we don't check whether there is still time left directly before doing the eval. @priyakasimbeg, this is the bug I mentioned in our call yesterday. We currently only check for the time remaining at the beginning of the while loop, then do the submission's update_params(), then recompute the is_time_remaining, but we don't check if this is true before the eval. To illustrate what I mean, imagine an update_params() function that just takes incredibly long, e.g. doing all the update steps it wants. With our current code, we would set is_time_remaining to True before training, start the while loop, and run update_params() (which would take forever). After that, we check is_time_remaining and would find out that it is False. However, we would still do an eval even though the submission has run way over the budget.

To fix this, we can perform the following modifications. This would also rather easily allow a submission's prepare_for_eval() function to be included:

# Training loop
# Only start training if time remaining and training is not complete
while train_state['is_time_remaining'] and \
      not goals_reached and \
      not train_state['training_complete']:

  # [...]

  update_params()

  # [...]

  # Check if the submission is eligible for an untimed eval.
  if (time_since_last_eval >= workload.eval_period_time_sec or train_state['training_complete']):
    prepare_for_eval()

    # Update submission time and compute if time is remaining
    train_state['accumulated_submission_time'] += # [...]
    train_state['is_time_remaining'] = train_state['accumulated_submission_time'] < max_allowed_runtime_sec

    # Only eval if time is remaining
    if train_state['is_time_remaining']:
      eval_model()
      # Check if targets are reached.
      train_state["validation_goal_reached"], train_state["test_goal_reached"] = # [...]

I guess this logic is needed irrespective of whether we allow a prepare_for_eval() function or not, and so the code doesn't get much more complicated by allowing such a function. Depending on what exactly we want to exclude from the accumulated_submission_time we would need some extra statements and checks, but I think all of it should be doable.

mlcommons / algorithmic-efficiency

Inform submission about evaluation step #719

Description

Possible solutions