nnstreamer / nntrainer

NNTrainer is a software framework for training neural network models on devices.
Apache License 2.0

[Mixed Precision] Apply Gradient Clipping on Mixed Precision #2746

Open DonghakPark opened 1 month ago

DonghakPark commented 1 month ago

Mixed precision training is already implemented in NNTrainer, but gradient clipping that accounts for the loss scale has not been implemented yet.

PyTorch's documentation implements this as follows, and NNTrainer needs the same behavior.

import torch
from torch import autocast
from torch.cuda.amp import GradScaler

# model, optimizer, loss_fn, data, epochs, and max_norm are assumed to be defined elsewhere.
scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()
        with autocast(device_type='cuda', dtype=torch.float16):
            output = model(input)
            loss = loss_fn(output, target)
        scaler.scale(loss).backward()

        # Unscales the gradients of optimizer's assigned params in-place
        scaler.unscale_(optimizer)

        # Since the gradients of optimizer's assigned params are unscaled, clips as usual:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

        # optimizer's gradients are already unscaled, so scaler.step does not unscale them,
        # although it still skips optimizer.step() if the gradients contain infs or NaNs.
        scaler.step(optimizer)

        # Updates the scale for next iteration.
        scaler.update()
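
The crucial detail in this example is the ordering: the gradients are divided by the loss scale (scaler.unscale_) before clip_grad_norm_ runs, so max_norm is compared against the true gradient norm. If clipping were applied to the still-scaled gradients, the effective threshold would be max_norm / scale. A minimal, self-contained illustration of that difference, using plain CPU tensors and a hypothetical scale of 1024 (not NNTrainer or GradScaler internals):

import torch

# Hypothetical values, for illustration only.
scale = 1024.0
max_norm = 1.0

# Pretend these are the true (unscaled) gradients of two parameters.
true_grads = [torch.tensor([3.0, 4.0]), torch.tensor([0.5])]

# Backward on a scaled loss yields gradients multiplied by the loss scale.
scaled_grads = [g * scale for g in true_grads]

def clip_by_global_norm(grads, limit):
    # Same rule as torch.nn.utils.clip_grad_norm_: rescale all gradients so
    # that their global L2 norm does not exceed `limit`.
    total_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    coef = torch.clamp(limit / (total_norm + 1e-6), max=1.0)
    return [g * coef for g in grads]

# Wrong order: clip the still-scaled gradients, then unscale.
wrong = [g / scale for g in clip_by_global_norm(scaled_grads, max_norm)]

# Right order (what unscale_ followed by clip_grad_norm_ gives): unscale first, then clip.
right = clip_by_global_norm([g / scale for g in scaled_grads], max_norm)

print(wrong[0])  # shrunk by roughly an extra factor of `scale`
print(right[0])  # clipped to the intended max_norm

Whatever form the NNTrainer implementation takes, it would need to preserve this unscale-then-clip ordering.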
taos-ci commented 1 month ago

:octocat: cibot: Thank you for posting issue #2746. The person in charge will reply soon.

DonghakPark commented 1 month ago

Training Sequence (a rough code sketch of this loop follows the list):

  1. Make an FP16 copy of the weights
  2. Forward propagate using FP16 weights and activations
  3. Multiply the resulting loss by the scale factor
  4. Backward propagate using FP16 weights, activations, and gradients
  5. Multiply the weight gradients by 1/scale_factor
  6. Optional processing (gradient clipping, weight decay)
  7. Update the master copy of the weights in FP32
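
As a rough reference, here is the sequence above written out as a minimal NumPy sketch; forward, backward, and optimizer_update are placeholder callables (not NNTrainer APIs), and the inf/NaN skip mirrors the scaler.step behavior quoted earlier:

import numpy as np

def train_step(master_weights_fp32, batch, loss_scale, max_norm,
               forward, backward, optimizer_update):
    # 1. Make an FP16 copy of the FP32 master weights.
    weights_fp16 = [w.astype(np.float16) for w in master_weights_fp32]

    # 2. Forward propagate using FP16 weights and activations.
    loss, activations = forward(weights_fp16, batch)

    # 3. Multiply the resulting loss by the scale factor.
    scaled_loss = loss * loss_scale

    # 4. Backward propagate using FP16 weights, activations, and gradients.
    grads_fp16 = backward(weights_fp16, activations, scaled_loss)

    # 5. Multiply the weight gradients by 1/scale_factor (accumulate in FP32).
    grads_fp32 = [g.astype(np.float32) / loss_scale for g in grads_fp16]

    # Skip the update when scaling produced inf/NaN gradients.
    if any(not np.all(np.isfinite(g)) for g in grads_fp32):
        return master_weights_fp32

    # 6. Optional processing: gradient clipping by global norm (weight decay omitted).
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads_fp32))
    if total_norm > max_norm:
        grads_fp32 = [g * (max_norm / total_norm) for g in grads_fp32]

    # 7. Update the master copy of the weights in FP32.
    return optimizer_update(master_weights_fp32, grads_fp32)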