starrytong / SCNet

MIT License

Mixed precision error #7

Closed lyndonlauder closed 4 months ago

lyndonlauder commented 4 months ago

Hello, when I try to train with FP16 set in the accelerate config, I receive this error:

Traceback (most recent call last):
  File "/SCNet/scnet/solver.py", line 244, in _run_one_epoch
    self.optimizer.step()
  File "/usr/local/lib/python3.10/dist-packages/accelerate/optimizer.py", line 157, in step
    self.scaler.step(self.optimizer, closure)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 320, in step
    self._check_scale_growth_tracker("step")
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 142, in _check_scale_growth_tracker
    assert self._scale is not None, "Attempted {} but _scale is None.  ".format(funcname) + fix
AssertionError: Attempted step but _scale is None.  This may indicate your script did not use scaler.scale(loss or outputs) earlier in the iteration.

Do you have any advice for this error please?

starrytong commented 4 months ago

I added a new branch for FP16 training.
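For context on why this assertion fires: `GradScaler` lazily initializes its internal `_scale` the first time `scaler.scale(loss)` is called, and `scaler.step()` asserts that this happened earlier in the same iteration. If a training loop calls `loss.backward()` directly (bypassing the scaler) while Accelerate's wrapped optimizer still routes `step()` through the scaler, you get exactly this error. The contract can be sketched in pure Python (`TinyScaler` is a hypothetical illustration, not the real `torch.cuda.amp.GradScaler` API):

```python
class TinyScaler:
    """Illustrative sketch of GradScaler's invariant: step() requires that
    scale(loss) ran earlier in the iteration (which sets _scale)."""

    def __init__(self):
        self._scale = None  # lazily initialized, like the real GradScaler

    def scale(self, loss):
        if self._scale is None:
            self._scale = 2.0 ** 16  # default initial scale in torch
        return loss * self._scale

    def step(self, do_optimizer_step):
        # Mirrors _check_scale_growth_tracker("step") in the traceback above
        assert self._scale is not None, "Attempted step but _scale is None."
        do_optimizer_step()
        self._scale = None  # reset for the next iteration


scaler = TinyScaler()

# Skipping scale() and calling step() reproduces the assertion:
try:
    scaler.step(lambda: None)
except AssertionError as e:
    print("error:", e)

# Scaling the loss first satisfies the invariant:
scaled_loss = scaler.scale(1.0)
scaler.step(lambda: None)
print("step ok, scaled loss =", scaled_loss)
```

With Accelerate the usual remedy is to replace a direct `loss.backward()` call with `accelerator.backward(loss)`, which applies `scaler.scale(loss)` for you when mixed precision is enabled.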