ucbrise / actnn

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
MIT License
196 stars 30 forks

[Bugfix] Fix QDropout #20

Closed: cenyk1230 closed 3 years ago

cenyk1230 commented 3 years ago

Hi, I found some bugs in my earlier implementation of QDropout.

  1. In the backward pass, the gradient should also be divided by the 1-p factor, matching the scaling applied in the forward pass.
  2. In the validation step (self.training = False), we can directly use the forward of nn.Dropout, since dropout behaves differently in the training and validation steps.

Please help review the changes.
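
The two fixes listed above can be illustrated with a minimal, framework-free sketch of inverted dropout. This is not the ActNN code: the function names `dropout_forward` and `dropout_backward` are hypothetical, inputs are plain Python lists rather than tensors, and it assumes 0 <= p < 1. It only shows that the backward pass reuses the forward mask and the same 1/(1-p) scaling, and that evaluation mode is a pass-through.

```python
import random


def dropout_forward(x, p, training, rng=random):
    """Inverted dropout on a list of floats (hypothetical helper).

    Returns (output, mask). Assumes 0 <= p < 1.
    """
    if not training or p == 0.0:
        # Evaluation mode: dropout is the identity, so no mask is stored.
        return list(x), None
    keep = 1.0 - p
    # Each element is kept with probability 1 - p.
    mask = [1.0 if rng.random() < keep else 0.0 for _ in x]
    # Kept elements are scaled by 1 / (1 - p) so the expected value is unchanged.
    out = [xi * mi / keep for xi, mi in zip(x, mask)]
    return out, mask


def dropout_backward(grad_out, mask, p):
    """Gradient of dropout_forward with respect to its input."""
    if mask is None:
        # Identity forward pass implies an identity backward pass.
        return list(grad_out)
    keep = 1.0 - p
    # Fix 1: the gradient is also divided by the 1 - p factor.
    return [g * mi / keep for g, mi in zip(grad_out, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 2.0, 3.0, 4.0]
    y, mask = dropout_forward(x, p=0.5, training=True, rng=rng)
    grad_in = dropout_backward([1.0] * len(x), mask, p=0.5)
    print(y, grad_in)
    # Fix 2: in evaluation mode the input passes through unchanged.
    print(dropout_forward(x, p=0.5, training=False))
```

Without the division in the backward pass, the gradients of kept elements would be smaller than the true derivative of the forward computation by a factor of 1-p.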