wlsh1up / Pytorch_DoReFaNet


Excellent work! #1

Open lirundong opened 5 years ago

lirundong commented 5 years ago

I do think this repository is an elegant and faithful reimplementation of the DoReFa-Net paper. Below are some minor concerns of mine:

  1. Please list some numerical results and compare them with the original paper, e.g. the accuracy of AlexNet on the SVHN dataset (Table 1 in the paper). This is meant to ensure the correctness of your implementation. You can use the torchvision SVHN dataset (see the data-loading sketch after this list). I suggest first training an FP32 baseline and comparing it with the last row of Table 1, then going for the W1/A2/G32 setting, and finally the W1/A2/G4 setting;
  2. There is a minor mistake in the gradient quantizer implementation. Please note Equation 12 in the original paper: the absolute magnitude is taken per instance within the mini-batch:

    and the maximum is taken over all axis of the gradient tensor dr except for the mini-batch axis (therefore each instance in a mini-batch will have its own scaling factor)

    so the per-instance maximum should keep the mini-batch axis (dim 0) and reduce over all remaining dimensions, e.g. flatten the gradient to shape (N, -1) and call torch.max with dim=1. Be aware that torch.max returns a (values, indices) tuple when dim is given (see the gradient-quantizer sketch after this list);

  3. Please use a .gitignore to filter out the .DS_Store and __pycache__ files; they are not meant to be tracked by git (a sample .gitignore follows this list).
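
Regarding point 1, here is a minimal data-loading sketch using torchvision.datasets.SVHN (the normalization statistics and batch size are placeholders, not values from the paper):

```python
import torch
from torchvision import datasets, transforms

# SVHN images are 32x32 RGB; the normalization statistics below are placeholders
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

train_set = datasets.SVHN(root="./data", split="train", download=True, transform=transform)
test_set = datasets.SVHN(root="./data", split="test", download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False, num_workers=4)
```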
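
Regarding point 2, Equation 12 is dr_q = 2 * max0(|dr|) * [quantize_k(dr / (2 * max0(|dr|)) + 1/2 + N(k)) - 1/2], where max0 is the per-instance maximum. A sketch of how that scale could be computed, assuming quantize_k is the k-bit quantizer your repo already defines, and following the official implementation for the clipping:

```python
import torch

def quantize_k(x, k):
    # uniform k-bit quantizer on [0, 1] (Eq. 1 in the paper)
    n = float(2 ** k - 1)
    return torch.round(x * n) / n

def quantize_grad(dr, k):
    # dr has shape (N, ...); Eq. 12 rescales each instance by its own maximum.
    # Reduce over every axis except the mini-batch one: flatten to (N, -1), then max over dim=1.
    # Note that torch.max returns a (values, indices) tuple when dim is given.
    max_abs, _ = dr.abs().reshape(dr.size(0), -1).max(dim=1)
    max_abs = max_abs.view(-1, *([1] * (dr.dim() - 1)))     # broadcast back to dr's shape
    noise = (torch.rand_like(dr) - 0.5) / (2 ** k - 1)      # the N(k) noise term
    x = (dr / (2 * max_abs) + 0.5 + noise).clamp(0.0, 1.0)  # clipping as in the official code
    return 2 * max_abs * (quantize_k(x, k) - 0.5)
```

In practice this would sit in the backward pass of a custom autograd.Function, so that the forward pass stays an identity.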
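
Regarding point 3, a .gitignore along these lines would do:

```
# macOS metadata and Python bytecode caches
.DS_Store
__pycache__/
*.pyc
```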

All in all, very excellent work. Once you are done with the numerical comparison, we can have some conversations and go for broader directions.

lirundong commented 5 years ago

Updated review:

I compared this reimplementation with the official implementation, and there are some differences:

  1. DigitNet_Q is not consistent with the original paper: the building blocks should be in Conv -> BN -> [MaxPool] -> ReLU (clip 0, 1) -> QuantAct order as here, rather than the Conv -> BN -> ReLU -> MaxPool -> Clip 0, 1 -> QuantAct order in your implementation (see the block sketch after this list);
  2. The first Conv layer does not have a following BN, as here;
  3. Your configuration of the Adam optimizer is not consistent with the official one: you should not apply weight decay. Please refer to this note for more details;
  4. You can implement exponential LR decay with ExponentialLR. Please refer to the TensorFlow exponential_decay doc for the underlying scheme (see the training-setup sketch after this list).
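
For points 1 and 2, a rough sketch of the intended ordering; QuantAct and the layer hyper-parameters are placeholders standing in for this repository's own modules, not the official ones:

```python
import torch
import torch.nn as nn

class QuantAct(nn.Module):
    # stand-in for the repo's k-bit activation quantizer; the straight-through
    # estimator needed to backprop through round() is omitted for brevity
    def __init__(self, k):
        super().__init__()
        self.n = float(2 ** k - 1)

    def forward(self, x):
        # the preceding clipped ReLU guarantees x already lies in [0, 1]
        return torch.round(x * self.n) / self.n

def block(in_ch, out_ch, k_act, pool=False, first=False):
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)]
    if not first:
        layers.append(nn.BatchNorm2d(out_ch))   # point 2: no BN right after the first Conv
    if pool:
        layers.append(nn.MaxPool2d(2))          # point 1: MaxPool comes before the clipped ReLU
    layers.append(nn.Hardtanh(0.0, 1.0))        # ReLU clipped to [0, 1]
    layers.append(QuantAct(k_act))              # quantize activations last
    return nn.Sequential(*layers)
```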
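
For points 3 and 4, a training-setup sketch: Adam without weight decay, and an ExponentialLR whose gamma is chosen so that stepping it once per iteration mimics TensorFlow's exponential_decay, i.e. lr * decay_rate ** (global_step / decay_steps). The model and the decay hyper-parameters below are placeholders, not the official values:

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder for the repo's DigitNet_Q
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # point 3: no weight_decay

# point 4: TF's exponential_decay yields lr * decay_rate ** (global_step / decay_steps);
# ExponentialLR multiplies the lr by gamma on every scheduler.step(), so calling it once per
# training iteration with gamma = decay_rate ** (1 / decay_steps) matches the non-staircase case.
decay_rate, decay_steps = 0.5, 10000  # placeholder values
scheduler = torch.optim.lr_scheduler.ExponentialLR(
    optimizer, gamma=decay_rate ** (1.0 / decay_steps))

# inside the training loop, after optimizer.step():
#     scheduler.step()
```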

Please fix these issues and give it another try.