# Variational_Information_Distillation
A project reproducing "VID" (Variational Information Distillation), part of https://github.com/rp12-study/rp12-hub.

## Abstract
This repository reproduces the CVPR 2019 paper *Variational Information Distillation for Knowledge Transfer* (Ahn et al.).
## Requirements
- python==3.x
- tensorflow>=1.13.0
- scipy
## How to run
## Notes
I found the authors' code at https://github.com/ssahn0215/variational-information-distillation, but I will not refer to it, because I want to check the reproducibility of the paper on its own. I don't know why, but the author has since deleted that repository.
- My experimental results are higher than the paper's. I found it tough to reach performance as low as reported. To do so, I removed the gamma parameter and the regularization of batch normalization, and modified hyper-parameters to make training less stable (see the sketch after this list).
- The authors state, "We choose four pairs of intermediate layers similarly to [31], each of which is located at the end of a group of residual blocks." However, there are only three groups of residual blocks in WResNet, so I take one additional feature map after the first convolutional layer.
- I do not follow the authors' configurations for the comparison methods, because their modifications look somewhat awkward and unfair, and do not coincide with the methods as originally proposed. I also think that, for a fair comparison, the original authors' configurations should not be modified, whether they are good or not. In short, I only reproduce the authors' own method, VID (a rough sketch of its loss follows below).
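For concreteness, the "removed gamma" tweak mentioned above amounts, in TensorFlow 1.x terms, to disabling the learnable scale in batch normalization. This is a minimal sketch of my assumption about that change; it is my tweak to match the paper's numbers, not part of the VID method itself:

```python
import tensorflow as tf

# Batch normalization without the learnable scale (gamma) and without any
# regularizer attached to its parameters; `scale=False` drops gamma entirely.
def bn_without_gamma(x, training):
    return tf.layers.batch_normalization(x, scale=False, training=training)
```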
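Since the authors' code is gone, below is a minimal sketch of the VID loss as I read it from the paper: the teacher feature map `t` is modeled as a Gaussian whose mean is a small learned function `mu(s)` of the student feature map `s`, with a per-channel variance parameterized through a softplus; minimizing the negative log-likelihood maximizes a variational lower bound on the mutual information between `t` and `s`. The names and the two-layer 1x1 regressor here are my choices, not the authors':

```python
import tensorflow as tf

def vid_loss(student_feat, teacher_feat, init_alpha=5.0):
    """-log N(t | mu(s), sigma^2), averaged over batch and spatial positions.

    student_feat, teacher_feat: [N, H, W, C] activations from one layer pair.
    """
    c = teacher_feat.get_shape().as_list()[-1]
    # mu(s): map student channels to teacher channels with 1x1 convolutions.
    h = tf.layers.conv2d(student_feat, c, 1, activation=tf.nn.relu, name='vid_h')
    mu = tf.layers.conv2d(h, c, 1, name='vid_mu')
    # Per-channel variance, kept positive with a softplus (as in the paper).
    alpha = tf.get_variable('vid_alpha', [c],
                            initializer=tf.constant_initializer(init_alpha))
    var = tf.nn.softplus(alpha) + 1e-6
    # Negative Gaussian log-likelihood, with constant terms dropped.
    nll = 0.5 * (tf.log(var) + tf.square(teacher_feat - mu) / var)
    return tf.reduce_mean(nll)
```

Each of the four layer pairs would get its own copy of these variables (e.g. under separate `tf.variable_scope`s), and the summed VID terms are added to the student's ordinary cross-entropy loss.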
## Experimental results
All numbers are test accuracy (%). "Last" is the accuracy at the last epoch of my runs; "Paper" is the accuracy reported in the paper. The VID "Last" entries are pending (see TO DO).

| Methods | Full (Last) | Full (Paper) | 20% (Last) | 20% (Paper) | 10% (Last) | 10% (Paper) | 2% (Last) | 2% (Paper) |
|---------|------------:|-------------:|-----------:|------------:|-----------:|------------:|----------:|-----------:|
| Student | 91.22 | 90.72 | 84.85 | 84.67 | 80.29 | 79.63 | 58.11 | 58.84 |
| Teacher | 94.98 | 94.26 | - | - | - | - | - | - |
| KD      | 90.60 | 91.27 | 84.13 | 86.11 | 78.57 | 82.23 | 59.63 | 64.24 |
| FitNet  | 91.61 | 90.64 | 86.24 | 84.78 | 82.74 | 80.73 | 56.69 | 68.90 |
| AT      | 91.85 | 91.60 | 87.60 | 87.26 | 84.70 | 84.94 | 74.57 | 73.40 |
| VID     | -     | 91.85 | -     | 89.73 | -     | 88.09 | -     | 81.59 |
*(Figure: experimental results on the full dataset.)*
## TO DO
- Check the correctness of the VID implementation and run the experiments
- Edit this README