# Variational_Information_Distillation
A project reproducing "VID" (Variational Information Distillation), part of https://github.com/rp12-study/rp12-hub.

## Abstract
This repository reproduces the CVPR 2019 paper *Variational Information Distillation for Knowledge Transfer* (Ahn et al.).
## Requirements
- python==3.x
- tensorflow>=1.13.0
- scipy
## How to run
## Notes
I found the authors' code at https://github.com/ssahn0215/variational-information-distillation, but I will not refer to it, because I want to check the reproducibility of the paper on its own. I don't know why, but the author has since deleted that repository.
- My experimental results are higher than the paper's. I found it tough to reach performance as low as reported. To do so, I removed the gamma parameter and the regularization of batch normalization, and modified hyper-parameters to make training less stable (see the sketch after this list).
- The authors state, "We choose four pairs of intermediate layers similarly to [31], each of which is located at the end of a group of residual blocks." However, there are only three groups of residual blocks in WResNet, so I take one additional feature map after the first convolutional layer.
- I do not follow the authors' configurations for the comparison methods, because their modifications look somewhat awkward and unfair, and do not coincide with the methods as originally proposed. I also think that, for a fair comparison, the original authors' configurations should not be modified, whether they are good or not. In short, I only reproduce the authors' own method, VID (a rough sketch of its loss follows below).
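For concreteness, the "removed gamma" tweak mentioned above amounts, in TensorFlow 1.x terms, to disabling the learnable scale in batch normalization. This is a minimal sketch of my assumption about that change; it is my tweak to match the paper's numbers, not part of the VID method itself:

```python
import tensorflow as tf

# Batch normalization without the learnable scale (gamma) and without any
# regularizer attached to its parameters; `scale=False` drops gamma entirely.
def bn_without_gamma(x, training):
    return tf.layers.batch_normalization(x, scale=False, training=training)
```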
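Since the authors' code is gone, below is a minimal sketch of the VID loss as I read it from the paper: the teacher feature map `t` is modeled as a Gaussian whose mean is a small learned function `mu(s)` of the student feature map `s`, with a per-channel variance parameterized through a softplus; minimizing the negative log-likelihood maximizes a variational lower bound on the mutual information between `t` and `s`. The names and the two-layer 1x1 regressor here are my choices, not the authors':

```python
import tensorflow as tf

def vid_loss(student_feat, teacher_feat, init_alpha=5.0):
    """-log N(t | mu(s), sigma^2), averaged over batch and spatial positions.

    student_feat, teacher_feat: [N, H, W, C] activations from one layer pair.
    """
    c = teacher_feat.get_shape().as_list()[-1]
    # mu(s): map student channels to teacher channels with 1x1 convolutions.
    h = tf.layers.conv2d(student_feat, c, 1, activation=tf.nn.relu, name='vid_h')
    mu = tf.layers.conv2d(h, c, 1, name='vid_mu')
    # Per-channel variance, kept positive with a softplus (as in the paper).
    alpha = tf.get_variable('vid_alpha', [c],
                            initializer=tf.constant_initializer(init_alpha))
    var = tf.nn.softplus(alpha) + 1e-6
    # Negative Gaussian log-likelihood, with constant terms dropped.
    nll = 0.5 * (tf.log(var) + tf.square(teacher_feat - mu) / var)
    return tf.reduce_mean(nll)
```

Each of the four layer pairs would get its own copy of these variables (e.g. under separate `tf.variable_scope`s), and the summed VID terms are added to the student's ordinary cross-entropy loss.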
## Experimental results
All numbers are test accuracy (%). "Last" is the accuracy at the last epoch of my runs; "Paper" is the accuracy reported in the paper. The VID "Last" entries are pending (see TO DO).

| Methods | Full (Last) | Full (Paper) | 20% (Last) | 20% (Paper) | 10% (Last) | 10% (Paper) | 2% (Last) | 2% (Paper) |
|---------|------------:|-------------:|-----------:|------------:|-----------:|------------:|----------:|-----------:|
| Student | 91.22 | 90.72 | 84.85 | 84.67 | 80.29 | 79.63 | 58.11 | 58.84 |
| Teacher | 94.98 | 94.26 | - | - | - | - | - | - |
| KD      | 90.60 | 91.27 | 84.13 | 86.11 | 78.57 | 82.23 | 59.63 | 64.24 |
| FitNet  | 91.61 | 90.64 | 86.24 | 84.78 | 82.74 | 80.73 | 56.69 | 68.90 |
| AT      | 91.85 | 91.60 | 87.60 | 87.26 | 84.70 | 84.94 | 74.57 | 73.40 |
| VID     | -     | 91.85 | -     | 89.73 | -     | 88.09 | -     | 81.59 |
*(Figure: experimental results on the full dataset.)*
## TO DO
- Check the correctness of the VID implementation and run the experiments
- Edit this README