Hi Ruizhou,
Thanks for sharing your code. I'm very impressed by your work, because it makes the network tolerant of different optimizers and hyperparameter values.

I have a question about whether there is a Distrloss_layer after the last layer. In the CIFAR-10 experiment, you use 7 convolution layers (written as xC-xC-MP-2xC-2xC-MP-4xC-4xC-10C-GP). I think the first layer is a full-precision convolution, whereas the others are all binary convolutions, including the last layer (10C). Is there a Distrloss_layer after the last binary convolution layer? I am assuming the 10C-GP part is 10C-BN-Distrloss_layer-GP.
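
To make the question concrete, here is a small sketch of the layer order I have in mind. It only expands the architecture string into a list of labels; it does not use the repository code. The helper `expand`, the labels `FPConv` and `BinConv`, and the flag `distrloss_after_last` are names I made up for illustration. The sketch also assumes that every convolution is followed by BN and, apart from possibly the last one, by a Distrloss_layer, which may not match your implementation.

```python
# Hypothetical sketch: expands the architecture string into the layer order
# I am assuming. "expand", "FPConv" and "BinConv" are illustrative names only.
ARCH = "xC-xC-MP-2xC-2xC-MP-4xC-4xC-10C-GP"


def expand(arch, distrloss_after_last=True):
    """Return the assumed layer sequence as a list of string labels."""
    tokens = arch.split("-")
    # Convolution tokens end with "C"; "MP" and "GP" are pooling tokens.
    conv_positions = [i for i, tok in enumerate(tokens) if tok.endswith("C")]
    first_conv, last_conv = conv_positions[0], conv_positions[-1]

    layers = []
    for i, tok in enumerate(tokens):
        if not tok.endswith("C"):
            layers.append(tok)  # pooling layer, kept as-is
            continue
        # Assumption: only the first convolution is full precision.
        kind = "FPConv" if i == first_conv else "BinConv"
        layers.append(f"{kind}({tok[:-1]})")
        layers.append("BN")  # assumption: BN after every convolution
        # The open question: is this layer also present after the last conv?
        if i != last_conv or distrloss_after_last:
            layers.append("Distrloss_layer")
    return layers


if __name__ == "__main__":
    print(" -> ".join(expand(ARCH, distrloss_after_last=True)[-4:]))
    print(" -> ".join(expand(ARCH, distrloss_after_last=False)[-3:]))
```

With `distrloss_after_last=True` the tail is the 10C-BN-Distrloss_layer-GP order I am assuming; with `False` it is 10C-BN-GP. Which of the two matches your CIFAR-10 setup?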