Hyungjun-K1m opened this issue 5 years ago
Hi Hyungjun,
Thanks for pointing this out. I just tried changing the first layer to:
```python
self.features0 = nn.Sequential(
    # full-precision first convolution (input images are not binarized)
    nn.Conv2d(self.channels[0], self.channels[1], kernel_size=11, stride=4, padding=2),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.LeakyReLU(inplace=True),
    nn.BatchNorm2d(self.channels[1]),
)
self.features1 = nn.Sequential(
    # binarizing activation followed by a binary convolution
    self.activation_func(),
    BinarizeConv2d(self.channels[1], self.channels[2], kernel_size=5, padding=2),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.BatchNorm2d(self.channels[2]),
)
```
and inserting a distribution-loss layer between self.features0 and self.features1 (right before the self.activation_func() in self.features1). Without changing the hyper-parameters, I get 47.2% top-1 accuracy.
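In case the placement isn't clear, here is a minimal sketch of what I mean. DistributionLossCollector is only a placeholder for illustration (the actual module in the repo has a different name and interface), and the channel sizes are the usual AlexNet ones:

```python
import torch
import torch.nn as nn

# Placeholder: identity in the forward pass that stashes the pre-activation
# tensor so a distribution loss can later be computed on it. The real
# distribution-loss layer in the repo may differ.
class DistributionLossCollector(nn.Module):
    def forward(self, x):
        self.pre_activation = x
        return x

channels = [3, 96, 256]  # standard AlexNet sizes, assumed for this sketch

features0 = nn.Sequential(
    nn.Conv2d(channels[0], channels[1], kernel_size=11, stride=4, padding=2),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.LeakyReLU(inplace=True),
    nn.BatchNorm2d(channels[1]),
)
dist_loss_layer = DistributionLossCollector()  # sits between the two blocks

x = torch.randn(1, 3, 224, 224)
pre_act = dist_loss_layer(features0(x))  # features1 would then consume pre_act
print(pre_act.shape)  # (1, 96, 27, 27)
```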
Thanks, Ruizhou
Thanks a lot!
Hi Ruizhou,
Thanks for sharing your code!
While going through your code, I noticed that you use LeakyReLU as the first activation function and do not quantize its output, so the second convolution layer takes a full-precision input rather than a binary one. Previous works (e.g. XNOR-Net, DoReFa-Net) quantize the first activation as well.
Have you tried to quantize the first activation layer too?
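To make concrete what I mean, here is a rough sketch of such a quantized activation (not code from your repo; the names are illustrative, and this is just the usual sign binarization with a straight-through estimator in the spirit of XNOR-Net / DoReFa-Net):

```python
import torch
import torch.nn as nn

# Sign binarization with a straight-through estimator (STE).
class BinaryActivationSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # map to {-1, +1}; sign(0) is sent to +1 to keep the output binary
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # STE: pass the gradient through only where |x| <= 1
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

class BinarizeActivation(nn.Module):
    def forward(self, x):
        return BinaryActivationSTE.apply(x)
```

Replacing nn.LeakyReLU(inplace=True) in self.features0 with something like BinarizeActivation() would make the BinarizeConv2d in self.features1 see a {-1, +1} input, as in those works.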