ruizhoud / DistributionLoss

Source code for paper "Regularizing Activation Distribution for Training Binarized Deep Networks"

First activation function without quantization? #5

Open · Hyungjun-K1m opened this issue 5 years ago

Hyungjun-K1m commented 5 years ago

Hi Ruizhou,

Thanks for sharing your code!

While going through your code, I noticed that you use LeakyReLU as the first activation function and do not quantize its output. As a result, the second convolution layer appears to take a full-precision input rather than a binary one. Previous works (e.g., XNOR-Net and DoReFa-Net) quantize the first activation as well.
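
To be concrete, by "quantizing the first activation" I mean passing the first layer's output through a sign function with a straight-through estimator before the next binary convolution, roughly like the sketch below (the class name BinActive is just for illustration, not code from this repo):

    import torch

    class BinActive(torch.autograd.Function):
        """Binarize activations to {-1, +1} with a straight-through estimator."""

        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return x.sign()

        @staticmethod
        def backward(ctx, grad_output):
            x, = ctx.saved_tensors
            # Straight-through estimator: pass gradients only where |x| <= 1.
            return grad_output * (x.abs() <= 1).float()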

Have you tried to quantize the first activation layer too?

ruizhoud commented 5 years ago

Hi Hyungjun,

Thanks for pointing this out. I just tried changing the first layer to:

    # First block: full-precision conv, max-pool, LeakyReLU, and BatchNorm
    self.features0 = nn.Sequential(
        nn.Conv2d(self.channels[0], self.channels[1], kernel_size=11, stride=4, padding=2),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.LeakyReLU(inplace=True),
        nn.BatchNorm2d(self.channels[1]),
    )
    # Second block: binarizing activation followed by a binary convolution
    self.features1 = nn.Sequential(
        self.activation_func(),  # binarizes the output of features0
        BinarizeConv2d(self.channels[1], self.channels[2], kernel_size=5, padding=2),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.BatchNorm2d(self.channels[2]),
    )

and inserting a distribution loss layer between self.features0 and self.features1 (right before the self.activation_func() in self.features1), as sketched below. Without changing the hyper-parameters, I get 47.2% top-1 accuracy.
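
For clarity, the forward pass under this change looks roughly like the sketch below; self.distr_loss0 is a placeholder name for the distribution loss layer, not necessarily the attribute name used in this repo:

    def forward(self, x):
        x = self.features0(x)    # full-precision first block (conv + pool + LeakyReLU + BN)
        x = self.distr_loss0(x)  # placeholder: distribution loss on the pre-binarization activations
        x = self.features1(x)    # binarizing activation + binary 5x5 conv block
        # ... remaining binary blocks and classifier as before ...
        return x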

Thanks, Ruizhou

Hyungjun-K1m commented 5 years ago

Thanks a lot!