Reading vgg.py, I'm confused about the last layers of the network.
My understanding is that VGG normally ends with a few fully connected layers and a softmax, but here we end with:
layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
and then
self.classifier = nn.Linear(512, 10)
My questions are:
1) What's the deal with AvgPool2d with kernel size 1? Seems like it should be a no-op?
2) Why were the fully connected layers removed? Is this an adjustment for the easier task of CIFAR-10, as compared to ImageNet?
Reading vgg.py, I'm confused about the last layers of the network.
My understanding is that VGG normally ends with a few fully connected layers and a softmax, but here we end with:
layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
and thenself.classifier = nn.Linear(512, 10)
My questions are: 1) What's the deal with AvgPool2d with kernel size 1? Seems like it should be a no-op? 2) Why were the fully connected layers removed? Is this an adjustment for the easier task of CIFAR-10, as compared to ImageNet?