szagoruyko / wide-residual-networks

3.8% and 18.3% on CIFAR-10 and CIFAR-100
http://arxiv.org/abs/1605.07146
BSD 2-Clause "Simplified" License

Why only 16 output channels in the first convolution? #49

Closed. ghost closed this issue 6 years ago.

ghost commented 6 years ago

Hi,

Is it possible to get better accuracy in a wide ResNet by using more output channels in the first convolution layer? For example, 64 or 128, like the other convolutions have.

thanks
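
For context, here is a minimal PyTorch-style sketch of the layer in question (illustrative only, not the repository's code): the CIFAR WRN stem is a 3x3 convolution producing 16 channels, and the variant being asked about just makes that channel count larger.

```python
import torch
import torch.nn as nn

class WRNStem(nn.Module):
    """First convolution of a WRN-style network on CIFAR.
    stem_channels=16 matches the paper; a larger value (e.g. 64 or 96)
    is the widened variant discussed in this issue."""
    def __init__(self, stem_channels: int = 16):
        super().__init__()
        # 3x3 conv on the 3-channel RGB input, stride 1, padding 1
        self.conv = nn.Conv2d(3, stem_channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)

# Standard 16-channel stem vs. a widened 96-channel one
x = torch.randn(1, 3, 32, 32)
print(WRNStem(16)(x).shape)  # torch.Size([1, 16, 32, 32])
print(WRNStem(96)(x).shape)  # torch.Size([1, 96, 32, 32])
```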

szagoruyko commented 6 years ago

@zeno40 maybe - I haven't tried.

ghost commented 6 years ago

The reason I ask is that I trained a WRN-16-6 with 96 channels in the first convolution instead of 16, using the same training scheme (dropout of 0.3, L2 penalty of 0.0005, ReLU activations, and local mean/std normalization), and reached a test error of 4.21%. Clearly not state-of-the-art, but very close to some wider and deeper wide ResNets.
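
A minimal sketch of the regularization settings described above, assuming a PyTorch-style training setup; the tiny stand-in model and the learning rate are assumptions, not values stated in this thread:

```python
import torch
import torch.nn as nn

# Stand-in for the widened WRN-16-6 (96-channel first convolution);
# the real network has three residual groups, batch norm, etc.
model = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=3, padding=1, bias=False),  # widened stem
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.3),            # dropout of 0.3, as mentioned above
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(96, 10),            # CIFAR-10 classifier head
)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # typical WRN starting rate (assumption)
    momentum=0.9,       # assumption
    weight_decay=5e-4,  # the 0.0005 L2 penalty mentioned above
    nesterov=True,
)
```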