szagoruyko / wide-residual-networks

3.8% and 18.3% on CIFAR-10 and CIFAR-100
http://arxiv.org/abs/1605.07146
BSD 2-Clause "Simplified" License

fully connected initialization #28

Open Andreyisakov opened 7 years ago

Andreyisakov commented 7 years ago

Can you please explain why the fully connected layers' weights are not initialized with MSRinit? How are they initialized?

Cadene commented 7 years ago

which file? which line? :)

Andreyisakov commented 7 years ago

Hi, thanks for the response,

In wide-residual-networks/models/utils.lua there is the FCinit function, which is used in wide-resnet.lua and in vgg.lua.

Why aren't the fully connected layers initialized in the same manner as the convolutional layers? Where in the code are the FC layers initialized?

Thanks!


Cadene commented 7 years ago

- FCinit and MSRinit applied to WideResNet
- FCinit and MSRinit code

I guess it is just a matter of hyperparameter tuning. Maybe the author could illuminate our thinking :p
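For anyone landing here, a minimal sketch of what those two helpers in models/utils.lua roughly do, paraphrased from memory and not a verbatim copy of the repo code:

```lua
-- Approximate sketch of the helpers in models/utils.lua (paraphrased, not verbatim).

-- MSRinit: Kaiming/He-style init on convolution weights, zero biases.
local function MSRinit(model)
   for _, v in pairs(model:findModules('nn.SpatialConvolution')) do
      local n = v.kW * v.kH * v.nOutputPlane
      v.weight:normal(0, math.sqrt(2 / n))   -- Gaussian with std sqrt(2/n)
      if v.bias then v.bias:zero() end
   end
end

-- FCinit: only zeroes the Linear biases; the weights keep Torch's default nn.Linear init.
local function FCinit(model)
   for _, v in pairs(model:findModules('nn.Linear')) do
      v.bias:zero()
   end
end
```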

szagoruyko commented 7 years ago

@Andreyisakov FC layers are initialized with Xavier; it doesn't affect the final accuracy: https://github.com/torch/nn/blob/master/Linear.lua#L25

@Cadene thanks Remi
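For context, the default reset in torch/nn's Linear (linked above) is roughly the following: uniform samples scaled by 1/sqrt(fan-in). This is a sketch of the upstream code, so treat the details as approximate:

```lua
-- Roughly what nn.Linear:reset() does by default in torch/nn (sketch, not verbatim).
function Linear:reset(stdv)
   if stdv then
      stdv = stdv * math.sqrt(3)
   else
      -- fan-in based scale: 1 / sqrt(inputSize)
      stdv = 1 / math.sqrt(self.weight:size(2))
   end
   self.weight:uniform(-stdv, stdv)   -- uniform, not Gaussian
   self.bias:uniform(-stdv, stdv)
   return self
end
```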

dlmacedo commented 7 years ago

Today I think we are using Xavier initialization with a uniform distribution (default Torch) for the fully connected layers and Kaiming initialization with a Gaussian distribution (the MSRinit function) for the convolutional layers.

I don't see why we shouldn't use the same Kaiming initialization for both the convolutional and fully connected layers, at least for uniformity of treatment.

The following paper shows that Kaiming initialization is supposed to be better than Xavier initialization, at least for convolutional layers.

https://arxiv.org/pdf/1502.01852v1.pdf
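If you wanted to try that, here is a rough sketch of a Kaiming-style init for the Linear layers, mirroring MSRinit; the LinearMSRinit name is hypothetical and not part of the repo:

```lua
-- Hypothetical helper (not in the repo): Kaiming/He init for fully connected layers,
-- mirroring what MSRinit does for convolutions.
local function LinearMSRinit(model)
   for _, v in pairs(model:findModules('nn.Linear')) do
      local fanIn = v.weight:size(2)              -- number of input features
      v.weight:normal(0, math.sqrt(2 / fanIn))    -- Gaussian with std sqrt(2/fan_in)
      v.bias:zero()
   end
end
```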