About msra resnet and its weight initialization method for transposed conv layer

xingyizhou / CenterNet

Object detection, 3D detection, and pose estimation using center point detection:

MIT License


About msra resnet and its weight initialization method for transposed conv layer #892

Open developer0hye opened 3 years ago

developer0hye commented 3 years ago

https://github.com/xingyizhou/CenterNet/blob/2b7692c377c6686fb35e473dac2de6105eed62c6/src/lib/models/networks/msra_resnet.py#L184-L209

What does msra mean?

Why isn't the fill_up_weights function called for this layer in the MSRA ResNet (msra_resnet.py)?

Is fill_up_weights important for reproducing the results in the paper?

https://github.com/xingyizhou/CenterNet/blob/2b7692c377c6686fb35e473dac2de6105eed62c6/src/lib/models/networks/resnet_dcn.py#L110-L119

developer0hye commented 3 years ago

Does msra mean Microsoft Research Asia Lab?

gau-nernst commented 3 years ago

> Does msra mean Microsoft Research Asia Lab?

Yes, I think so. The original code is here: https://github.com/microsoft/human-pose-estimation.pytorch/blob/master/lib/models/pose_resnet.py

It's from this paper: https://arxiv.org/abs/1804.06208

I'm digging into the code too, and I'm also stuck on fill_up_weights(). There seems to be no apparent reason to initialize the weights like that. Let me know if you find the answer.

Cheers!

gau-nernst commented 3 years ago

I just found this extract in the CenterNet paper:

> ResNet Xiao et al. [55] augment a standard residual network [22] with three up-convolutional networks to allow for a higher-resolution output (output stride 4). We first change the channels of the three upsampling layers to 256, 128, 64, respectively, to save computation. We then add one 3 × 3 deformable convolutional layer before each up-convolution with channel 256, 128, 64, respectively. The up-convolutional kernels are initialized as bilinear interpolation. See supplement for a detailed architecture diagram.

So it seems that fill_up_weights() initializes the up-convolution kernels so that each layer starts out performing bilinear interpolation.
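To make "initialized as bilinear interpolation" concrete, here is a minimal pure-Python sketch of the usual bilinear kernel formula. It is an illustration, not the repository's code: the function names `bilinear_kernel_1d` and `bilinear_kernel_2d` are hypothetical, the kernel size of 4 is an assumption, and I believe (but have not confirmed here) that fill_up_weights() computes the same values before copying them into the layer's weight tensor.

```python
import math


def bilinear_kernel_1d(size):
    """Return the 1D bilinear interpolation weights for a kernel of `size` taps."""
    f = math.ceil(size / 2)               # upsampling factor implied by the kernel size
    c = (2 * f - 1 - f % 2) / (2.0 * f)   # centre of the kernel in units of f
    return [1 - abs(i / f - c) for i in range(size)]


def bilinear_kernel_2d(size):
    """Return the 2D kernel as the outer product of the 1D weights."""
    k = bilinear_kernel_1d(size)
    return [[a * b for b in k] for a in k]


# Assumed kernel size of 4 (a common choice for a stride-2 up-convolution).
for row in bilinear_kernel_2d(4):
    print(row)
```

For a size of 4 the 1D weights are 0.25, 0.75, 0.75, 0.25, and the 2D kernel sums to 4, which is the number of output pixels produced per input pixel at stride 2.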

developer0hye commented 3 years ago

@gau-nernst

Thanks for your reply!

It seems that the developers of this model may have followed DLA's weight initialization method.

I guess the up-convolutional layer could be replaced by a bilinear upsampling layer, at the cost of a small drop in performance.

I also implemented CenterNet in PyTorch:
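As a rough check of that guess, the 1D sketch below compares a stride-2 transposed convolution using the bilinear weights 0.25, 0.75, 0.75, 0.25 against plain 2x linear interpolation. Everything here is an assumption made for illustration: the helper names are hypothetical, the kernel size 4, stride 2 and padding 1 are assumed rather than taken from this thread, and the interpolation uses half-pixel centres with clamping at the edges. It only says something about the layer at initialization, not after training.

```python
def transposed_conv1d(x, kernel, stride, padding):
    """Naive 1D transposed convolution: scatter each input sample times the kernel."""
    full = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for j, kj in enumerate(kernel):
            full[i * stride + j] += xi * kj
    # Padding in a transposed convolution crops the output on both sides.
    return full[padding:len(full) - padding]


def linear_upsample_2x(x):
    """2x linear interpolation with half-pixel centres, clamped at the borders."""
    n = len(x)
    out = []
    for t in range(2 * n):
        src = (t + 0.5) / 2 - 0.5            # output position mapped into input coordinates
        src = min(max(src, 0.0), n - 1)      # clamp to the valid input range
        lo = int(src)                        # floor, since src is non-negative
        hi = min(lo + 1, n - 1)
        frac = src - lo
        out.append((1 - frac) * x[lo] + frac * x[hi])
    return out


x = [1.0, 2.0, 4.0, 8.0]
deconv = transposed_conv1d(x, [0.25, 0.75, 0.75, 0.25], stride=2, padding=1)
interp = linear_upsample_2x(x)
print(deconv)
print(interp)
```

The two outputs agree everywhere except at the first and last samples, where the transposed convolution implicitly sees zeros outside the input while the interpolation clamps to the edge value. So a freshly initialized layer behaves like bilinear upsampling in the interior, and any accuracy difference would come from what the layer learns afterwards.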

https://github.com/developer0hye/Simple-CenterNet

gau-nernst commented 3 years ago

Thanks for the reference.

Your implementation of CenterNet looks great! Much easier to read.