kmaninis / OSVOS-caffe

One-Shot Video Object Segmentation
http://vision.ee.ethz.ch/~cvlsegmentation/osvos/
GNU General Public License v3.0

What does 'DSN' mean (in train_val_stepX.prototxt)? #1

Closed: squidszyd closed this issue 7 years ago

squidszyd commented 7 years ago

As title. For example here

kmaninis commented 7 years ago

Hi,

DSN stands for Deeply Supervised Net. It refers to supervising intermediate feature maps, as described in this paper: https://arxiv.org/abs/1504.06375

We use this technique to train the parent network; it is not needed for the online training. I hope this helps.
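To illustrate the idea of supervising intermediate feature maps, here is a minimal NumPy sketch. It assumes the side outputs have already been resized to the ground-truth resolution and uses a plain sigmoid cross-entropy; the function names, the `side_weight` parameter, and the averaging used to build the fused output are hypothetical and are not taken from the OSVOS prototxt, which may weight or balance the losses differently.

```python
import numpy as np

def sigmoid_cross_entropy(logits, target):
    """Mean per-pixel binary cross-entropy computed from raw logits."""
    # Numerically stable form: max(x, 0) - x * t + log(1 + exp(-|x|))
    per_pixel = (np.maximum(logits, 0) - logits * target
                 + np.log1p(np.exp(-np.abs(logits))))
    return float(np.mean(per_pixel))

def deeply_supervised_loss(side_logits, fused_logits, target, side_weight=1.0):
    """Loss on every intermediate (side) output plus a loss on the fused output.

    Hypothetical helper: every side output receives its own supervision signal
    instead of only the final prediction being compared to the ground truth.
    """
    side_losses = [sigmoid_cross_entropy(s, target) for s in side_logits]
    fused_loss = sigmoid_cross_entropy(fused_logits, target)
    return side_weight * sum(side_losses) + fused_loss

# Toy data: four side outputs and a fused output for an 8x8 mask.
rng = np.random.default_rng(0)
target = (rng.random((8, 8)) > 0.5).astype(np.float64)
side_logits = [rng.normal(size=(8, 8)) for _ in range(4)]
fused_logits = np.mean(side_logits, axis=0)  # assumption: simple average as "fusion"
total = deeply_supervised_loss(side_logits, fused_logits, target)
```

Dropping the side terms (setting `side_weight=0.0`) leaves only the loss on the fused output, which is the sense in which the extra supervision can be switched off.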

squidszyd commented 7 years ago

@kmaninis I have a new question: there are four crop layers with the same name 'crop' in the prototxt, at lines 127/137/147/157. This confuses me, because the LayerSetUp function of the crop layer checks that there are exactly two bottoms: one for the data to be cropped and one that provides the reference size. So what do these crop layers actually do? Do they crop the four inputs as if there were four crop layers with different names?

kmaninis commented 7 years ago

The crop layer crops the output features to the size of the input. This is needed to handle varying input sizes. For example, a 5x5 input when downscaled by a factor of 2 becomes 3x3, which in turn is upscaled to 6x6 features. The crop layer handles this inconsistency by using a central crop.
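The size arithmetic in this example can be sketched as follows. This is an idealized model that assumes ceil rounding when downscaling (which matches the 5 to 3 step above) and exact multiplication when upscaling; real layer output sizes also depend on kernel size, stride, and padding. The helper names are hypothetical.

```python
import math

def downscaled_size(size, factor=2):
    """Spatial size after downscaling, assuming ceil rounding."""
    return math.ceil(size / factor)

def upscaled_size(size, factor=2):
    """Spatial size after upscaling by an integer factor."""
    return size * factor

def central_crop_offset(big, small):
    """Offset of a centered window of length `small` inside `big` (rounded down)."""
    return (big - small) // 2

# The 5x5 example: 5 -> 3 -> 6, so one row and one column must be cropped away.
input_size = 5
down = downscaled_size(input_size)
up = upscaled_size(down)
offset = central_crop_offset(up, input_size)

# A larger factor gives a larger mismatch: 21 -> 3 -> 24 with factor 8.
down8 = downscaled_size(21, factor=8)
up8 = upscaled_size(down8, factor=8)
offset8 = central_crop_offset(up8, 21)
```

With a difference of a single pixel, as in the 5x5 case, the rounded-down central offset is 0, so the central crop and a crop from the upper left corner coincide; they differ once the mismatch is two pixels or more.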

PS: There was a small bug remaining from a previous version of Caffe, where the crop layer by default performed a central crop, so one layer could handle all croppings, and so there was no problem with using the same name. In the "new" Caffe, the layer crops starting from the upper left corner, so additional offsets need to be specified. Fixed thanks to your comment. Thanks!
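The difference between the two behaviours can be sketched in NumPy. This is not the Caffe implementation; `crop_like` and `center_offset` are hypothetical helpers that only mimic the semantics described here, where a crop starts at the upper left corner unless an offset is given.

```python
import numpy as np

def crop_like(features, reference_hw, offset=(0, 0)):
    """Crop the last two axes of `features` to `reference_hw`, starting at `offset`.

    With the default offset the window starts at the upper left corner.
    """
    h, w = reference_hw
    oy, ox = offset
    if oy + h > features.shape[-2] or ox + w > features.shape[-1]:
        raise ValueError("crop window exceeds the feature map")
    return features[..., oy:oy + h, ox:ox + w]

def center_offset(features, reference_hw):
    """Offsets that place the crop window in the center (rounded down)."""
    return ((features.shape[-2] - reference_hw[0]) // 2,
            (features.shape[-1] - reference_hw[1]) // 2)

# An 8x8 feature map cropped to 4x4 in two ways.
features = np.arange(8 * 8).reshape(1, 1, 8, 8)
corner = crop_like(features, (4, 4))                                      # upper left crop
centered = crop_like(features, (4, 4), center_offset(features, (4, 4)))  # central crop
```

Because each upsampled side output can overshoot the input by a different amount, each crop would need its own offset under this model, which is consistent with separate crop layers being required.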

squidszyd commented 7 years ago

@kmaninis Thanks for explaining :) 👍