weiliu89 / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Handling bigger scales with (CONV 1x1 ---> CONV 3x3) #389


leonardoaraujosantos commented 7 years ago

Introduction

Hi @weiliu89 As discussed in issue #388 and shown in the diagram below, each group of "Extra Layers" is responsible for producing feature maps that allow the detection of bigger objects. [diagram]

Here is what I mean by the "Extra Layers": [extraLayers diagram]

Question

  1. Is the idea behind cascading these layers the same as the technique of using cascaded smaller convolutions to represent a bigger receptive field on the input? [cascadingconvolutions diagram]
  2. The CONV 1x1 layers in the "Extra Layers" always seem to adapt the depth to 256. This again motivates question (1), since for that technique to work, all the cascaded layers must have the same depth.
  3. Once my understanding of SSD improves and the diagrams are correct, can I submit Pull Requests to help with the documentation?
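Regarding question (1), the receptive-field arithmetic behind "cascaded smaller convolutions" can be checked with a few lines of standard bookkeeping (this is a generic sketch, not code from this repository; `receptive_field` is a hypothetical helper name):

```python
# Receptive-field growth for a stack of convolutions. For each layer,
# the receptive field grows by (kernel - 1) * jump, where jump is the
# distance (in input pixels) between adjacent output pixels.
def receptive_field(layers):
    """layers: list of (kernel, stride) tuples, first layer applied first."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Two cascaded 3x3 stride-1 convs cover the same 5x5 input region as a
# single 5x5 conv; three cover 7x7.
print(receptive_field([(3, 1), (3, 1)]))          # 5
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7

# The 1x1 convs in the extra layers leave the receptive field unchanged;
# only the 3x3 (stride-2) convs enlarge it.
print(receptive_field([(1, 1), (3, 2), (1, 1), (3, 2)]))  # 7
```

Note that this equivalence of receptive fields holds regardless of channel depth, so it does not by itself require all cascaded layers to share one depth.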

References for the question

weiliu89 commented 7 years ago
  1. The main idea is to spread default boxes of different scales across different layers. The layers are not called cascade layers. Please read the paper for more details.

  2. The layers used to predict bbox offsets and confidences do not need to have the same depth. For example, fc7 has 1024 channels.

  3. Yes. Thanks for contributing to it.
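To make point 1 concrete, each extra block applies a CONV 1x1 (reducing depth, spatial size unchanged) followed by a CONV 3x3 with stride 2 and pad 1 (roughly halving the grid), so successive blocks produce coarser feature maps whose default boxes cover larger objects. A shape-only sketch, using the SSD300 sizes as an assumption (check the released prototxt for the exact parameters; `conv_out` is a hypothetical helper):

```python
# Standard convolution output-size formula.
def conv_out(size, kernel, stride, pad):
    return (size + 2 * pad - kernel) // stride + 1

size = 19  # fc7 spatial size in SSD300 (assumed here)
# (block name, 1x1 depth, 3x3 depth) -- assumed SSD300-like values
for name, mid_ch, out_ch in [("conv8", 256, 512), ("conv9", 128, 256)]:
    size = conv_out(size, kernel=1, stride=1, pad=0)  # 1x1: depth -> mid_ch
    size = conv_out(size, kernel=3, stride=2, pad=1)  # 3x3 s2: grid halved
    print(name, size, out_ch)  # conv8 -> 10x10x512, conv9 -> 5x5x256
```

The grid shrinks 19 -> 10 -> 5 while the depths (1024, 512, 256) differ per layer, which is why the prediction layers do not need a shared channel count.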