leonardoaraujosantos opened 7 years ago
The AnnotatedData layer is only responsible for generating random patches according to the configuration. If batch_sampler allows it, an image without any ground truth objects can be generated. The negative mining is done in the MultiBoxLoss layer after matching ground truth boxes and default boxes.
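A minimal numpy sketch of that hard negative mining step, assuming the per-box confidence loss is already computed (`conf_loss` and `positive_mask` are hypothetical inputs; the 3:1 negative-to-positive ratio is the paper's default):

```python
import numpy as np

def hard_negative_mining(conf_loss, positive_mask, neg_pos_ratio=3):
    """Pick the hardest negatives, at most neg_pos_ratio per positive.

    conf_loss:     (num_priors,) confidence loss per default box
    positive_mask: (num_priors,) bool, True where a ground truth matched
    """
    num_pos = int(positive_mask.sum())
    num_neg = min(neg_pos_ratio * num_pos, int((~positive_mask).sum()))

    # Rank unmatched boxes by confidence loss, highest (hardest) first
    neg_loss = np.where(positive_mask, -np.inf, conf_loss)
    hardest = np.argsort(-neg_loss)[:num_neg]

    neg_mask = np.zeros_like(positive_mask)
    neg_mask[hardest] = True
    return neg_mask  # negatives that contribute to the confidence loss
```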
Since VGG is not trained with batch norm, we found that L2 normalization (from ParseNet) is a nice and easy workaround to stabilize training with VGG. An advantage over batch normalization is that L2 normalization is done separately per feature map location and does not depend on the number of images in a batch.
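A rough numpy sketch of that per-location L2 normalization (in the released models a learnable per-channel scale, initialized to 20 for conv4_3, is applied after normalizing):

```python
import numpy as np

def l2_normalize(x, scale, eps=1e-10):
    """ParseNet-style L2 normalization across channels.

    x:     (C, H, W) feature map; each spatial location is normalized
           on its own, so nothing depends on the batch size.
    scale: (C,) learnable per-channel scale.
    """
    norm = np.sqrt((x ** 2).sum(axis=0, keepdims=True)) + eps  # (1, H, W)
    return x / norm * scale[:, None, None]
```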
The size of the default boxes at each feature map layer is configurable. Note that the default boxes are in the normalized scale [0, 1].
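For reference, the paper spaces those scales linearly between s_min = 0.2 and s_max = 0.9 over the m prediction layers, i.e. s_k = s_min + (s_max - s_min)(k - 1)/(m - 1); a quick sketch:

```python
def default_box_scales(num_layers, s_min=0.2, s_max=0.9):
    """Linearly spaced default box scales in the normalized [0, 1] range."""
    if num_layers == 1:
        return [s_min]
    step = (s_max - s_min) / (num_layers - 1)
    return [round(s_min + step * k, 2) for k in range(num_layers)]

print(default_box_scales(6))  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```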
To better combine predictions from different layers (see the sketch below).
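My reading of that combination step, as a hedged numpy sketch: each prediction layer's output is permuted from N x C x H x W to N x H x W x C, flattened, and concatenated, so values from all layers line up per location and per default box:

```python
import numpy as np

def gather_predictions(layer_outputs):
    """Combine per-layer predictions (SSD's Permute -> Flatten -> Concat).

    layer_outputs: list of (N, C, H, W) arrays,
                   where C = boxes_per_location * values_per_box.
    Returns an (N, total) array grouped per location and box.
    """
    flat = [o.transpose(0, 2, 3, 1).reshape(o.shape[0], -1)
            for o in layer_outputs]
    return np.concatenate(flat, axis=1)
```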
New diagrams and layers explanation
Idea: explain the SSD flow using diagrams and describe its new layers (questions on what I have not understood yet are below).
Training Diagram
Prediction Diagram
Extra Convs
A group of 1x1 and 3x3 convolutions used to sample activations from conv4_3 and the last (FC) layers; it is used to improve the detection of bigger objects (rough layer list below).
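For context, my understanding of the SSD300 prediction sources and their spatial sizes (treat the exact layer names as an assumption; I am reading them off the reference prototxt):

```python
# SSD300 prediction sources and spatial resolutions for a 300x300 input
# (assumption: names follow the reference prototxt)
feature_maps = {
    "conv4_3": 38,   # finest map, small objects
    "fc7":     19,   # VGG fc6/fc7 converted to (atrous) convolutions
    "conv6_2": 10,
    "conv7_2":  5,
    "conv8_2":  3,
    "conv9_2":  1,   # coarsest map, biggest objects
}
```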
Detection layer
New layers added to support SSD
PriorBox
Generates default boxes using the image and feature map dimensions (see the sketch below).
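A simplified sketch of what PriorBox computes for one square feature map (the real layer also adds an extra box of scale sqrt(s_k * s_{k+1}) for aspect ratio 1 and can clip boxes to the image):

```python
import numpy as np

def prior_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes for one feature map, normalized to [0, 1].

    Returns (fmap_size**2 * len(aspect_ratios), 4) boxes as
    (cx, cy, w, h); centers sit on the feature map grid.
    """
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append([cx, cy,
                              scale * np.sqrt(ar), scale / np.sqrt(ar)])
    return np.array(boxes)
```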
AnnotatedData
DetectionOutput
Performs non-maximum suppression during prediction to keep the best region per object.
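A bare-bones sketch of the greedy NMS that DetectionOutput applies per class (boxes as (x1, y1, x2, y2); the real layer also thresholds confidences and keeps only the top-k detections):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    def area(b):
        return (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])

    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        # Intersection of the best box with every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (area(boxes[best:best + 1])[0]
                       + area(boxes[rest]) - inter)
        # Drop boxes that overlap the kept one too much
        order = rest[iou <= iou_threshold]
    return keep
```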
SmoothL1
Distance metric used in the MultiBoxLoss layer, more specifically for the location (localization) part.
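The metric itself is simple (quadratic near zero, linear in the tails, as defined in Fast R-CNN):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5 * x**2 if |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)
```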
Permute
Used to change the position of dimensions in the tensors. (Don't know why?? Possibly answered above: to better combine predictions from different layers.)
Normalize
Used to make the activations of conv4_3 smaller. (There are no tests to check whether this is needed on other layers, or whether it can be substituted by batch norm or other batch norm variants (NIPS 2016).)
Deconvolution/AtrousConvolution/Dilated Conv
Actually, it was already available in Caffe; during the experiments it was found to give better performance (frame rate).
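A small sketch of why dilation is attractive here: a dilated kernel covers a larger window at the same compute cost (in SSD, VGG's fc6 becomes a 3x3 convolution with dilation 6):

```python
def effective_kernel_size(kernel, dilation):
    """A k x k kernel with dilation d covers k + (k - 1) * (d - 1) inputs."""
    return kernel + (kernel - 1) * (dilation - 1)

print(effective_kernel_size(3, 6))  # 13: fc6-as-conv sees a 13x13 window
```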
Open Questions