melfm / avod-ssd

Code for 3D single stage object detection for autonomous driving
MIT License

understanding of avod-ssd framework #2

Open griffintin opened 6 years ago

griffintin commented 6 years ago

Thank you so much for making avod-ssd public; it helps a lot. I have read through the code in avod_ssd_model.py, and I would like to check whether my understanding is correct.

  1. AVOD-SSD is not a "REAL" SSD, because (1) it still needs to generate 3D anchors in 3D space before training, as AVOD-FPN does, whereas in SSD anchors are generated from feature maps such as conv4_3, fc_7 and so on; and (2) AVOD-SSD uses only the last FPN feature map for box generation, whereas SSD uses feature maps at multiple scales.

  2. AVOD-SSD is called SSD only because the RPN of AVOD-FPN is gone. AVOD-SSD extracts FPN feature maps from BEV and RGB and then connects them directly to the FC pipeline of (2048, 2048, 2048), which is the second fusion part in AVOD-FPN.

  3. For the two reasons above, the inference time of AVOD-SSD is similar to that of AVOD: (1) FPN feature map generation and (2) the (2048, 2048, 2048) fusion part are unchanged, and the (256, 256) fusion does not take much time compared with those two factors.

If I have misunderstood anything, I would appreciate it if anyone could point it out.

melfm commented 6 years ago
  1. AVOD-SSD is not a "REAL" SSD. This depends on your definition of SSD. If you are comparing against the SSD: Single Shot MultiBox Detector work, then no, the architecture is not based on that. SSD here refers to single stage detector, and it's a generic term.

  2. Yes, avod-ssd is RPN-free; there is no proposal stage. Instead, all the anchors are classified and regressed directly.

  3. Yes, the inference time of avod-ssd is similar to avod-fpn because they both use feature pyramids, which increase the run-time. The feature extraction stage is the most time-consuming part. In the case of avod-ssd, the FC layers also become more expensive due to the large number of anchors, but I found that reducing the FC layers works better with the ssd version, so the overall speed remains the same.

Hope this helps.

griffintin commented 6 years ago

@melfm
Thank you so, so much for your comments.

As you mentioned, avod-ssd directly classifies and regresses a large number of anchors, so the FC layers become expensive. In the config file, the number of anchors in a batch is 16384, which is huge. If we decrease that number, say by half, how will the detection accuracy change?

I would appreciate it if you could share any experimental data you have.

melfm commented 6 years ago

This work is based on Focal Loss for Dense Object Detection, which focuses on improving the performance of single stage detectors by modifying the loss function.

To summarise why you might need a large number of anchors:

Now the focal loss is designed to address this imbalance issue. In the RetinaNet work they actually apply the loss to all ∼100K anchors. However, in the avod-ssd experiments I found that increasing the number of anchors further did not affect performance and slowed down training; anywhere between ~8K and ~16K anchors was sufficient to achieve stable training.
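For readers unfamiliar with the focal loss, here is a minimal sketch of the binary form from the RetinaNet paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not taken from the avod-ssd code; the function names are made up for illustration, and alpha = 0.25, gamma = 2 are the defaults reported in the RetinaNet paper, which may differ from what avod-ssd uses.

```python
import math

def cross_entropy(p, y):
    """Plain binary cross entropy for one anchor.
    p: predicted foreground probability (0 < p < 1); y: 1 = foreground, 0 = background."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one anchor (illustrative helper, not the avod-ssd API).
    The (1 - p_t) ** gamma factor shrinks the loss of well-classified anchors,
    so the many easy background anchors no longer dominate the total."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy background anchor (network is confident it is background)
easy_bg = focal_loss(0.01, 0)
# A hard foreground anchor (network gives it only 10% foreground probability)
hard_fg = focal_loss(0.1, 1)
print(easy_bg, hard_fg)
```

With these settings the easy background anchor contributes orders of magnitude less than the hard foreground anchor, which is the down-weighting effect that makes training on a very large, mostly-background anchor set feasible.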

Also note that this is the number of anchors you evaluate the loss on, and this does not affect the FC layers as all anchors are regressed during training.
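To make the distinction concrete, here is a rough sketch of evaluating the loss on only a subset of anchors while the network still predicts for every anchor. The helper names and the uniform random sampling are assumptions for illustration only; the actual avod-ssd mini-batch selection may work differently. The 16384 figure is the config value mentioned earlier in this thread, and 100,000 stands in for the total anchor count.

```python
import random

def sample_loss_anchors(num_anchors, mini_batch_size, seed=None):
    """Pick which anchors contribute to the loss (hypothetical helper).
    Assumes uniform sampling without replacement."""
    rng = random.Random(seed)
    k = min(mini_batch_size, num_anchors)
    return sorted(rng.sample(range(num_anchors), k))

def mini_batch_loss(per_anchor_loss, indices):
    """Average the loss over the sampled anchors only."""
    if not indices:
        return 0.0
    return sum(per_anchor_loss[i] for i in indices) / len(indices)

# Every anchor gets a prediction (and therefore a per-anchor loss value) ...
per_anchor_loss = [0.1] * 100_000
# ... but only the sampled subset is used when computing the training loss.
indices = sample_loss_anchors(len(per_anchor_loss), 16384, seed=0)
loss = mini_batch_loss(per_anchor_loss, indices)
print(len(indices), loss)
```

Changing the mini-batch size here changes only how many per-anchor losses are averaged; the cost of producing the predictions for all anchors stays the same.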

Hope this helps.

griffintin commented 6 years ago

@melfm Thank you so much for the explanation of the class imbalance issue; it helps a lot in understanding the whole architecture of avod-ssd.

One question: in the Focal Loss paper, a pyramid of feature maps from ResNet-FPN is used to generate class and box offset data, which is similar to SSD (Single Shot MultiBox Detector).
In avod-ssd, however, only the last FPN feature map is used. Is this design due to processing time considerations, or are there other reasons? (If we did add more pyramid levels, the FC layers for cls, offsets, and angle regression would consume a huge number of model parameters; is this a reason?)

Currently, avod-ssd's accuracy is about 8% lower than avod-fpn's while taking 0.01s less time, so how could we make avod-ssd more powerful? As with SSD, would strong data augmentation help avod-ssd?