tusen-ai / simpledet

A Simple and Versatile Framework for Object Detection and Instance Recognition
Apache License 2.0

Trident 3 branch fused? #335

Open hezhu1996 opened 4 years ago

hezhu1996 commented 4 years ago

Hi, thank you for your great work. I was wondering: after applying Trident to the conv4 layer (which performs best in your paper), how do you fuse the three branches into one to feed into the RPN if you don't apply the scale-specific setting? Concatenation or element-wise product? I ask because I want to try it in a single-shot detector, which doesn't have an RPN or anything like that. Thanks for your reply :)

xmyqsh commented 4 years ago

Try drawing an analogy to FPN: FPN can be used in both two-stage and one-stage detectors.

hezhu1996 commented 4 years ago

@xmyqsh So do you mean you actually didn't fuse them into one branch, but instead each branch goes to an individual RPN and R-CNN head, even without the scale-aware training scheme? Thanks

xmyqsh commented 4 years ago

@TWDH Aha, I get you now. TridentNet is built on two-stage detection: it inherits from Faster R-CNN, not FPN, but it could be viewed as another version of FPN. It adopts a training scheme similar to the one SNIP introduced, but SNIP uses Faster R-CNN or R-FCN, not FPN. What is innovative about TridentNet is that it uses dilation to obtain a feature pyramid instead of the image pyramid used in SNIP or SNIPER, and that it is pretrained on ImageNet. I'd like to see someone pretrain FPN on ImageNet to see how much gain could be obtained.
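The point that dilation produces a feature pyramid can be made concrete with receptive-field arithmetic: each k x k conv with dilation d widens the receptive field by (k - 1) * d, so identical kernels see more context as d grows. This is an illustrative sketch only: it counts the receptive field in cells of the stage's own feature map (ignoring all layers and strides before the stage), and the "3 stacked 3x3 convs per branch" setting is a hypothetical choice, not TridentNet's actual block layout.

```python
def receptive_field(num_layers: int, kernel_size: int = 3, dilation: int = 1) -> int:
    """Receptive field, in feature-map cells, of `num_layers` stacked stride-1 convs.

    Assumption: earlier layers and strides are ignored; only this stage is counted.
    """
    rf = 1
    for _ in range(num_layers):
        rf += (kernel_size - 1) * dilation  # each dilated conv adds (k - 1) * d cells
    return rf


# Hypothetical setting: three 3x3 convs per branch, branches at dilation 1, 2, 3.
for d in (1, 2, 3):
    print(f"dilation={d}: receptive field = {receptive_field(3, 3, d)} cells")
# dilation=1 -> 7, dilation=2 -> 13, dilation=3 -> 19
```

Same weights, same resolution, three different receptive fields: that is the sense in which the branches form a pyramid.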

I cannot say whether TridentDilation is better than FPN or vice versa; both of them use a feature pyramid. TridentDilation can detect small objects with lower resolution than FPN, but for extremely small objects it has to turn to an image pyramid. FPN has a similar problem, and it uses higher resolution for small objects. For large objects, TridentDilation uses the same resolution, which is neither flexible nor efficient, and for extremely large objects TridentNet has to turn to an image pyramid again. But for a specific object scale, TridentNet is definitely better than FPN. For diverse scales, an image pyramid is more suitable for TridentNet because of its scale-aware training scheme.

What is the scale-aware training scheme? It shouts at the detector: Be stupid! Do what you should do! Do what you're good at! Be a scale-specific detector! :)

If I remember correctly, the scale-aware training scheme mainly acts on the RPN phase: it removes extreme-scale hard examples from a specific feature map to ease model learning. The dropped extreme-scale objects can be handled by other, more suitable feature maps or by the image pyramid.
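A minimal sketch of scale-aware training as described above: each branch keeps only the ground-truth boxes whose scale sqrt(w * h) falls inside that branch's valid range, so the other branches never train on them. The ranges in `VALID_RANGES` and the helper names are illustrative assumptions, not values read from SimpleDet's config. Neighbouring ranges overlap, so a medium-sized box can train two branches.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels

# Assumed valid ranges of sqrt(w * h), keyed by branch dilation (illustrative only).
VALID_RANGES: Dict[int, Tuple[float, float]] = {
    1: (0.0, 90.0),        # small objects -> small receptive field
    2: (30.0, 160.0),      # medium objects
    3: (90.0, math.inf),   # large objects -> large receptive field
}


def box_scale(box: Box) -> float:
    x1, y1, x2, y2 = box
    return math.sqrt(max(0.0, x2 - x1) * max(0.0, y2 - y1))


def assign_to_branches(gt_boxes: List[Box]) -> Dict[int, List[Box]]:
    """Hypothetical helper: for each branch, keep only GT boxes inside its range."""
    kept: Dict[int, List[Box]] = {d: [] for d in VALID_RANGES}
    for box in gt_boxes:
        s = box_scale(box)
        for d, (lo, hi) in VALID_RANGES.items():
            if lo <= s <= hi:
                kept[d].append(box)
    return kept


boxes = [(0, 0, 20, 20), (0, 0, 50, 50), (0, 0, 200, 200)]
print(assign_to_branches(boxes))
```

Here the 20x20 box only trains branch 1, the 50x50 box trains branches 1 and 2, and the 200x200 box only trains branch 3.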

For the R-CNN stage, all two-stage detectors are the same: the RPN runs on several branches/feature maps, and RoI pooling brings every proposal to the same 7x7 size, which should be the fusion you wanted.
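To make the "fusion happens at RoI pooling" point concrete, here is a hedged, single-channel sketch of RoI max pooling: whatever the proposal size, and whichever branch it came from, the output is always `out_size x out_size` (7x7 by default), so one R-CNN head can consume proposals from every branch. `roi_max_pool` is a hypothetical helper with simplified quantization; it is not SimpleDet's operator and ignores channels, batching, and RoIAlign-style bilinear sampling.

```python
import math
from typing import List, Tuple


def roi_max_pool(feature: List[List[float]], roi: Tuple[float, float, float, float],
                 out_size: int = 7) -> List[List[float]]:
    """Max-pool one RoI of a single-channel map to out_size x out_size.

    `roi` is (x1, y1, x2, y2) in feature-map coordinates (already divided by stride).
    """
    height, width = len(feature), len(feature[0])
    x1, y1, x2, y2 = roi
    bin_w = max(x2 - x1, 1e-6) / out_size
    bin_h = max(y2 - y1, 1e-6) / out_size
    out = [[0.0] * out_size for _ in range(out_size)]
    for i in range(out_size):
        # Integer rows covered by this bin, clipped to the map, never empty.
        r0 = min(max(math.floor(y1 + i * bin_h), 0), height - 1)
        r1 = min(max(math.ceil(y1 + (i + 1) * bin_h), r0 + 1), height)
        for j in range(out_size):
            c0 = min(max(math.floor(x1 + j * bin_w), 0), width - 1)
            c1 = min(max(math.ceil(x1 + (j + 1) * bin_w), c0 + 1), width)
            out[i][j] = max(feature[r][c] for r in range(r0, r1) for c in range(c0, c1))
    return out


# Hypothetical usage: a 14x14 map where cell (r, c) holds r + c.
fmap = [[float(r + c) for c in range(14)] for r in range(14)]
small = roi_max_pool(fmap, (0, 0, 4, 4))      # small proposal
large = roi_max_pool(fmap, (0, 0, 14, 14))    # large proposal
print(len(small), len(small[0]), len(large), len(large[0]))  # 7 7 7 7
```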

To conclude, TridentNet and its scale-aware training scheme could be used in a one-stage detector. You can find some clues in FCOS's positive-sample assignment (FCOS is anchor-free, so this plays the role of anchor selection), which has more or less adopted a scale-aware training scheme.
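As an example of scale-aware assignment in a one-stage detector, FCOS assigns each location inside a ground-truth box to FPN levels according to its largest regression distance max(l, t, r, b), so each level acts as a scale-specific detector, much like a trident branch. The sketch uses the per-level ranges as I understand them from the FCOS paper (0, 64, 128, 256, 512, inf for P3 to P7) and omits FCOS's later refinements; treat both the ranges and the simplified inside-box test as assumptions. `fcos_levels` is a hypothetical helper.

```python
import math
from typing import List, Tuple

# Assumed FCOS regression ranges per FPN level (inclusive on both ends).
LEVEL_RANGES: List[Tuple[str, float, float]] = [
    ("P3", 0, 64), ("P4", 64, 128), ("P5", 128, 256),
    ("P6", 256, 512), ("P7", 512, math.inf),
]


def fcos_levels(point: Tuple[float, float],
                box: Tuple[float, float, float, float]) -> List[str]:
    """Levels on which `point` is a positive sample for `box` (empty if none)."""
    x, y = point
    x1, y1, x2, y2 = box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y  # distances to the four box sides
    if min(l, t, r, b) <= 0:
        return []  # point lies outside the box: negative on every level
    m = max(l, t, r, b)
    return [name for name, lo, hi in LEVEL_RANGES if lo <= m <= hi]


print(fcos_levels((50, 50), (0, 0, 100, 100)))     # ['P3']
print(fcos_levels((100, 100), (0, 0, 400, 400)))   # ['P6']
```

As with the trident valid ranges, an object is deliberately dropped from levels whose scale it does not match.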

Finally, I have developed a detector called CropNet, targeting autonomous-driving scenarios, which can double AP without an extra order of computation. Instead of pretraining it on ImageNet, we could train it on a larger autonomous-driving dataset.

I'm not the author of TridentNet, so I may have misinterpreted parts of it. I'd love to see the authors correct me :)

Oops... I missed an important feature of TridentNet: the weight sharing in TridentDilation. I have to say, this is the most innovative part of the design and the one I like most. It allows objects of different scales to train the same weights. As a result, using only one branch, trained with the objects of all three branches, can achieve very promising performance at a fast speed.
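Weight sharing can be illustrated with a 1D toy: one kernel applied at dilations 1, 2, and 3 gives three branches with three different receptive fields but only one set of parameters, and at inference you could keep just one of those branches. This is a pure-Python sketch under simplifying assumptions (1D signal, single channel, zero "same" padding); `dilated_conv1d` and `trident_block` are hypothetical names, not SimpleDet functions.

```python
from typing import Dict, List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """'Same'-size 1D cross-correlation with zero padding and the given dilation."""
    center = (len(kernel) - 1) // 2
    out: List[float] = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation  # taps are `dilation` cells apart
            if 0 <= idx < len(signal):         # zero padding outside the signal
                acc += w * signal[idx]
        out.append(acc)
    return out


def trident_block(signal: Sequence[float], shared_kernel: Sequence[float],
                  dilations=(1, 2, 3)) -> Dict[int, List[float]]:
    """One output per branch; every branch reuses the same `shared_kernel` weights."""
    return {d: dilated_conv1d(signal, shared_kernel, d) for d in dilations}


# Toy usage: an impulse shows how far each branch "sees" with identical weights.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
for d, y in trident_block(impulse, [1.0, 1.0, 1.0]).items():
    print(d, y)
```

The parameter count is `len(shared_kernel)` no matter how many branches run, which is why a single branch can benefit from objects of all three scales during training.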

hezhu1996 commented 4 years ago

Thanks for the comments. It seems TridentNet splits the original ResNet into 3 branches, and each branch connects to its own RPN and R-CNN head, which means there are 3 RPN/R-CNN pairs altogether that don't interfere with each other. I noticed that scale-aware training actually improves results by only about 0.3%, which is not that important :) Not sure if I'm right.
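If each branch does run its own RPN/R-CNN head, the per-branch detections still have to be merged at the end. A common way, and as far as I understand the TridentNet paper what it does at test time, is to pool all branches' detections and run NMS over them. The sketch below assumes a single class, boxes as `(x1, y1, x2, y2, score)` tuples, and plain greedy NMS; `merge_branches` is a hypothetical helper, not SimpleDet's post-processing.

```python
from typing import Dict, List, Tuple

Det = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)


def iou(a: Det, b: Det) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_branches(per_branch: Dict[int, List[Det]], iou_thresh: float = 0.5) -> List[Det]:
    """Concatenate detections from all branches, then apply greedy NMS."""
    dets = sorted((d for ds in per_branch.values() for d in ds),
                  key=lambda d: d[4], reverse=True)
    kept: List[Det] = []
    for d in dets:
        if all(iou(d, k) <= iou_thresh for k in kept):
            kept.append(d)
    return kept


# Hypothetical outputs keyed by branch dilation; branches 1 and 2 find the same object.
per_branch = {1: [(0, 0, 10, 10, 0.9)], 2: [(1, 1, 10, 10, 0.8)], 3: [(50, 50, 90, 90, 0.7)]}
print(merge_branches(per_branch))  # [(0, 0, 10, 10, 0.9), (50, 50, 90, 90, 0.7)]
```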