Open daniel-j-h opened 6 years ago
https://github.com/mapbox/robosat/pull/46 switches our encoder to a pre-trained ResNet. We can now implement a feature pyramid network and put it on top of the ResNet for two use cases: to improve segmentation and to move us towards RetinaNet for object detection. These two use cases can then be expressed as two separate heads on top of the FPN.
https://github.com/mapbox/robosat/pull/75 implements a Feature Pyramid Network (FPN) on top of the pre-trained ResNet. In addition, it adds segmentation heads to the FPN. RetinaNet can now be built in parallel to that on top of the FPN.
I see no reason why we can't implement object detection in robosat for specific use cases.
The pre-processing and post-processing need to be slightly adapted to work with bounding boxes, but otherwise we can probably re-use 90% of what's already there.
This ticket tracks the task of implementing RetinaNet as an object detection architecture:
RetinaNet because it is a state-of-the-art single-shot object detection architecture that follows our 80/20 philosophy: we favor simplicity and maintainability, and focus on the 20% of the causes responsible for 80% of the effects. It's simple, elegant, and on par with the more complex Faster R-CNN with respect to accuracy and runtime.
Here are the three basic ideas; please read the papers for in-depth details:
Focal Loss
Feature Pyramid Network (FPN)
RetinaNet
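To make the first idea concrete, here is a minimal sketch of the binary focal loss for a single prediction, written in plain Python rather than as robosat code. The function name `binary_focal_loss` is hypothetical, and the defaults `alpha=0.25` and `gamma=2.0` are taken from the paper, not from anything in this repository. The modulating factor `(1 - p_t) ** gamma` shrinks the loss of well-classified examples so the many easy background anchors do not dominate training.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for one predicted foreground probability p and label y in {0, 1}.

    Hypothetical helper for illustration only; not part of robosat.
    """
    # Clamp the probability to avoid log(0).
    p = min(max(p, eps), 1.0 - eps)

    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p

    # alpha_t balances foreground against background examples.
    alpha_t = alpha if y == 1 else 1.0 - alpha

    # Cross-entropy scaled by the modulating factor (1 - p_t) ** gamma.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.9) contributes far less than a hard one (p = 0.1).
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
```

With `gamma=0` the modulating factor disappears and this reduces to alpha-weighted cross-entropy, which makes it easy to check against an existing loss. A real implementation would operate on logit tensors over all anchors and normalize by the number of anchors assigned to ground-truth boxes, as the paper describes.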
Tasks