mapbox / robosat

Semantic segmentation on aerial and satellite imagery. Extracts features such as: buildings, parking lots, roads, water, clouds
MIT License

Implement RetinaNet for Object Detection #12

Open daniel-j-h opened 6 years ago

daniel-j-h commented 6 years ago

I see no reason why we can't add object detection to robosat for specific use-cases.

The pre-processing and post-processing need to be slightly adapted to work with bounding boxes, but otherwise we can probably re-use about 90% of what's already there.
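As one illustration of what bounding-box post-processing could look like, here is a minimal sketch of greedy non-maximum suppression in plain Python. It is an assumption about the kind of step that might be needed, not something taken from robosat; the function names `iou` and `nms` and the `(x_min, y_min, x_max, y_max)` box layout are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes to keep."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is dropped
```

The overlap threshold of 0.5 is an illustrative default; a real pipeline would tune it per use-case.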

This ticket tracks the task of implementing RetinaNet as an object detection architecture:

We use RetinaNet because it is a state-of-the-art single-shot object detection architecture that follows our 80/20 philosophy: we favor simplicity and maintainability, and focus on the 20% of the causes responsible for 80% of the effects. It's simple, elegant, and on par with the more complex Faster R-CNN with respect to accuracy and runtime.

Here are the three basic ideas; please read the papers for in-depth details:

Focal Loss

[Figure: focal loss]
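For readers without the paper at hand, a minimal sketch of the binary focal loss for a single prediction, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), written in plain Python. The defaults gamma = 2 and alpha = 0.25 are the values reported in the focal loss paper; the function names and the scalar, non-batched form are illustrative assumptions, not robosat code.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p = P(class 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, eps))


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)               # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # well-classified positive: strongly down-weighted
hard = focal_loss(0.1, 1)  # misclassified positive: keeps most of its loss
```

With gamma = 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is a quick sanity check for any implementation.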

Feature Pyramid Network (FPN)

[Figure: Feature Pyramid Network]
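The core of the FPN is its top-down pathway: each coarser pyramid level is upsampled by a factor of two and added to the lateral feature map of the next finer level. The toy sketch below shows only that merge step on 2-D lists in plain Python. It deliberately omits the 1x1 lateral and 3x3 smoothing convolutions of the real architecture, and all function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour upsampling of a 2-D list by a factor of two."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def add(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(laterals):
    """Merge lateral maps ordered fine to coarse, each half the size of the previous.

    Returns the pyramid maps in the same fine-to-coarse order.
    """
    pyramid = [laterals[-1]]                       # coarsest level passes through
    for lateral in reversed(laterals[:-1]):
        pyramid.append(add(lateral, upsample2x(pyramid[-1])))
    return pyramid[::-1]


fine = [[1] * 4 for _ in range(4)]   # 4x4 map
mid = [[10] * 2 for _ in range(2)]   # 2x2 map
coarse = [[100]]                     # 1x1 map
p_fine, p_mid, p_coarse = top_down([fine, mid, coarse])
```

Every output level ends up carrying information from all coarser levels, which is what lets separate heads read semantically strong features at each resolution.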

RetinaNet

[Figure: RetinaNet]
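One concrete piece of RetinaNet is its anchor layout. The paper places 9 anchors at every location of pyramid levels P3 to P7, combining three aspect ratios with three sub-octave scales, with base sizes from 32 to 512 pixels. The sketch below computes those anchor shapes in plain Python. The function names and the `(x_min, y_min, x_max, y_max)` convention are assumptions for illustration.

```python
import math


def anchor_sizes(level, ratios=(0.5, 1.0, 2.0),
                 scales=(2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))):
    """Return (width, height) pairs for the anchors of pyramid level P<level>."""
    base = 2 ** (level + 2)  # 32 px for P3 up to 512 px for P7
    sizes = []
    for scale in scales:
        area = (base * scale) ** 2
        for ratio in ratios:  # ratio = height / width
            width = math.sqrt(area / ratio)
            sizes.append((width, width * ratio))
    return sizes


def anchors_at(level, cx, cy):
    """Anchor boxes (x_min, y_min, x_max, y_max) centred on (cx, cy)."""
    return [(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
            for w, h in anchor_sizes(level)]


p3 = anchor_sizes(3)                # 9 shapes, each with area close to (32 * scale) ** 2
boxes = anchors_at(3, 100.0, 100.0)
```

A classification head and a box regression head would then predict, for every anchor, class scores and offsets relative to these boxes.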

Tasks

daniel-j-h commented 6 years ago

https://github.com/mapbox/robosat/pull/46 switches our encoder to a pre-trained ResNet. We can now implement a feature pyramid network and put it on top of the ResNet for two use-cases: to improve segmentation and to move us towards RetinaNet for object detection. These two use-cases can then be expressed as two separate heads on top of the FPN.

daniel-j-h commented 6 years ago

https://github.com/mapbox/robosat/pull/75 implements a Feature Pyramid Network (FPN) on top of the pre-trained ResNet. In addition, it adds segmentation heads to the FPN. RetinaNet can now be built in parallel on top of the same FPN.