sgrvinod / a-PyTorch-Tutorial-to-Object-Detection

SSD: Single Shot MultiBox Detector | a PyTorch Tutorial to Object Detection
MIT License

Predict box shape directly instead of offsets? #79

Open stevebottos opened 3 years ago

stevebottos commented 3 years ago

More of a question than an issue, really. If I understand correctly, the network predicts offsets for each anchor box, and those offsets in turn describe a bounding box. This requires a lot of conversions (cxcy to xy, encoding, decoding). Would it not be possible to simply train the network to output [xmin, ymin, xmax, ymax] directly instead of [offset-x, offset-y, width, height]? If not, what are the issues with this?

In the same vein, are the encoding and decoding of the bounding boxes only necessary because we need to go from offsets to the bounding box those offsets describe?
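For reference, here is a minimal sketch of the conversions being asked about, assuming the usual SSD-style parameterization: center offsets are expressed relative to the anchor's size, and width and height as log-ratios to the anchor. The function names (`xy_to_cxcy`, `cxcy_to_xy`, `encode`, `decode`) and the example numbers are hypothetical, chosen only for illustration. Any empirical scaling factors an implementation may apply to the offsets are left out.

```python
import math

# All boxes are plain tuples in fractional image coordinates.

def xy_to_cxcy(box):
    """(xmin, ymin, xmax, ymax) -> (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin)

def cxcy_to_xy(box):
    """(cx, cy, w, h) -> (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def encode(box_cxcy, anchor_cxcy):
    """Express a box as offsets relative to an anchor (the regression target)."""
    cx, cy, w, h = box_cxcy
    acx, acy, aw, ah = anchor_cxcy
    return ((cx - acx) / aw,       # center shift, in units of anchor width
            (cy - acy) / ah,       # center shift, in units of anchor height
            math.log(w / aw),      # log of the width ratio
            math.log(h / ah))      # log of the height ratio

def decode(offsets, anchor_cxcy):
    """Invert encode(): turn predicted offsets back into a (cx, cy, w, h) box."""
    gcx, gcy, gw, gh = offsets
    acx, acy, aw, ah = anchor_cxcy
    return (gcx * aw + acx, gcy * ah + acy, math.exp(gw) * aw, math.exp(gh) * ah)

# Hypothetical anchor and ground-truth box.
anchor = (0.5, 0.5, 0.2, 0.4)          # (cx, cy, w, h)
truth_xy = (0.42, 0.35, 0.66, 0.75)    # (xmin, ymin, xmax, ymax)

offsets = encode(xy_to_cxcy(truth_xy), anchor)
recovered_xy = cxcy_to_xy(decode(offsets, anchor))
```

In this sketch, `decode` exactly inverts `encode`, so the round trip recovers the original corner coordinates up to floating-point error. The offsets stay small whenever the anchor is already close to the object, regardless of where the box sits in the image or how large it is.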