longcw / faster_rcnn_pytorch

Faster RCNN with PyTorch
MIT License

Wrong format for bounding boxes #36

Open · Rizhiy opened this issue 6 years ago

Rizhiy commented 6 years ago

It seems that the network uses the (x1, y1, x2, y2) format for bounding boxes instead of the (x, y, w, h) format used in the paper. I think this is a pretty major difference that can affect training accuracy.

In the (x, y, w, h) format, two coordinates encode the center and two encode the size, which gives a clean separation and makes debugging easier. In the current format, all four coordinates affect both position and size, which makes problems harder to trace.
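For clarity, the two formats carry the same information and can be converted back and forth; here is a rough sketch of what I mean (the function names are just for illustration, not from this repo):

```python
import torch

def xyxy_to_cxcywh(boxes):
    """Convert corner-format boxes (x1, y1, x2, y2) to center format (cx, cy, w, h).

    `boxes` is assumed to be a float tensor of shape (N, 4).
    Illustrative helper only, not part of this repository.
    """
    x1, y1, x2, y2 = boxes.unbind(dim=1)
    return torch.stack([(x1 + x2) / 2,   # center x
                        (y1 + y2) / 2,   # center y
                        x2 - x1,         # width
                        y2 - y1],        # height
                       dim=1)

def cxcywh_to_xyxy(boxes):
    """Inverse conversion: center format back to corner format."""
    cx, cy, w, h = boxes.unbind(dim=1)
    return torch.stack([cx - w / 2, cy - h / 2,
                        cx + w / 2, cy + h / 2], dim=1)
```

So the conversion itself is lossless; my concern is about which representation the regression code works in, since the paper defines its regression targets in terms of centers and sizes.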

Cadene commented 6 years ago

Did you try it?

Rizhiy commented 6 years ago

I haven't, since I don't fully understand the codebase and it appears that quite a bit would have to be changed. In particular, the Cython code seems to expect boxes in the current format, and I didn't have access to the Cython source to change it.

It appears that this format was chosen in the Fast R-CNN PyTorch implementation, or maybe even earlier, so it would probably be difficult to change now. I don't think training accuracy will be affected that much, but it may matter if you are trying to win a competition.

Cadene commented 6 years ago

Yep, unfortunately this code is difficult to understand and modify. While looking for localization models in PyTorch, I found this repo: https://github.com/amdegroot/ssd.pytorch. The model works nicely and the codebase is much easier to understand.

It seems that the ssd.pytorch models use the (x1, y1, x2, y2) format as well: https://github.com/amdegroot/ssd.pytorch/blob/master/data/voc0712.py#L81

Rizhiy commented 6 years ago

I found the Cython source, so I might try to change it later.