sparkfish / augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
https://github.com/sparkfish/augraphy
MIT License
345 stars 45 forks

Skewed bounding boxes #437

Open JKrivec opened 3 months ago

JKrivec commented 3 months ago

Hello,

When I supply bounding boxes, the degraded bounding boxes that come back are not what I expected.

image

The red text and the bounding boxes are what I pulled out of the original pdf, before degrading using Augraphy. Shouldn't the degraded boxes be the ones I outlined in blue, so the whole original object is outlined?

This is even more obvious when you look at it at the larger scale:

image

The larger bounding box around the table is supposed to encompass the table, but here we can see that some of the textual boxes are now outside of the actual area of interest.

Is this a feature or a bug?

kwcckw commented 3 months ago

Hi, as described in the documentation, only the start point and end point of each box are transformed:

https://augraphy.readthedocs.io/en/latest/doc/source/augmentations/folding.html

So this should be consistent with your observation?
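To illustrate why transforming only the start and end points produces the boxes seen above, here is a minimal sketch. It is not Augraphy's actual code: the helper names `rotate_point` and `box_from_two_points`, the `(xs, ys, xe, ye)` box layout, and the use of a plain rotation are all assumptions made for the example.

```python
import math

def rotate_point(x, y, cx, cy, angle_deg):
    """Rotate (x, y) about (cx, cy) by angle_deg (hypothetical helper)."""
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))

def box_from_two_points(box, cx, cy, angle_deg):
    """Transform only the start and end points of a box (xs, ys, xe, ye)."""
    xs, ys, xe, ye = box
    nxs, nys = rotate_point(xs, ys, cx, cy, angle_deg)
    nxe, nye = rotate_point(xe, ye, cx, cy, angle_deg)
    return (nxs, nys, nxe, nye)

# A wide, short box (think of one line of text) rotated about its center.
box = (0, 0, 100, 20)
plus = box_from_two_points(box, 50, 10, 10)    # one rotation direction
minus = box_from_two_points(box, 50, 10, -10)  # the opposite direction

# In one direction the result is narrower than the original 100 px width;
# in the other, its height collapses to a small fraction of the original 20 px.
print(plus[2] - plus[0], plus[3] - plus[1])
print(minus[2] - minus[0], minus[3] - minus[1])
```

In both directions the rectangle spanned by the two transformed points fails to cover the rotated object, because the other two corners are never taken into account.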

JKrivec commented 3 months ago

Yes, this is exactly what is stated in the docs, so I guess this is a feature, not a bug :).

However, the second image I uploaded is a mix of Geometric and Folding, and for the larger bounding box this is mostly an "issue" with the rotation. I would say the correct way would be to rotate the bounding box, then take the bottom-left and top-right coordinates of the rotated shape and use those as the new bounding box.

If you were to label the table in the second image, where would you put the rectangle? I think you want to encompass the whole object. I am not a computer vision specialist, so I'm not sure what the correct way is. Maybe this issue is just opening a debate about how the bounding box computation should be approached.

kwcckw commented 3 months ago

Right, there should be a better solution to this problem. For example, for clockwise rotation, it should take top-left and bottom-right of the box, while for anticlockwise rotation (your example), it should take top-right and bottom-left of the box.

Thanks for pointing this out. You are welcome to submit a pull request if you are able to come up with a better alternative that addresses this problem.

JKrivec commented 3 months ago

Yeah, the bounding box should just be (min(all_x), min(all_y), max(all_x), max(all_y)) in my opinion. I'm currently very low on time, but I might give it a crack in a few months! Feel free to close this, thanks :)
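The min/max idea above could be sketched as follows. This is only an illustration under stated assumptions, not a patch for Augraphy: `enclosing_box` and `make_rotation` are hypothetical names, the box layout is assumed to be `(xs, ys, xe, ye)`, and the transform is assumed to be any function mapping a point to a point.

```python
import math

def enclosing_box(box, transform):
    """Axis-aligned box around all four transformed corners of (xs, ys, xe, ye)."""
    xs, ys, xe, ye = box
    corners = [(xs, ys), (xe, ys), (xe, ye), (xs, ye)]
    moved = [transform(x, y) for x, y in corners]
    all_x = [p[0] for p in moved]
    all_y = [p[1] for p in moved]
    return (min(all_x), min(all_y), max(all_x), max(all_y))

def make_rotation(cx, cy, angle_deg):
    """Return a point transform that rotates about (cx, cy) (hypothetical helper)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)

    def rotate(x, y):
        dx, dy = x - cx, y - cy
        return (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)

    return rotate

# The same wide box, rotated about its center in both directions.
box = (0, 0, 100, 20)
for angle in (10, -10):
    new_box = enclosing_box(box, make_rotation(50, 10, angle))
    print(angle, new_box)
```

Because all four corners are used, the result does not depend on the rotation direction. For a warp that bends edges rather than keeping them straight, which may be the case for folding, four corners might not be enough, and sampling extra points along each edge before taking the min and max would be one option.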