karolzak / boxdetect

BoxDetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character or checkbox boxes on scanned forms.
MIT License

Default config for vertical grouping has bad results for vertically aligned checkboxes #24

Open martinkozle opened 1 year ago

martinkozle commented 1 year ago

I am creating this issue to help anyone else who finds that vertically aligned checkboxes are not detected well.

The group_size_range config option gets overwritten to a hardcoded value of (1, 1) at the start of the get_checkboxes pipeline. So setting that config option does nothing when using this function.

By default, the config sets vertical_max_distance to 10. If you are trying to detect vertically aligned checkboxes (as in a form), this gives really bad results because the whole column is seen as a single group. I don't know whether this is intended or what the use case is, and I don't quite understand the grouping logic in the library.
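To make the suspected failure mode concrete, here is a toy model of distance-based grouping. It is not boxdetect's actual grouping code: the functions group_by_vertical_gap and filter_by_group_size are hypothetical, and the model ignores horizontal position entirely. It only shows how a column of boxes spaced less than 10 px apart could collapse into one group and then be discarded by a (1, 1) group size filter.

```python
def group_by_vertical_gap(boxes, max_gap):
    """Chain boxes (x, y, w, h) whose vertical gap to the previous box is <= max_gap.

    Hypothetical helper for illustration only, not part of boxdetect.
    """
    groups = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if groups:
            prev = groups[-1][-1]
            gap = box[1] - (prev[1] + prev[3])  # space between prev bottom and this top
            if gap <= max_gap:
                groups[-1].append(box)
                continue
        groups.append([box])
    return groups


def filter_by_group_size(groups, size_range):
    """Keep only groups whose length falls inside the inclusive size_range."""
    lo, hi = size_range
    return [g for g in groups if lo <= len(g) <= hi]


# Five 20x20 boxes stacked in a column with an 8 px gap between them.
column = [(50, 20 + i * 28, 20, 20) for i in range(5)]

merged = group_by_vertical_gap(column, max_gap=10)   # whole column becomes one group
separate = group_by_vertical_gap(column, max_gap=0)  # every box is its own group

kept_merged = filter_by_group_size(merged, (1, 1))      # the 5-box group is dropped
kept_separate = filter_by_group_size(separate, (1, 1))  # all single boxes survive
```

Under this simplified model, a maximum distance of 10 leaves nothing after the size filter, while a maximum distance of 0 keeps every box.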

There are two ways to work around it. Either set this option to 0 and then find and filter out unwanted close detections with your own logic, or copy the get_checkboxes function without that first hardcoding line (though this might group horizontal checkboxes). I don't understand the difference between vertical and horizontal grouping, but vertical grouping for checkboxes seems to be a bit faulty.
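As one possible version of the "filter out unwanted close detections" step, the sketch below drops rectangles that overlap an already kept rectangle too much. The helpers iou and drop_overlapping and the 0.3 threshold are assumptions chosen for illustration; they are not part of boxdetect, and the right rule depends on your forms.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def drop_overlapping(rects, max_iou=0.3):
    """Greedily keep rectangles, larger ones first, skipping near-duplicates."""
    kept = []
    for rect in sorted(rects, key=lambda r: r[2] * r[3], reverse=True):
        if all(iou(rect, k) <= max_iou for k in kept):
            kept.append(rect)
    return kept


# Two almost identical detections of one box, plus a distinct box below it.
detections = [(50, 20, 20, 20), (51, 21, 20, 20), (50, 48, 20, 20)]
unique = drop_overlapping(detections)
```

With results shaped like the ones unpacked later in this thread, where each entry starts with an (x, y, w, h) tuple, you would pull out those tuples first and pass them to drop_overlapping.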

karolzak commented 1 year ago

Hi @martinkozle! Thanks for sharing your issue. Looking back, I agree that hardcoding (1,1) for group_size_range wasn't the best practice, but I can definitely share some of the logic behind it:

Let me know if that helps in any way!

martinkozle commented 1 year ago

I can't share the real images I was working with, so I tried recreating the issue with an image I found on the internet (image: renditionDownload).

With these options:

import cv2
from boxdetect import config
from boxdetect.pipelines import get_checkboxes

cfg = config.PipelinesConfig()

# important to adjust these values to match the size of boxes on your image
cfg.width_range = [(10, 50)]
cfg.height_range = [(10, 50)]

# more scaling factors give more accurate results but take more time to process
# too small a scaling factor may cause false positives
# too big a scaling factor will take a lot of processing time
cfg.scaling_factors = [0.7, 0.9, 1.0, 1.2, 1.5, 2.0]

# w/h ratio range for boxes/rectangles filtering
cfg.wh_ratio_range = [(0.9, 1.1)]

# number of iterations when running the dilation transformation (to enhance the image)
cfg.dilation_iterations = [0]

# img is assumed to be the form image, already loaded as a NumPy array (e.g. with cv2.imread)
checkboxes = get_checkboxes(img, cfg=cfg, px_threshold=0.1, verbose=True)

img_vis = img.copy()
for (x, y, w, h), _, _ in checkboxes:
    img_vis = cv2.rectangle(img_vis, (x, y), (x+w, y+h), (0, 255, 0), thickness=3)

It found 0 checkboxes.

By setting only vertical max distance:

cfg.vertical_max_distance = [0]

I get a couple of checkboxes detected (image: tmp6zetij1h).

By setting only horizontal max distance:

cfg.horizontal_max_distance = [0]

I get all checkboxes detected (image: tmp8lumh8rn).

And by setting both options I also get all checkboxes detected.

This is why I was confused by the grouping behavior for checkboxes.

Unrelated to this, on my data the checked checkboxes had big check marks and X marks that extended far outside the boxes, so the square contours approach didn't really work. In the end I wrote my own solution from scratch that uses only kernels and filter2D to find checkboxes. It works well in my case but is not a general solution.

Thank you for your help. I hope that you can also reproduce the same issue and that it helps in improving the library.