Is there a way to extract coordinates

SuryaViswanath11 commented 3 years ago

Hi,

I am working with PDF files and came across the box-detect library. Thanks for creating this amazing library. I am using the PDF file Form_49A.PDF and trying to annotate it. To do that, I need to extract the boxes first and then annotate them. Is there a way to extract the coordinates of the boxes present in the PDF file?

karolzak commented 3 years ago

Hi @SuryaViswanath11, glad to hear you found BoxDetect useful. To use the BoxDetect functions, you first need to convert your PDF to images, which is a fairly simple task. You can use one of the few available packages to do it, such as pdf2image.
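
As a rough illustration of the conversion step mentioned above, here is a minimal sketch using pdf2image's `convert_from_path`. The helper names (`page_image_path`, `pdf_to_images`), the output folder, the file naming scheme, and the 200 DPI default are all assumptions made for this example, not part of BoxDetect. Note that pdf2image needs Poppler installed on the system.

```python
from pathlib import Path


def page_image_path(out_dir, stem, page_number):
    """Build an output name such as pages/Form_49A_page1.png (naming is an assumption)."""
    return Path(out_dir) / f"{stem}_page{page_number}.png"


def pdf_to_images(pdf_path, out_dir="pages", dpi=200):
    """Render every page of a PDF to a PNG file and return the saved paths."""
    # Imported here so the helpers above stay usable without pdf2image installed.
    from pdf2image import convert_from_path

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # convert_from_path returns one PIL image per page.
    pages = convert_from_path(str(pdf_path), dpi=dpi)

    saved = []
    for number, page in enumerate(pages, start=1):
        target = page_image_path(out, Path(pdf_path).stem, number)
        page.save(target, "PNG")
        saved.append(str(target))
    return saved
```

Each returned path can then be passed to a BoxDetect function as the input image. It is worth remembering the DPI used here, since it is needed later to map pixel coordinates back onto the PDF page.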

teohsinyee commented 2 years ago

There seems to be no direct answer to the question: is there a way to extract the coordinates of the boxes present in the image file?

karolzak commented 2 years ago

Hi @teohsinyee, each function from BoxDetect takes an image as input and returns a collection of coordinates for the detected boxes (based on the config params). Example:

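from boxdetect import config
from boxdetect.pipelines import get_boxes

# Detection parameters; tune the defaults for your document.
cfg = config.PipelinesConfig()

# file_name is the path to the page image (for example a PNG exported from the PDF).
rects, grouping_rects, image, output_image = get_boxes(
    file_name, cfg=cfg, plot=False)

print(grouping_rects)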

OUT:
# (x, y, w, h)
[(276, 276, 1221, 33),
 (324, 466, 430, 33),
 (384, 884, 442, 33),
 (985, 952, 410, 32),
 (779, 1052, 156, 33),
 (253, 1256, 445, 33)]

import matplotlib.pyplot as plt

plt.figure(figsize=(20,20))
plt.imshow(output_image)
plt.show()
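
Since the original goal was to annotate the PDF itself, the pixel rectangles still have to be mapped back to page coordinates. The sketch below is one possible way to do that, assuming the image was rendered from the full, unrotated page at a known DPI and that the target PDF tool uses the default PDF user space (points of 1/72 inch, origin at the bottom-left corner). The function name and parameters are hypothetical and not part of BoxDetect; some PDF libraries use a top-left origin instead, in which case the vertical flip should be skipped.

```python
def pixel_rect_to_pdf_points(rect, dpi, page_height_pt):
    """Convert an (x, y, w, h) pixel rectangle to (left, bottom, right, top) in PDF points.

    Assumes the image covers the whole page, was rendered at `dpi`,
    and that the PDF origin is the bottom-left corner of the page.
    """
    x, y, w, h = rect
    scale = 72.0 / dpi  # one PDF point is 1/72 inch

    left = x * scale
    width = w * scale
    height = h * scale

    # Image y grows downward from the top edge; PDF y grows upward from the bottom.
    bottom = page_height_pt - y * scale - height

    return (left, bottom, left + width, bottom + height)


# Example: a box detected on an A4 page (842 pt high) rendered at 144 DPI.
print(pixel_rect_to_pdf_points((100, 200, 50, 20), dpi=144, page_height_pt=842))
```

The same conversion can be applied to every entry of `rects` or `grouping_rects` before handing the rectangles to whichever PDF annotation tool is being used.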

Another example, this time for checkboxes:

from boxdetect.pipelines import get_checkboxes

checkboxes = get_checkboxes(
    file_path, cfg=cfg, px_threshold=0.1, plot=False, verbose=True)

print("Output object type: ", type(checkboxes))
for checkbox in checkboxes:
    print("Checkbox bounding rectangle (x,y,width,height): ", checkbox[0])
    print("Result of `contains_pixels` for the checkbox: ", checkbox[1])
    print("Display the cropout of checkbox:")
    plt.figure(figsize=(1,1))
    plt.imshow(checkbox[2])
    plt.show()
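
If only the coordinates are needed, the plotting can be skipped. The sketch below relies solely on the entry layout shown in the loop above (bounding rectangle, `contains_pixels` result, cropped image) and splits the rectangles into ticked and empty checkboxes. The helper name `split_by_state` is hypothetical, and the sample data is made up for illustration.

```python
def split_by_state(checkboxes):
    """Return two lists of (x, y, w, h) tuples: ticked checkboxes and empty ones."""
    checked, empty = [], []
    for rect, contains_pixels, _crop in checkboxes:
        # Normalize each rectangle to a plain tuple of ints.
        box = tuple(int(v) for v in rect)
        if contains_pixels:
            checked.append(box)
        else:
            empty.append(box)
    return checked, empty


# Stand-in data with the same layout as the get_checkboxes entries.
sample = [
    ((10, 20, 30, 30), True, None),
    ((50, 20, 30, 30), False, None),
]
checked, empty = split_by_state(sample)
print("Ticked:", checked)
print("Empty:", empty)
```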
