wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
https://arxiv.org/abs/2004.07464
MIT License
553 stars, 191 forks

How can I get the images with the bboxes in the inference? #65

Open jorgerodriguezsj opened 3 years ago

jorgerodriguezsj commented 3 years ago

Reading the code, I saw that there is a function that draws the bounding boxes. How can I make inference return the image with the bboxes drawn on it?

thanks

@wenwenyu

knitemblazor commented 3 years ago

I guess you are talking about inference. You need to use an OCR engine like Tesseract to get the boxes and text first.

sabyasachi-basu-i commented 2 years ago

Question: Tesseract hOCR gives us 4 coordinates per bbox. Do you have any idea how they can be translated to the 8 points used in PICK-pytorch?

knitemblazor commented 2 years ago

hi,

Four of the 8 coordinates in a bbox are xmin, ymin, xmax, ymax. In image coordinates (origin at the top left, y increasing downward) these are the top-left and bottom-right corners of the rectangle, so you just need to recombine them (or, equivalently, use the width and height) to generate the other two corners. It's fairly simple; do let me know if you have any doubts.
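
For reference, hOCR stores each box in the element's `title` attribute as `bbox x0 y0 x1 y1`, measured from the top-left corner of the image. A minimal sketch of pulling those four numbers out, assuming a title string of that shape (the helper name `parse_hocr_bbox` is hypothetical, not part of Tesseract or PICK-pytorch):

```python
import re

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
_BBOX_RE = re.compile(r"bbox\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


def parse_hocr_bbox(title):
    """Hypothetical helper: return (xmin, ymin, xmax, ymax) from an hOCR title."""
    match = _BBOX_RE.search(title)
    if match is None:
        raise ValueError(f"no bbox found in title: {title!r}")
    xmin, ymin, xmax, ymax = (int(v) for v in match.groups())
    return xmin, ymin, xmax, ymax
```

For example, `parse_hocr_bbox("bbox 36 92 618 361; x_wconf 95")` yields the four rectangle values that the rest of this thread works with.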

sabyasachi-basu-i commented 2 years ago

I found them to be this:

The output of Tesseract is (x1, y1) = min and (x3, y3) = max, so:

x2 = x3
y2 = y1

x4 = x1
y4 = y3

would that be correct?

knitemblazor commented 2 years ago

yes
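
The mapping confirmed above can be sketched in a few lines of Python. This is only an illustration: both function names are hypothetical, and the `index,x1,y1,...,x4,y4,transcript` line layout is an assumption based on the repository README, so check it against the data files you actually use.

```python
def rect_to_eight_points(xmin, ymin, xmax, ymax):
    """Expand a 4-value rectangle into 8 values, clockwise from the top-left.

    (x1, y1) = (xmin, ymin)  top-left
    (x2, y2) = (xmax, ymin)  top-right     -> x2 = x3, y2 = y1
    (x3, y3) = (xmax, ymax)  bottom-right
    (x4, y4) = (xmin, ymax)  bottom-left   -> x4 = x1, y4 = y3
    """
    return [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]


def to_pick_line(index, box, transcript):
    """Hypothetical helper: format one box as an assumed PICK-style CSV line."""
    points = rect_to_eight_points(*box)
    return ",".join(str(v) for v in [index, *points]) + "," + transcript


if __name__ == "__main__":
    # A Tesseract-style box (xmin, ymin, xmax, ymax) with its recognized text.
    print(to_pick_line(1, (10, 20, 110, 50), "TOTAL"))
```

Note that a transcript containing commas would need extra care when the line is parsed back; this sketch does not handle that.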