microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License

issue with package_area = package_rect.get_area() when adding --words_dir arg #135

Open AymenDoc opened 10 months ago

AymenDoc commented 10 months ago

I attempted to incorporate the --words_dir argument for cell recognition. However, I encountered an issue within the "slot_into_containers" function while calculating the area of a rectangle.

My package_rect is Rect(95.99999451586913, 383.89018184350584, 98.0000028239502, 374.8903775869141), but I'm getting a package_area value of 0 at this line: package_area = package_rect.get_area()

bsmock commented 10 months ago

Hi,

Since Rect is in the form (x0, y0, x1, y1), in your case we would have y0 > y1, which means technically the box has negative area. Is this bad Rect coming from one of the words in your words data?

Best, Brandon
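
To illustrate the point above, here is a minimal sketch of how a box with y0 > y1 ends up with an area of 0, and one possible way to reorder the corners before passing the words in. It is plain Python and does not use the repository's Rect class. The helpers normalize_bbox, bbox_area, and normalize_words are hypothetical names, and the assumption that each word is a dict with a "bbox" key in (x0, y0, x1, y1) order should be checked against your own words files.

```python
def normalize_bbox(bbox):
    """Reorder corners so that x0 <= x1 and y0 <= y1 (hypothetical helper)."""
    x0, y0, x1, y1 = bbox
    return [min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)]


def bbox_area(bbox):
    """Area of an (x0, y0, x1, y1) box.

    Inverted or degenerate boxes are treated as having area 0, which is
    assumed here to match the behaviour reported in this issue.
    """
    x0, y0, x1, y1 = bbox
    if x1 <= x0 or y1 <= y0:
        return 0.0
    return (x1 - x0) * (y1 - y0)


def normalize_words(words):
    """Return copies of the word dicts with reordered "bbox" values.

    Assumes each word is a dict with a "bbox" key; other keys are kept as-is.
    """
    return [{**word, "bbox": normalize_bbox(word["bbox"])} for word in words]


# The rectangle from the report: y0 (383.89...) is greater than y1 (374.89...).
bad = [95.99999451586913, 383.89018184350584, 98.0000028239502, 374.8903775869141]
fixed = normalize_bbox(bad)

print(bbox_area(bad))    # inverted box, treated as empty
print(bbox_area(fixed))  # positive once the corners are reordered
```

If the words were extracted in a coordinate system with the origin at the bottom-left of the page, reordering the corners alone may not put the boxes in the right place; the y values would likely also need to be flipped against the page height so that they match the image coordinates.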