xavctn / img2table

img2table is a table identification and extraction Python Library for PDF and images, based on OpenCV image processing
MIT License

bordered tables are not detected properly #223

Open harundiri opened 1 month ago

harundiri commented 1 month ago

I use img2table to extract tables. I also extract text from the non-table areas by building a mask of those areas from the tables' bounding boxes. For some images, `Image.extract_tables` does not seem to detect the tables properly. Here is an example image: page_2

And this is the image after masking out the table areas for regular text extraction: non_table_regions

It is not detecting the first row of either table, or the last row of the first table in the image.
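
For reference, the masking step described above can be sketched roughly as follows. This is a minimal illustration, not the exact code from my project: `table_boxes` and `mask_tables` are hypothetical helper names, and the sketch assumes each extracted table exposes a `bbox` with `x1`, `y1`, `x2`, `y2` pixel coordinates and that the page is loaded as a NumPy array.

```python
import numpy as np


def table_boxes(image_path):
    """Return (x1, y1, x2, y2) for each table img2table detects.

    Assumes the extracted tables expose `bbox.x1`, `bbox.y1`, `bbox.x2`, `bbox.y2`.
    """
    # Imported lazily so mask_tables below can be used without img2table installed.
    from img2table.document import Image

    tables = Image(src=image_path).extract_tables()
    return [(t.bbox.x1, t.bbox.y1, t.bbox.x2, t.bbox.y2) for t in tables]


def mask_tables(image, boxes, fill=255):
    """Return a copy of `image` with every table box painted over with `fill`."""
    masked = image.copy()
    height, width = masked.shape[:2]
    for x1, y1, x2, y2 in boxes:
        # Clamp to the image bounds so out-of-range or negative coordinates
        # cannot wrap around through negative indexing.
        x1, x2 = max(0, min(width, x1)), max(0, min(width, x2))
        y1, y2 = max(0, min(height, y1)), max(0, min(height, y2))
        masked[y1:y2, x1:x2] = fill
    return masked
```

With this approach, any row that falls outside a detected bounding box is left in the masked image and gets picked up as regular text, which is the symptom shown in the second image.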