xavctn / img2table

img2table is a table identification and extraction Python Library for PDF and images, based on OpenCV image processing
MIT License
581 stars 76 forks

Is there a way to change the threshold for row identification? #155

Closed nickcoast closed 11 months ago

nickcoast commented 11 months ago

EDIT:

I just suddenly made progress. Might not have needed to open this issue. Will close for now.

Original post:

I have some borderless tables where the rows are separated by very little space, and the output sometimes merges several rows into one. But within those bunched cells the original lines are still separated by "\n", so it seems like splitting them into separate rows should have been possible.

I'm not seeing any options in the API to tweak this. If there are any, could you let me know?

Or is there something I can change in the extract_tables or identify_borderless_tables functions to test with tables like this one?
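In the meantime, one possible workaround is to split the bunched cells on "\n" after extraction. This is only a rough sketch, not anything from img2table itself: it assumes the extracted table has already been converted to a plain list of rows (each row a list of cell strings), the helper name split_bunched_rows is made up for illustration, and the sample values are placeholders.

```python
from itertools import zip_longest


def split_bunched_rows(rows):
    """Expand each row whose cells contain "\n" into several rows.

    Hypothetical helper, not part of img2table. `rows` is assumed to be a
    list of rows, each row being a list of cell strings (or None).
    """
    result = []
    for row in rows:
        # Split every cell on newlines; None is treated as an empty cell.
        parts = [(cell or "").split("\n") for cell in row]
        # zip_longest pads cells that hold fewer lines, so columns stay aligned.
        for new_row in zip_longest(*parts, fillvalue=""):
            result.append(list(new_row))
    return result


# Placeholder data: the second row holds two bunched rows.
table = [
    ["Name", "Value"],
    ["row1\nrow2", "a\nb"],
]
fixed = split_bunched_rows(table)
```

This only works when every column of a merged row holds the same lines in the same order. If a cell legitimately wraps onto two lines, it would be split as well, so it is a heuristic and not a replacement for better row detection.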

Here's the original image and a screenshot of the HTML output:

[Image: Somali_trees_table_1]

nickcoast commented 11 months ago

I may have solved this already. Closing for now.