Describe the bug
The PDF contains multiple tables throughout the document. The tables have shaded/banded rows, and the number of text lines per row varies.
Code to reproduce the problem
Load the PDF file with pdfplumber
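    import pdfplumber

    pdf_file = "MS15029_01E.pdf"  # assumed: local copy of the PDF linked under "PDF file"
    plumber_file = pdfplumber.open(pdf_file)
    pdf_page = plumber_file.pages[29 - 1]  # page 29 (0-based index); other pages noted: 127, 67
    im = pdf_page.to_image()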
Table settings.
    ts = {
        "vertical_strategy": "lines",
        "horizontal_strategy": "lines",
        "intersection_tolerance": 32,
    }
    im.debug_tablefinder(ts)
PDF file
Using this publicly available PDF: https://www.mtu-solutions.com/content/dam/mtu/technical-information/operating-instructions/diesel/mtu-series-1600/marine/MS15029_01E.pdf/_jcr_content/renditions/original./MS15029_01E.pdf
Expected behavior
Each table on a page should be detected as a separate table. The page used here contains two tables.
Actual behavior
When I increase intersection_tolerance so that rows containing more lines are handled, only one table is detected. The space between the two tables is also treated as a row, so the two tables are not detected separately.
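To make the expected split concrete, here is a minimal plain-Python sketch. It is not pdfplumber's algorithm and does not use its API; it only illustrates the idea that a large vertical gap between horizontal lines should start a new table, while a threshold that is too large merges everything into one. The function name `split_by_gap`, the `max_gap` parameter, and the y-coordinates are all hypothetical and not measured from the PDF.

```python
from typing import List


def split_by_gap(line_tops: List[float], max_gap: float) -> List[List[float]]:
    """Group y-positions of horizontal lines into candidate tables.

    A gap larger than max_gap between consecutive lines starts a new group.
    Hypothetical helper for illustration; not part of pdfplumber.
    """
    groups: List[List[float]] = []
    for top in sorted(line_tops):
        if groups and top - groups[-1][-1] <= max_gap:
            groups[-1].append(top)  # close enough: same table
        else:
            groups.append([top])  # large gap (or first line): new table
    return groups


# Illustrative coordinates only: two blocks of lines separated by a large gap.
line_tops = [100, 112, 136, 148, 260, 272, 296]

tables = split_by_gap(line_tops, max_gap=40)    # gap of 112 exceeds 40
merged = split_by_gap(line_tops, max_gap=200)   # threshold too large

print(len(tables))  # expected behavior: separate groups
print(len(merged))  # analogous to what I see: everything merged
```

With the smaller threshold the lines fall into two groups, which is the result I would expect for this page; with the larger one they collapse into a single group, which resembles the actual behavior described above.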
Screenshots
Environment