Table extraction not working properly when the contrast changes between the title row and the other rows

sreeram1658 commented 2 weeks ago

Description of the bug

I am trying to extract a table from my PDF document using fitz:

    import fitz  # PyMuPDF

    doc = fitz.open("sample_table.pdf")
    page = doc[4]  # fifth page (page indices are 0-based)
    tabs = page.find_tables(horizontal_strategy="lines", vertical_strategy="lines")
    tab = tabs[0]  # first table detected on the page
    df = tab.to_pandas()
    df

My document -

The output looks something like this -

Clearly, the cells in the rows that are not highlighted are not captured here. How can I rectify this?

How to reproduce the bug

Already explained above

PyMuPDF version

1.24.5

Operating system

Windows

Python version

3.9

JorjMcKie commented 2 weeks ago

This post cannot be accepted as an issue yet because a reproducing file has not been supplied.

JorjMcKie commented 1 week ago

Closed because of an extended period of time without a reaction from the user.
