Closed samkit-jain closed 3 years ago
Merging #338 (70829c0) into develop (de767c3) will not change coverage. The diff coverage is
100.00%
.
@@ Coverage Diff @@
## develop #338 +/- ##
========================================
Coverage 98.17% 98.17%
========================================
Files 10 10
Lines 1205 1205
========================================
Hits 1183 1183
Misses 22 22
Impacted Files | Coverage Δ | |
---|---|---|
pdfplumber/table.py | 100.00% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update de767c3...a17c5a7. Read the comment docs.
Thanks for the review @jsvine How about we add a new argument table_direction
that controls whether the tables are read leftmost (default) or topmost? This would ensure that the change does not break any integrations.
My initial instinct is not to add this argument. My hunch is that this change will affect very few integrations, and that there is (and will remain) a fairly easy way to customize the table order, by using page.find_tables(...)
instead of page.extract_tables(...)
. E.g.,:
tables = sorted(my_page.find_tables(), key=lambda x: x.bbox)
extracted = [ t.extract() for t in tables ]
That said, I'm open to being persuaded otherwise. Thoughts?
@jsvine I agree. The user has control over the sorting using the .find_tables(...)
method which makes adding a new argument redundant.
Fixes #336 h/t @gqh1995 for reporting
When multiple tables are found on a page, sorts them top to bottom. If there is a collision, sorts left to right. Not sure, but I think instead of sorting as I have done here, but perhaps, can update the sorting at https://github.com/jsvine/pdfplumber/blob/de767c3a52bbcacd237a17a051a3975d592023d9/pdfplumber/table.py#L315 instead.
PS: The changelog would need to be updated with the commit hash after this PR gets merged in.