Closed samkit-jain closed 3 years ago
Merging #320 (b5239ad) into develop (d3b84da) will increase coverage by
0.68%
. The diff coverage is83.33%
.
@@ Coverage Diff @@
## develop #320 +/- ##
===========================================
+ Coverage 97.48% 98.17% +0.68%
===========================================
Files 10 10
Lines 1193 1205 +12
===========================================
+ Hits 1163 1183 +20
+ Misses 30 22 -8
Impacted Files | Coverage Δ | |
---|---|---|
pdfplumber/pdf.py | 95.29% <60.00%> (-4.71%) |
:arrow_down: |
pdfplumber/utils.py | 100.00% <100.00%> (ø) |
|
pdfplumber/display.py | 88.81% <0.00%> (+7.45%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update d3b84da...e849cbb. Read the comment docs.
Thanks, @samkit-jain! I think this is a smart approach. Just a few notes / requests:
I think we can also make this behaviour configurable by adding a new parameter, say,
fail_on_metadata
in thePDF.open()
method that when set toTrue
, will raise an exception and whenFalse
, just log to the console.
- I like that. Want to try implementing on this PR? Perhaps the parameter, however, should be
strict_metadata
? Open to other ideas.
Would you be able to add a test that brings the coverage for pdf.py
back up to 100%?
Can you add the related entries to the CHANGELOG.md
?
Thanks again!
@jsvine I have made the metadata failure condition configurable. Since I am unable to create a PDF that would cause an exception in the new recursive resolver method, I was unable to increase the overall test coverage. Added tests for the draw methods to increase the coverage instead, but, this meant that the test coverage for the files changed in this PR is still below the previous one. That's why the codecov/patch is failing but codecov/project is passing. The changelog has been updated as well.
Thanks, @samkit-jain! I'm slightly wary of incorporating new code that doesn't have a PDF to test it with, but have merged this since (a) the logic of the code makes sense, (b) the untested part of the code handles what would already be edge-cases, and (c) the tests pass for all the PDFs we are using.
This PR fixes issue #316
Background: The PDF exhibiting the exception contained a metadata key
Changes
which was a list ofPDFObjRef
objects. The code responsible for resolving metadata expected a list to be of string-like objects and since in this case, it wasn't, thedecode_text()
function raised theTypeError
exception as mentioned in the issue.This metadata value type is not as per the PDF specification (see p556 sec14.3). PDF specifications are well, just specifications. Not conforming to it does not necessarily render a PDF unreadable and a lot of PDFs don't abide by every single rule mentioned in it. Hence, in the first commit, I added a
try-except
block that would log a warning to the console when a metadata value is unparseable and continues to the next one. This would allow the user to perform non-metadata related operations without any issue. I think we can also make this behaviour configurable by adding a new parameter, say,fail_on_metadata
in thePDF.open()
method that when set toTrue
, will raise an exception and whenFalse
, just log to the console.Fix: To resolve the issue, I have added a new method
resolve_and_decode()
which would recursively go through the metadata values and decode them.Additional: Pinned versions of
pytest
and removed unused imports intests/test_issues.py
.