Closed samkit-jain closed 4 years ago
Merging #253 into develop will increase coverage by
0.04%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## develop #253 +/- ##
===========================================
+ Coverage 97.36% 97.41% +0.04%
===========================================
Files 10 10
Lines 1139 1160 +21
===========================================
+ Hits 1109 1130 +21
Misses 30 30
Impacted Files | Coverage Δ | |
---|---|---|
pdfplumber/display.py | 81.36% <100.00%> (ø) |
|
pdfplumber/table.py | 100.00% <100.00%> (ø) |
|
pdfplumber/utils.py | 100.00% <100.00%> (ø) |
|
pdfplumber/pdf.py | 100.00% <0.00%> (ø) |
|
pdfplumber/page.py | 100.00% <0.00%> (ø) |
|
pdfplumber/container.py | 100.00% <0.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 798a152...8d00356. Read the comment docs.
Many thanks, @samkit-jain! This looks like a solid improvement. Re. the still-present issues:
pandas >= 1.0 not available on Python 3.5. Is this to be considered a blocker and support for Python 3.5 be dropped or downgrade pandas version for Python 3.5?
Thanks for raising this question. pandas
is only a dev requirement, because it is used in a few tests. But, examining where it is used, pandas
is not really essential to those tests. I think I can remove pandas
entirely from the requirements and still maintain the current level of code coverage by adding just a few more small tests. I'll start a branch with those changes.
Since a matrix run, the code coverage report is published 4 times for the 4 tests that are run with the same codecov message being updated again and again. The code coverage would still remain the same for all 4 versions but still has a drawback that it becomes difficult for the reader to know that the codecov message they are seeing is from which run.
Ah, good catch. I am not very accustomed to using GitHub Actions, but it looks like could solve the problem: https://github.com/codecov/codecov-action/issues/40
Instead of using Makefile, specifying the command to run directly in the workflow step. @jsvine Why was Makefile used and what extra benefit is it providing?
I am using the Makefile to lint, format, and run tests locally. I figured it'd keep things simple to reflect that in the GitHub Action, but I am fine with the changes you've made to specify the commands more directly. Thanks for checking!
Following up on the pandas
question, here is a commit, on the new tests/remove-pandas
branch, that makes these changes: https://github.com/jsvine/pdfplumber/commit/5abac62324fa1dacfcf893580b5d7f813a905c6f
Feel free to merge tests/remove-pandas
into your PR branch if it suits you.
@jsvine I have merged the pandas branch in this PR. The PR is now ready for review and the pandas branch can be deleted if this PR is merged in.
Additional changes made:
test
job into two separate lint
and test
jobs. The lint
job is mainly for linting purposes and has black
and flake8
checks. The test
job has the tests. The lint
job would be run only once on Python 3.8 while the test
job would be run 4 times on Python 3.5|6|7|8. This would result in a shorter overall execution time. Reasoning: I don't think we need the linter running on every job in the build matrix as the results would most likely be the same each time and it might be better to add a separate "lint" job.black
as it is Python 3.6+ only and to fix that I pushed another commit https://github.com/jsvine/pdfplumber/pull/253/commits/a8f15c1ca2dabba38b1a82db55313a2599e01d93 (skips black install in the test job as it is not needed) which again failed because of the use of f-strings at certain places in the code and f-strings is not supported in Python 3.5. There are 2 possible solutions to keep Python 3.5 support and pass the test as well. a) Replace f-strings with .format()
, b) Install future-fstrings
and allow f-strings usage in Python 3.5. I have decided to go with option a because in the code, at some places .format()
is used while at some f-strings and it is better to have the same formatting applied across. Plus, no need for an additional dependency. In future, when Python 3.5 support would be dropped (as of https://devguide.python.org/#status-of-python-branches it is reaching end of life very soon), all the string formatters can be updated to use f-strings.
Thoughts @jsvine ?
Also, in https://github.com/jsvine/pdfplumber/blob/develop/Makefile#L16-L25 pdfpumber
is referring to the non-root folder pdfplumber or the root folder pdfplumber?
@jsvine Python 3.5 tests still failing :(
test_loading_pathobj
is failing because pathlib
works correctly with open
only on Python 3.6+. See "Changed in version 3.6:" section in https://docs.python.org/3/library/functions.html#open test_cli
is failing because as per https://docs.python.org/3/library/json.html#json.loads, json.loads()
started accepting bytes
only in Python 3.6.test_extract_words
is failing.Ahh, thank you for catching this, Samkit, and my apologies for not examining 3.5 support earlier. I've now looked into the extract_words
failure, and found that it relates to this line: https://github.com/jsvine/pdfplumber/blob/798a152156d400f39efbfec8b4a27603c9309f94/pdfplumber/utils.py#L274
Specifically, it's due to the new dict
implementation introduced in Python 3.6, which resulted in order-preserving dictionaries: https://docs.python.org/3/whatsnew/3.6.html#new-dict-implementation
Changing chars_by_upright = {1: [], 0: []}
to chars_by_upright = OrderedDict([ (1, []), (0, []) ])
makes the test pass in 3.5. (Without that change, 3.5 returns vertically-oriented words first, instead of horizontally oriented ones.)
Given the following:
pdfplumber
pathlib
fully... I think pdfplumber
ought to drop support for 3.5. What do you think?
If you agree, can you roll back 89678ed and remove 3.5 from the test matrix, README, and setup.py? (Again, apologies for not catching this earlier 😬 .) I think we can, however, keep the change that remove pandas
from the tests and dev requirements.
No problem @jsvine I am glad that the PR could help in identifying the issues with Python 3.5. I am 👍🏻 with dropping support for Python 3.5 and will make the necessary changes. Just to confirm, when you say "roll back https://github.com/jsvine/pdfplumber/commit/89678ed3fe8d9271741b6bba0a397cd259f58024" are you referring to adding another commit reverting the change or dropping the commit altogether from the commit history?
Great, and thanks! Reviewing the commit history again, it seems a8f15c1 is no longer necessary, either. So I'd suggest running git reset --hard c4b2417
and then force-pushing to this branch. Open to other proposals, however, if you'd rather not force-push.
@jsvine Done with the changes. Request your review again.
pytest-parallel
to the list of extra requirements and using it for a faster overall execution runtime. Gave an improvement of 32%.Issues still present:
pandas >= 1.0
not available on Python 3.5. Is this to be considered a blocker and support for Python 3.5 be dropped or downgrade pandas version for Python 3.5?