Closed petermr closed 1 year ago
Update: I have pip-installed the latest pdfplumber and get a similar error, but no segfault (62 of 118 tests fail):
(base) pm286macbook-2:pdfplumber pm286$ python -m unittest discover tests
E.EEE...EEEEE.EEEEEEEE.EEEEEEEEEEE.EE/opt/anaconda3/lib/python3.8/site-packages/pdfminer/psparser.py:592: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/pm286/workspace/pdfplumber1/pdfplumber/tests/pdfs/pdffill-demo.pdf'>
objs = [obj for (_, obj) in self.curstack]
ResourceWarning: Enable tracemalloc to get the object allocation traceback
.EEE/Users/pm286/workspace/pdfplumber1/pdfplumber/tests/test_display.py:62: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/pm286/workspace/pdfplumber1/pdfplumber/tests/pdfs/nics-background-checks-2015-11.pdf'>
page = pdfplumber.PDF(io.BytesIO(open(path, "rb").read())).pages[0]
ResourceWarning: Enable tracemalloc to get the object allocation traceback
.EE/Users/pm286/workspace/pdfplumber1/pdfplumber/pdfplumber/utils/pdfinternals.py:74: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/pm286/workspace/pdfplumber1/pdfplumber/tests/pdfs/issue-71-duplicate-chars-2.pdf'>
return type(x)(resolve_all(v) for v in x)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
.EEEEE..../opt/anaconda3/lib/python3.8/site-packages/pdfminer/converter.py:218: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/pm286/workspace/pdfplumber1/pdfplumber/tests/../examples/pdfs/ag-energy-round-up-2017-02-24.pdf'>
item = LTChar(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
................E.EE.EEEEE..E....E......E./Users/pm286/workspace/pdfplumber1/pdfplumber/tests/test_utils.py:203: ResourceWarning: unclosed file <_io.TextIOWrapper name='/Users/pm286/workspace/pdfplumber1/pdfplumber/tests/comparisons/scotus-transcript-p1.txt' mode='r' encoding='UTF-8'>
open(os.path.join(HERE, "comparisons/scotus-transcript-p1.txt"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
E/Users/pm286/workspace/pdfplumber1/pdfplumber/tests/test_utils.py:217: ResourceWarning: unclosed file <_io.TextIOWrapper name='/Users/pm286/workspace/pdfplumber1/pdfplumber/tests/comparisons/scotus-transcript-p1-cropped.txt' mode='r' encoding='UTF-8'>
open(os.path.join(HERE, "comparisons/scotus-transcript-p1-cropped.txt"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
EEEE....E....EEEEE...
======================================================================
ERROR: test_annots (test_basics.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/pm286/workspace/pdfplumber1/pdfplumber/tests/test_basics.py", line 52, in test_annots
pdf = self.pdf_2
AttributeError: 'Test' object has no attribute 'pdf_2'
======================================================================
ERROR: test_colors (test_basics.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/pm286/workspace/pdfplumber1/pdfplumber/tests/test_basics.py", line 152, in test_colors
rect = self.pdf.pages[0].rects[0]
AttributeError: 'Test' object has no attribute 'pdf'
======================================================================
ERROR: test_crop_and_filter (test_basics.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/pm286/workspace/pdfplumber1/pdfplumber/tests/test_basics.py", line 67, in test_crop_and_filter
original = self.pdf.pages[0]
AttributeError: 'Test' object has no attribute 'pdf'
...
When I run the tests in PyCharm, all but one pass, so I think it's a Python library problem and not worth spending time on.
Hi, and thanks for your interest in this library, especially to the point of running tests. pdfplumber, however, uses pytest rather than unittest. You can run the tests via python -m pytest or make tests. Do you still get a segfault when you run that?
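For anyone wondering why the runner matters: the AttributeError lines above ('Test' object has no attribute 'pdf') are what you would expect if the test classes build their fixtures in a pytest-style setup_class hook. That is an assumption here, since the thread does not show the test source. The unittest runner only looks for setUpClass, so such a hook is never called. A minimal sketch with a made-up Demo class (not pdfplumber's actual tests):

```python
import io
import unittest


def run_with_unittest():
    """Run a tiny pytest-style test case through the plain unittest runner."""

    class Demo(unittest.TestCase):
        # Hypothetical stand-in for a real test class.
        # pytest calls setup_class; unittest expects setUpClass
        # and therefore never runs this method.
        @classmethod
        def setup_class(cls):
            cls.pdf = "stand-in for an opened PDF"

        def test_uses_fixture(self):
            # Under plain unittest, self.pdf was never set, so this
            # line raises AttributeError before the comparison happens.
            self.assertEqual(self.pdf, "stand-in for an opened PDF")

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)
    # Send runner output to a buffer so the demo stays quiet.
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)


result = run_with_unittest()
print(len(result.errors), "error(s) under unittest")
```

Running the same class through python -m pytest should call setup_class first, which would explain why the suite passes there.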
Thanks for the ultra-speedy response! I'll try pytest.
All except one pass (same in PyCharm). The failure looks like one of those fragile numbers that depend on "hidden variables" and vary between runs.
(base) pm286macbook-2:pdfplumber pm286$ python -m pytest
================================================= test session starts ==================================================
platform darwin -- Python 3.8.3, pytest-7.1.2, pluggy-0.13.1
rootdir: /Users/pm286/workspace/pdfplumber1/pdfplumber, configfile: setup.cfg
plugins: cov-3.0.0
collected 118 items
tests/test_basics.py ................. [ 14%]
tests/test_ca_warn_report.py ..... [ 18%]
tests/test_convert.py ............ [ 28%]
tests/test_ctm.py . [ 29%]
tests/test_dedupe_chars.py .... [ 33%]
tests/test_display.py F.......... [ 42%]
tests/test_issues.py .................... [ 59%]
tests/test_laparams.py .... [ 62%]
tests/test_list_metadata.py . [ 63%]
tests/test_nics_report.py ..... [ 67%]
tests/test_table.py ........... [ 77%]
tests/test_utils.py ........................... [100%]
======================================================= FAILURES =======================================================
_________________________________________________ Test.test__repr_png_ _________________________________________________
self = <test_display.Test testMethod=test__repr_png_>
def test__repr_png_(self):
png = self.im._repr_png_()
assert isinstance(png, bytes)
> assert len(png) in (
71939,
61247,
) # PNG encoder seems to work differently on different setups
E AssertionError: assert 71983 in (71939, 61247)
E + where 71983 = len(b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x03\xf0\x00\x00\x02d\x08\x02\x00\x00\x009\xbd]\xb8\x00\x01\x00\x00IDATx\...0\x00\x00\x00\x00@\x83AB\x0f\x00\x00\x00\x00\x00@\x83\xf9\xff\xc3\xf3\xbd\\\xff\x1e8\x11\x00\x00\x00\x00IEND\xaeB`\x82')
tests/test_display.py:93: AssertionError
=================================================== warnings summary ===================================================
../../../../../opt/anaconda3/lib/python3.8/site-packages/numexpr/expressions.py:21
../../../../../opt/anaconda3/lib/python3.8/site-packages/numexpr/expressions.py:21
/opt/anaconda3/lib/python3.8/site-packages/numexpr/expressions.py:21: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
_np_version_forbids_neg_powint = LooseVersion(numpy.__version__) >= LooseVersion('1.12.0b1')
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
---------- coverage: platform darwin, python 3.8.3-final-0 -----------
Name Stmts Miss Cover
------------------------------------------------------
pdfplumber/__init__.py 7 0 100%
pdfplumber/_typing.py 8 0 100%
pdfplumber/_version.py 2 0 100%
pdfplumber/cli.py 34 0 100%
pdfplumber/container.py 112 0 100%
pdfplumber/convert.py 56 0 100%
pdfplumber/ctm.py 27 0 100%
pdfplumber/display.py 164 0 100%
pdfplumber/page.py 255 0 100%
pdfplumber/pdf.py 88 0 100%
pdfplumber/table.py 321 0 100%
pdfplumber/utils/__init__.py 5 0 100%
pdfplumber/utils/clustering.py 36 0 100%
pdfplumber/utils/generic.py 11 0 100%
pdfplumber/utils/geometry.py 128 0 100%
pdfplumber/utils/pdfinternals.py 48 0 100%
pdfplumber/utils/text.py 230 0 100%
------------------------------------------------------
TOTAL 1532 0 100%
Coverage XML written to file coverage.xml
=============================================== short test summary info ================================================
FAILED tests/test_display.py::Test::test__repr_png_ - AssertionError: assert 71983 in (71939, 61247)
====================================== 1 failed, 117 passed, 2 warnings in 22.68s ======================================
(base) pm286macbook-2:pdfplumber pm286$
Ah, very interesting, and thanks for sharing this. Seems related to how PNGs are encoded on different platforms. Just pushed a fix. Hopefully should pass if you repull and run again.
FWIW I do quite a lot with images and find that things like byte counts can vary between runs. I often write things like:
assert 71950 > len(png) > 71930
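A slightly more general version of that idea, as a rough sketch only: check that the bytes start with the fixed PNG signature, then allow the length to drift within a relative tolerance. The helper name assert_png_of_about and the 20% default are made up for illustration and are not part of pdfplumber.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header of every PNG file


def assert_png_of_about(data: bytes, expected_len: int, rel_tol: float = 0.2) -> None:
    """Assert that data is a PNG whose size is within rel_tol of expected_len."""
    assert isinstance(data, bytes)
    assert data.startswith(PNG_SIGNATURE), "not a PNG byte stream"
    low = expected_len * (1 - rel_tol)
    high = expected_len * (1 + rel_tol)
    assert low <= len(data) <= high, (
        f"PNG is {len(data)} bytes, expected {expected_len} +/- {rel_tol:.0%}"
    )


# Fake payload with the byte count from the failing run above.
fake_png = PNG_SIGNATURE + bytes(71983 - len(PNG_SIGNATURE))
assert_png_of_about(fake_png, 71939)
```

With a 20% tolerance around 71939, all three sizes seen in this thread (71939, 61247 and 71983) fall inside the window, so the check no longer depends on the exact encoder output.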
BTW I am excited at how much has been added since 0.7.4 and I'm about to get familiarised. We have a small team converting the UN IPCC reports (ca. 10,000 pages of PDF), and the new features will be really useful. Impressed with the amount of table extraction. All Open Source; volunteers very welcome!
Now works!
=================================================== warnings summary ===================================================
../../../../../opt/anaconda3/lib/python3.8/site-packages/numexpr/expressions.py:21
../../../../../opt/anaconda3/lib/python3.8/site-packages/numexpr/expressions.py:21
/opt/anaconda3/lib/python3.8/site-packages/numexpr/expressions.py:21: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
_np_version_forbids_neg_powint = LooseVersion(numpy.__version__) >= LooseVersion('1.12.0b1')
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
---------- coverage: platform darwin, python 3.8.3-final-0 -----------
Name Stmts Miss Cover
------------------------------------------------------
pdfplumber/__init__.py 7 0 100%
pdfplumber/_typing.py 8 0 100%
pdfplumber/_version.py 2 0 100%
pdfplumber/cli.py 34 0 100%
pdfplumber/container.py 112 0 100%
pdfplumber/convert.py 56 0 100%
pdfplumber/ctm.py 27 0 100%
pdfplumber/display.py 164 0 100%
pdfplumber/page.py 255 0 100%
pdfplumber/pdf.py 88 0 100%
pdfplumber/table.py 321 0 100%
pdfplumber/utils/__init__.py 5 0 100%
pdfplumber/utils/clustering.py 36 0 100%
pdfplumber/utils/generic.py 11 0 100%
pdfplumber/utils/geometry.py 128 0 100%
pdfplumber/utils/pdfinternals.py 48 0 100%
pdfplumber/utils/text.py 230 0 100%
------------------------------------------------------
TOTAL 1532 0 100%
Coverage XML written to file coverage.xml
=========================================== 118 passed, 2 warnings in 22.03s ===========================================
(base) pm286macbook-2:pdfplumber pm286$
BTW congratulations on not only what you have done but also getting a responsible user community
I have cloned the latest pdfplumber and run the tests (macOS). I get a segfault.
(I am using previous versions of pdfplumber, so it's possible I have incompatible libraries.) Many thanks