jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
5.99k stars, 618 forks

Can't resolve PDF encoded in ETenms-B5-H #1073

Closed JasonYZheng closed 6 months ago

JasonYZheng commented 6 months ago

Describe the bug

A clear and concise description of what the bug is.

Have you tried repairing the PDF?

Please try running your code with pdfplumber.open(..., repair=True) before submitting a bug report.

Code to reproduce the problem

Paste it here, or attach a Python file.

PDF file

Please attach any PDFs necessary to reproduce the problem.

If you need to redact text in a sensitive PDF, you can run it through JoshData/pdf-redactor.

Expected behavior

What did you expect the result should have been?

Actual behavior

What actually happened, instead?

Screenshots

If applicable, add screenshots to help explain your problem.

Environment

Additional context

Add any other context/notes about the problem here.

jsvine commented 6 months ago

Closing this issue because it supplies no details. Judging from the title, however, this appears to concern support for the ETenms-B5-H character mapping (CMap). If so, it would be better resolved in pdfminer.six, the library that pdfplumber uses to parse PDFs: https://github.com/pdfminer/pdfminer.six
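For anyone landing here with the same problem, a rough way to check whether the installed pdfminer.six can load a given CMap at all is sketched below. It assumes `CMapDB.get_cmap` from `pdfminer.cmapdb` as the default loader; the helper name `can_load_cmap` and its `loader` parameter are hypothetical, introduced only for illustration, and this is not part of pdfplumber's API.

```python
from typing import Callable, Optional


def can_load_cmap(name: str, loader: Optional[Callable[[str], object]] = None) -> bool:
    """Return True if `loader(name)` succeeds, False if it raises.

    `loader` is a hypothetical hook so the check can be exercised without
    pdfminer.six installed. When omitted, pdfminer.six's CMapDB.get_cmap
    is used (assumption: pdfminer.six is importable).
    """
    if loader is None:
        # Imported lazily so the module still loads without pdfminer.six.
        from pdfminer.cmapdb import CMapDB

        loader = CMapDB.get_cmap
    try:
        loader(name)
    except Exception:
        # Caught broadly on purpose: any failure to load the CMap is
        # treated as "not available" for the purposes of this check.
        return False
    return True
```

With pdfminer.six installed, `can_load_cmap("ETenms-B5-H")` would indicate whether that CMap can be loaded in your environment. A `False` result suggests the problem lies in pdfminer.six's CMap data rather than in pdfplumber, which would be worth including in a report to that project.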