Can't resolve PDF encoded in ETenms-B5-H

jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

MIT License


Can't resolve PDF encoded in ETenms-B5-H #1073

Closed. JasonYZheng closed this issue 6 months ago.

JasonYZheng commented 6 months ago

Describe the bug

A clear and concise description of what the bug is.

Have you tried repairing the PDF?

Please try running your code with pdfplumber.open(..., repair=True) before submitting a bug report.
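A minimal sketch of that check, assuming pdfplumber 0.10 or later (where `pdfplumber.open()` accepts `repair=True`) and a Ghostscript installation, which the repair step relies on. The helper name `extract_first_page_text` is hypothetical and not part of pdfplumber.

```python
def extract_first_page_text(path, repair=False):
    """Return the text of the first page, optionally repairing the PDF first.

    Hypothetical helper for illustration. pdfplumber is imported inside the
    function so the module loads even where pdfplumber is not installed.
    """
    import pdfplumber  # third-party: pip install pdfplumber

    # repair=True asks pdfplumber to rewrite the file with Ghostscript
    # before parsing it (Ghostscript must be installed).
    with pdfplumber.open(path, repair=repair) as pdf:
        if not pdf.pages:
            return ""
        return pdf.pages[0].extract_text() or ""
```

Calling `extract_first_page_text("report.pdf", repair=True)` (the file name is a placeholder) and comparing the result with `repair=False` shows whether repairing the file changes the output.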

Code to reproduce the problem

Paste it here, or attach a Python file.
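For an encoding problem like this one, a short script showing which characters could not be mapped is usually more informative than the raw text. The sketch below assumes that unmapped glyphs appear in pdfplumber's `page.chars` as literal `(cid:N)` strings, which is how pdfminer.six renders characters it cannot translate to Unicode. `find_unmapped_chars` and `report_unmapped` are hypothetical names.

```python
import re
from collections import Counter

# pdfminer.six emits "(cid:123)" for glyphs whose CID has no Unicode mapping.
CID_PATTERN = re.compile(r"^\(cid:\d+\)$")


def find_unmapped_chars(chars):
    """Count unmapped glyphs per font name in a list of pdfplumber char dicts."""
    counts = Counter()
    for char in chars:
        if CID_PATTERN.match(char.get("text", "")):
            counts[char.get("fontname", "?")] += 1
    return counts


def report_unmapped(path):
    """Print unmapped-glyph counts for every page of the PDF at `path`."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            counts = find_unmapped_chars(page.chars)
            for fontname, n in counts.items():
                print(f"page {number}: {n} unmapped glyphs in font {fontname}")
```

Keeping the counting logic in a pure function makes it easy to test without a PDF, while `report_unmapped` is the part to paste into a bug report.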

PDF file

Please attach any PDFs necessary to reproduce the problem.

If you need to redact text in a sensitive PDF, you can run it through JoshData/pdf-redactor.

Expected behavior

What did you expect the result to be?

Actual behavior

What actually happened instead?

Screenshots

If applicable, add screenshots to help explain your problem.

Environment

pdfplumber version: [e.g., 0.5.22]
Python version: [e.g., 3.8.1]
OS: [e.g., Mac, Linux, etc.]
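The three environment fields can be collected with the standard library instead of being typed by hand. A minimal sketch: it reads the installed pdfplumber version through `importlib.metadata` (Python 3.8+) and falls back to a placeholder when the package is absent. `environment_report` is a hypothetical name.

```python
import platform
from importlib import metadata


def environment_report():
    """Return the pdfplumber version, Python version, and OS as a dict."""
    try:
        pdfplumber_version = metadata.version("pdfplumber")
    except metadata.PackageNotFoundError:
        pdfplumber_version = "not installed"
    return {
        "pdfplumber version": pdfplumber_version,
        "Python version": platform.python_version(),
        "OS": platform.platform(),
    }


if __name__ == "__main__":
    # Print in the same "field: value" shape the template uses.
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```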

Additional context

Add any other context/notes about the problem here.

jsvine commented 6 months ago

Closing this issue as it supplies no details. Judging from the title, however, this seems to be an issue with support for the ETenms-B5-H character mapping. If so, it would be better resolved via pdfminer.six, the library that pdfplumber uses to parse PDFs: https://github.com/pdfminer/pdfminer.six
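To check whether the installed pdfminer.six can load the ETenms-B5-H CMap at all, one can query its CMap database directly. This sketch relies on `pdfminer.cmapdb.CMapDB.get_cmap`, which raises `CMapDB.CMapNotFound` when no bundled CMap file matches the name. That module is internal rather than documented public API, so treat its use here as an assumption that may not hold across releases. `has_cmap` is a hypothetical helper.

```python
def has_cmap(name="ETenms-B5-H"):
    """Return True if pdfminer.six can load the named CMap, False if it
    cannot, or None if pdfminer.six is not installed."""
    try:
        from pdfminer.cmapdb import CMapDB  # internal pdfminer.six module
    except ImportError:
        return None
    try:
        CMapDB.get_cmap(name)
    except CMapDB.CMapNotFound:
        return False
    return True


if __name__ == "__main__":
    print(has_cmap())
```

A `True` result only shows that the byte-code-to-CID table is available. Converting those CIDs to Unicode is a separate lookup, so extracted text can still come out as `(cid:N)` or garbled characters.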