jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Specifying setup.py encoding formats #319

Closed akaiuun12 closed 3 years ago

akaiuun12 commented 3 years ago

Describe the bug

Installation with pip fails with a 'cp949' decoding error. The error occurs only on Windows systems whose locale uses the cp949 encoding (Korean, in my case). There is no problem on Mac; I have not tested on Linux.

This error is not specific to pdfplumber and comes up quite frequently. Korean users can work around it individually, but explicitly specifying the encoding when the setup file reads README.md would fix the issue for everyone.

Expected behavior


Actual behavior


UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2 in position 4981: illegal multibyte sequence

Environment

Additional context

As noted above, the error occurs because the setup file opens README.md without specifying an encoding, so on certain Windows environments Python falls back to the locale's default encoding (cp949 here). The problem can be fixed by adding the encoding parameter to the open() call in the setup file, as below. I'd be willing to send a merge request if the project maintainer is willing to fix this bug. Thanks.

with open(os.path.join(HERE, "README.md"), encoding='UTF-8') as f:
    long_description = f.read()
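
For anyone who wants to reproduce the failure without a Korean Windows machine, here is a minimal sketch. It assumes the cause is a non-ASCII character in README.md (an em dash is used as a stand-in, since its UTF-8 encoding starts with byte 0xe2), and it simulates the cp949 locale default by passing that encoding explicitly. The sample text and temporary file are hypothetical, not taken from the project.

```python
import os
import tempfile

# Hypothetical stand-in for a README that contains a non-ASCII character.
# U+2014 (em dash) is encoded in UTF-8 as the bytes e2 80 94.
text = "Plumb a PDF \u2014 and extract text."

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.md")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Simulate a locale whose default encoding is cp949 by naming it explicitly.
    try:
        with open(path, encoding="cp949") as f:
            f.read()
        failed = False
        error_message = ""
    except UnicodeDecodeError as exc:
        failed = True
        error_message = str(exc)

    # Reading with an explicit UTF-8 encoding works regardless of locale.
    with open(path, encoding="utf-8") as f:
        roundtrip = f.read()

print("cp949 read failed:", failed)
print("utf-8 read matches:", roundtrip == text)
```

Passing the encoding explicitly makes the read independent of the machine's locale, which is why the two-line change above is enough.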

jsvine commented 3 years ago

Thank you for the detailed and well-described report, @akaiuun12! Thankfully, this has already been fixed in the develop branch via https://github.com/jsvine/pdfplumber/commit/78543284187586f8ca8428a73f93efb714152693 and will be available via PyPI in the next release.

jsvine commented 3 years ago

Now available in v0.5.25.