UnicodeDecodeError: 'gbk' codec can't decode byte when running parse_pdf_text.py

jackiekazil / data-wrangling

Code repository for Data Wrangling with Python (O'Reilly)


Hi, thank you for your wonderful book on data wrangling. I ran into an issue when running parse_pdf_text.py from chapter 5 in Anaconda (Python 3.5). The IDE shows the following error message:

Traceback (most recent call last):

  File "<ipython-input-10-957ab6bc6f5e>", line 39, in <module>
    for line in openfile:

UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 46: illegal multibyte sequence

It looks like the code opened the file in text mode with a "gbk" encoding. Should it be opened in binary mode instead? I'm not sure. How can I fix this problem? Thank you.
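
One possible fix, sketched under assumptions: in Python 3, `open()` in text mode without an `encoding` argument falls back to the locale's preferred encoding, which is typically gbk on a Chinese-locale Windows install. Passing the encoding explicitly avoids that fallback. The helper name `read_lines` and the choice of UTF-8 are assumptions for illustration, not taken from the book's script; if the text file was saved in another encoding (for example cp1252), pass that instead.

```python
def read_lines(path, encoding="utf-8"):
    """Return the lines of a text file, decoded with an explicit encoding.

    Hypothetical helper: the default of UTF-8 is an assumption about how
    the text file was produced, not something the original script states.
    """
    # Naming the encoding stops Python from using the locale default
    # (gbk in the traceback above) to decode the file.
    with open(path, "r", encoding=encoding) as openfile:
        return [line.rstrip("\n") for line in openfile]
```

Opening the file in binary mode (`"rb"`) would also avoid the decode error, but each line would then be a `bytes` object, so any string comparisons later in the script would need to be adjusted to match.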


UnicodeDecodeError: 'gbk' codec can't decode byte when running parse_pdf_text.py #9