scottleibrand / gpt-summarizer

Extract text from PDF, summarize each section w/ GPT, and provide a summarized outline of the paper
MIT License

UnicodeEncodeError: 'charmap' codec can't encode character '\ufb01' in position 793 #7

Closed TtesseractT closed 1 year ago

TtesseractT commented 1 year ago

PC: Win 10

Traceback (most recent call last):
  File "C:\PythonScriptLocation\summarize.py", line 458, in
    f.write(text)
  File "C:\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufb01' in position 793: character maps to

I solved the issue:

Lines: 434, 457, 499, 513, 530, 532, 547, 564, 588, 603, 612, 630, 648

Add the keyword argument encoding='utf-8' to the open() call on each of those lines, like so:

with open(overall_summary_path, 'w', encoding='utf-8') as f:
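For anyone who wants to reproduce this outside of summarize.py, here is a minimal sketch. It assumes the failure comes from writing text that contains U+FB01 (the "fi" ligature, which often shows up in text extracted from PDFs) to a file opened with the cp1252 codec. The helper name write_text, the sample string, and the temporary file path are made up for illustration and are not from the repo.

```python
import os
import tempfile

# Sample text containing U+FB01 (the "fi" ligature); hypothetical, not from the repo.
text = "de\ufb01nition"


def write_text(path, text, encoding):
    """Write text to path using an explicit encoding (illustrative helper)."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "summary.txt")

# cp1252 has no mapping for U+FB01, so this write is expected to fail.
try:
    write_text(path, text, "cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True

# With UTF-8 the same write succeeds and the text reads back unchanged.
write_text(path, text, "utf-8")
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(cp1252_failed, round_trip == text)
```

Passing the encoding explicitly makes the behavior the same on every platform, instead of depending on whatever the system default happens to be.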

scottleibrand commented 1 year ago

Thanks. ChatGPT says that "On macOS and Linux, the default encoding is usually UTF-8," but "On Windows, the default encoding is typically Windows-1252 or CP-1252."

I've only tested the code on macOS and Linux, so I think your solution is the correct one for Windows. Do you want to make a PR against my repo so you get contributor credit, or should I go ahead and do it and just reference this issue?
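The platform difference described above can be checked directly. The sketch below prints the encoding that open() falls back to when no encoding argument is given, and tests whether a given codec can represent the character from the traceback. The helper can_encode is a hypothetical name used only for this example.

```python
import locale
import sys

# Encoding that open() uses in text mode when no encoding= is passed.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# 1 if Python was started in UTF-8 mode (python -X utf8 or PYTHONUTF8=1), else 0.
print("UTF-8 mode:", sys.flags.utf8_mode)


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec (illustrative helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+FB01 is the character named in the traceback.
print("cp1252:", can_encode("\ufb01", "cp1252"))
print("utf-8:", can_encode("\ufb01", "utf-8"))
```

On a Windows machine with a Western European locale, the first line would typically report cp1252, which is consistent with the cp1252.py frame in the traceback.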

TtesseractT commented 1 year ago

I'm unfamiliar with GitHub. Would you be able to add me as a contributor?

Glad I could help. :D

scottleibrand commented 1 year ago

Committed in https://github.com/scottleibrand/gpt-summarizer/commit/ccbfb2fd13ea17042afa8dbeb859da7a8a9ed72d