joshy / striprtf

Stripping rtf to plain old text
http://striprtf.dev
BSD 3-Clause "New" or "Revised" License

UnicodeEncodeError: 'charmap' codec can't encode character '\u2794' (➔) in position 0: character maps to <undefined> #48

Closed: luckydonald closed this issue 1 year ago

luckydonald commented 1 year ago

Converting an RTF file that contains A ➔ B with rtf_to_text fails with a UnicodeEncodeError.

Example file

Test.rtf.txt (remove the .txt extension before use)

from striprtf.striprtf import rtf_to_text  # pip3 install striprtf

FILE = 'example_rtf_from_above.rtf'
with open(FILE, 'r') as f:
    data = f.read()
# end with

data = rtf_to_text(data)

This results in the following error:

Traceback (most recent call last):
  File "/Users/luckydonald/Applications/PyCharm Community Edition.app/Contents/plugins/python-ce/helpers/pydev/pydevconsole.py", line 364, in runcode
    coro = func()
  File "<input>", line 1, in <module>
  File "/Users/luckydonald/.local/share/virtualenvs/someproject-V9PMbazx/lib/python3.10/site-packages/striprtf/striprtf.py", line 169, in rtf_to_text
    out = out + chr(c).encode(encoding, errors)
  File "/opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/encodings/cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character '\u2794' in position 0: character maps to <undefined>

Note that \u2794 is just a rightward-facing arrow (➔); see https://www.compart.com/en/unicode/U+2794
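
The last two frames of the traceback can be reproduced without striprtf at all: cp1252 has no slot for U+2794, so a strict encode raises. The snippet below is only a minimal sketch of that codec behaviour, not striprtf's code; try_encode is a hypothetical helper name introduced for illustration.

```python
ARROW = "\u2794"  # HEAVY WIDE-HEADED RIGHTWARDS ARROW


def try_encode(text, encoding="cp1252", errors="strict"):
    """Hypothetical helper: return the encoded bytes, or None if the
    codec cannot represent the text under the given error policy."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        return None


# Strict cp1252 encoding fails, as in the traceback above.
print(try_encode(ARROW))                    # None
# A lenient error handler substitutes a placeholder instead of raising.
print(try_encode(ARROW, errors="replace"))  # b'?'
# UTF-8 can represent the character without loss.
print(try_encode(ARROW, encoding="utf-8"))  # b'\xe2\x9e\x94'
```

This suggests the failure comes from the codec chosen for the encode step, not from the arrow character being malformed.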

luckydonald commented 1 year ago

Updating striprtf from 0.0.20 to 0.0.26 fixed this.
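
For anyone hitting the same error, a quick way to see which striprtf version is installed is sketched below. It assumes plain dotted numeric version strings such as 0.0.26, and it treats 0.0.26 as the known-good version only because that is the one reported above; the exact release that introduced the fix is not stated in this thread. installed_version and as_tuple are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version

KNOWN_GOOD = (0, 0, 26)  # version reported as working in this thread


def installed_version(package="striprtf"):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def as_tuple(version_string):
    """Convert '0.0.26' to (0, 0, 26); assumes numeric dotted parts only."""
    return tuple(int(part) for part in version_string.split("."))


found = installed_version()
if found is None:
    print("striprtf is not installed")
elif as_tuple(found) < KNOWN_GOOD:
    print(f"striprtf {found} is older than 0.0.26; try: pip3 install --upgrade striprtf")
else:
    print(f"striprtf {found} is at least 0.0.26")
```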