Closed: powo closed this issue 1 year ago
Do you have an example .rtf file that illustrates your problem?
Here is an example:
>>> striprtf.rtf_to_text(r"{\rtf1\ansi\ansicpg0 T\'e4st}", encoding="utf-8", errors="replace")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/powo/Sync/dev/bat/.venv/lib/python3.11/site-packages/striprtf/striprtf.py", line 136, in rtf_to_text
    out += bytes.fromhex(hexes).decode(encoding=encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 0: unexpected end of data
The expected behavior is that errors="replace" suppresses the error and substitutes a replacement character for the invalid byte, like this:
>>> b'T\xe4st'.decode("utf-8", errors="replace")
'T�st'
The errors= parameter to rtf_to_text is documented in the docstring and mentioned in several issues (#34, #27), but it is completely ignored: it is never passed to .decode(..), which leads to UnicodeDecodeErrors.
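A minimal sketch of the kind of change the report above implies, assuming the fix is simply to forward errors to the .decode(..) call shown in the traceback. The helper name decode_hex_escape is hypothetical and is not part of striprtf; it only isolates the decoding step so the behavior can be checked on its own.

```python
def decode_hex_escape(hexes: str, encoding: str = "utf-8", errors: str = "strict") -> str:
    # Hypothetical helper (not striprtf's API): decode the hex digits
    # collected from RTF \'hh escapes, forwarding `errors` to bytes.decode
    # instead of dropping it.
    return bytes.fromhex(hexes).decode(encoding=encoding, errors=errors)


# 0xe4 is "ä" in cp1252, but on its own it is an incomplete UTF-8 sequence.
as_cp1252 = decode_hex_escape("e4", encoding="cp1252")
as_utf8_replaced = decode_hex_escape("e4", encoding="utf-8", errors="replace")

print(repr(as_cp1252))
print(repr(as_utf8_replaced))
```

With errors="strict" (the default here) the same call still raises UnicodeDecodeError, so callers who want the exception keep the current behavior.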