joshy / striprtf

Stripping rtf to plain old text
http://striprtf.dev
BSD 3-Clause "New" or "Revised" License

I get an error while encoding character '\u200b'. #27

Closed leacardenas closed 1 year ago

leacardenas commented 2 years ago

I have been using the striprtf library and it has worked great! But for some of the texts that I am decoding, I get the following error:

'charmap' codec can't encode character '\u200b' in position 0: character maps to <undefined>

I have tried multiple things to replace, encode, or ignore the '\u200b' character, but I couldn't. So I wanted to report the issue, since the library otherwise works very well.

I attached a txt version of the rtf file, since rtf is not accepted by GitHub.

9379.txt

joshy commented 2 years ago

Hi, there is now an option to ignore errors. Please try it with: rtf_to_text(your_string, errors="ignore"). It worked for me, but I don't know whether relevant text is stripped away.
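
To see what ignoring errors removes, here is a minimal sketch that uses only the Python standard library. It assumes the failing codec is cp1252 (which Python reports as 'charmap'). It does not show how striprtf works internally, and the sample string and helper names are hypothetical.

```python
# Assumption: the target encoding is cp1252, reported by Python as 'charmap'.
ZWSP = "\u200b"  # ZERO WIDTH SPACE, invisible when rendered
sample = f"plain{ZWSP}text"  # hypothetical sample, not taken from the attached file


def encode_strict_fails(text, encoding="cp1252"):
    """Return True if encoding with the default errors='strict' raises."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return True
    return False


def encode_ignoring(text, encoding="cp1252"):
    """Round-trip through the codec with errors='ignore', dropping unencodable characters."""
    return text.encode(encoding, errors="ignore").decode(encoding)


def dropped_characters(text, encoding="cp1252"):
    """List the characters that errors='ignore' would silently remove."""
    return [ch for ch in text if encode_strict_fails(ch, encoding)]
```

For the sample above, only the zero-width space is dropped and the visible text is unchanged. Running dropped_characters on real output is one way to check whether anything other than invisible characters would be lost.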

leacardenas commented 2 years ago

Hi, yeah, I saw that option, but as I understand it, the text containing the problematic character is indeed stripped away.

I will still check whether that character is in the extracted text or in the RTF itself.
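
One way to run that check is sketched below, assuming the RTF source writes the character as the Unicode control word \u8203 (8203 is the decimal form of 0x200B). Other spellings of the same character would not be caught, and both helper names are hypothetical.

```python
import re

# RTF \uN control words take a decimal code point; 0x200B == 8203.
ZWSP_ESCAPE = re.compile(r"\\u8203(?!\d)")  # (?!\d) avoids matching e.g. \u82031


def rtf_has_zwsp_escape(rtf_source):
    """Return True if the raw RTF source contains a \\u8203 control word."""
    return ZWSP_ESCAPE.search(rtf_source) is not None


def text_has_zwsp(text):
    """Return True if already-extracted text contains U+200B."""
    return "\u200b" in text
```

If the first helper returns True for the raw file, the character comes from the RTF itself rather than from the conversion step.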