joshy / striprtf

Stripping rtf to plain old text
http://striprtf.dev
BSD 3-Clause "New" or "Revised" License

Error while decoding characters #30

Closed: ffreller closed this issue 1 year ago

ffreller commented 2 years ago

I'm having issues while decoding some characters. I get the following errors when decoding some RTF text:

'charmap' codec can't encode character '\x96' in position 0: character maps to <undefined>
'charmap' codec can't encode character '\x93' in position 0: character maps to <undefined>
'charmap' codec can't encode character '\uf02d' in position 0: character maps to <undefined>
'charmap' codec can't encode character '\x99' in position 0: character maps to <undefined>
'charmap' codec can't encode character '\u25a1' in position 0: character maps to <undefined>
'charmap' codec can't encode character '\u2234' in position 0: character maps to <undefined>
'charmap' codec can't encode character '\x95' in position 0: character maps to <undefined>

If necessary, I can send you the rtf files that resulted in those errors.

Thank you
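For readers trying to reproduce messages of this shape: the sketch below uses only Python's built-in `cp1252` codec, not striprtf itself. It rests on the assumption that the failing step is an encode to a Windows code page; the thread does not confirm where inside the library the error is raised. The helper name `try_encode` is hypothetical.

```python
# Minimal sketch, standard library only. Assumption: the reported errors come
# from encoding a character to a Windows code page such as cp1252.

def try_encode(ch: str, encoding: str = "cp1252") -> str:
    """Return 'ok' if ch can be encoded, otherwise the codec's error message."""
    try:
        ch.encode(encoding)
        return "ok"
    except UnicodeEncodeError as exc:
        return str(exc)


# Some of the characters from the report, plus U+2013 (en dash) for contrast.
for ch in ["\x96", "\x93", "\uf02d", "\u25a1", "\u2234", "\u2013"]:
    print(repr(ch), "->", try_encode(ch))

# The *byte* 0x96 is valid cp1252 and decodes to U+2013 (en dash), whereas the
# *code point* U+0096 is a control character that cp1252 cannot encode.
print(repr(b"\x96".decode("cp1252")))
```

If this assumption holds, code points such as `'\x96'` may be cp1252 bytes that were turned into characters without going through the code page, which would be consistent with the workaround posted later in this thread.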

JulianSMoore commented 1 year ago

@ffreller I have had similar issues and worked around most of them by opening the source file in binary mode, explicitly decoding it as UTF-8, and then processing each line in turn rather than the whole file at once:

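from striprtf.striprtf import rtf_to_text

fileContent = ""  # accumulates the converted text
# fOpenPath is the path to the RTF file; read it as raw bytes
with open(fOpenPath, 'rb') as rtfFile:
    rawFileContent = rtfFile.read()
    # decode explicitly as UTF-8 rather than relying on the platform default
    rawFileContent = rawFileContent.decode("utf-8")
    for line in rawFileContent.splitlines():
        fileContent += rtf_to_text(line, errors='backslashreplace')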

No guarantees, but HTH
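The value `'backslashreplace'` used above is also the name of one of Python's standard codec error handlers. How striprtf applies it internally is not shown in this thread, so the sketch below only illustrates what the standard handlers do when encoding with the built-in `cp1252` codec; the helper `encode_with` and the sample string are hypothetical.

```python
# Minimal sketch, standard library only: compare Python's codec error handlers
# on a string that contains characters cp1252 cannot encode.

def encode_with(text: str, errors: str, encoding: str = "cp1252") -> bytes:
    """Encode text using the given error handler."""
    return text.encode(encoding, errors=errors)


sample = "a\x96b\u25a1"  # U+0096 and U+25A1 have no cp1252 mapping

for handler in ("ignore", "replace", "backslashreplace"):
    print(f"{handler:>16}: {encode_with(sample, handler)!r}")

# The default handler, "strict", raises instead of substituting.
try:
    encode_with(sample, "strict")
except UnicodeEncodeError as exc:
    print(f"{'strict':>16}: {exc}")
```

With `backslashreplace` no exception is raised and the problematic characters stay visible as escape sequences, which makes them easy to find and fix afterwards; `ignore` and `replace` lose that information.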

joshy commented 1 year ago

Hi @ffreller, did you find a solution that works for you? Otherwise you can send me the RTF files and I can have a look at them.

ffreller commented 1 year ago

Yes, I believe I ended up finding a workaround. Thanks!

On Sun, Feb 5, 2023 at 17:13, Joshy Cyriac @.***> wrote:

Hi @ffreller https://github.com/ffreller, did you find a solution that works for you? Otherwise you can send me the RTF files and I can have a look at them.
