Closed: ffreller closed this issue 1 year ago
@ffreller I have had similar issues and worked around most of them by opening the source file in binary mode, decoding it explicitly as UTF-8, and then processing each line in turn rather than the whole file at once:
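from striprtf.striprtf import rtf_to_text

fileContent = ""  # accumulates the converted text
# fOpenPath is the path to your .rtf file
with open(fOpenPath, 'rb') as rtfFile:
    rawFileContent = rtfFile.read()
rawFileContent = rawFileContent.decode("utf-8")
# Convert line by line; unencodable characters become backslash escapes
for line in rawFileContent.splitlines():
    fileContent += rtf_to_text(line, errors='backslashreplace')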
No guarantees, but HTH
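The explicit decode step in the workaround above assumes the file really is UTF-8. Here is a minimal sketch of a more forgiving decode, assuming the usual alternative is Windows-1252. The helper name `decode_rtf_bytes` and the fallback order are hypothetical and not part of striprtf:

```python
def decode_rtf_bytes(raw: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Hypothetical helper: try each encoding in order, then fall back leniently."""
    for enc in encodings:
        try:
            # Strict decoding: succeeds only if every byte is valid in `enc`.
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the final encoding and mark bad bytes as U+FFFD.
    return raw.decode(encodings[-1], errors="replace")
```

The result can then be split into lines and passed to `rtf_to_text` as in the workaround. Whether cp1252 is the right fallback depends on where the files came from, so treat the default tuple as a guess.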
Hi @ffreller, did you find a solution that works for you? Otherwise you can send me the RTF files and I can have a look at them.
Yes, I believe I ended up finding a workaround. Thanks!
On Sun, Feb 5, 2023 at 17:13, Joshy Cyriac @.***> wrote:
Hi @ffreller https://github.com/ffreller, did you find a solution that works for you? Otherwise you can send me the RTF files and I can have a look at them.
I'm having issues while decoding some characters. I get the following errors when decoding some RTF text:
"'charmap' codec can't encode character '\x96' in position 0: character maps to <undefined>",
"'charmap' codec can't encode character '\x93' in position 0: character maps to <undefined>",
"'charmap' codec can't encode character '\uf02d' in position 0: character maps to <undefined>",
"'charmap' codec can't encode character '\x99' in position 0: character maps to <undefined>",
"'charmap' codec can't encode character '\u25a1' in position 0: character maps to <undefined>",
"'charmap' codec can't encode character '\u2234' in position 0: character maps to <undefined>",
"'charmap' codec can't encode character '\x95' in position 0: character maps to <undefined>"
If necessary, I can send you the rtf files that resulted in those errors.
Thank you
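For anyone landing here later: errors of the kind quoted above can be reproduced in plain Python without any RTF file, which may help narrow things down. This is only a sketch, and it assumes the target code page is cp1252 (the thread does not say which encoding was in use). The helper `try_encode` is made up for illustration:

```python
def try_encode(ch: str, encoding: str = "cp1252", errors: str = "strict"):
    """Hypothetical helper: return the encoded bytes, or the error message on failure."""
    try:
        return ch.encode(encoding, errors)
    except UnicodeEncodeError as exc:
        return str(exc)

# Characters with no cp1252 mapping fail under the default "strict" handler.
for ch in ["\x96", "\u25a1", "\u2234"]:
    print(repr(ch), "->", try_encode(ch))

# A lenient handler turns the failure into an escape sequence instead.
print(try_encode("\x96", errors="backslashreplace"))

# The en dash U+2013 does have a cp1252 mapping, to the single byte 0x96.
print(try_encode("\u2013"))
```

Note that U+0096 is a control character, not the en dash that the byte 0x96 stands for in cp1252, so a message mentioning '\x96' may mean the byte was turned into a code point directly somewhere along the way. That is a guess, not something confirmed in this thread.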