Open aghster opened 2 years ago
Thanks for submitting. Are you able to share the complete email or RTF? You can email it to support@goldfynch.com.
Regarding the fix, I think the issue may be that the code page is actually set (to 0), but the check on line 120 is checking for a truthy value instead of a defined value. If the check was changed to check that `cpg` is defined, then your `decode` callback would get "cp0" as the encoding and you could then handle as you see fit. What do you think?
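To illustrate the difference between the two checks, here is a minimal sketch. It is not the library's actual code: the `State` interface and both function names are hypothetical, and only the `cpg` field, the "cp0" encoding string, and the error message are taken from this thread.

```typescript
// Hypothetical stand-in for the parser state; only `cpg` is named in this thread.
interface State {
    cpg?: number;
}

// Truthy check: \ansicpg0 yields cpg === 0, which is falsy, so this throws.
function encodingWithTruthyCheck(state: State): string {
    if (!state.cpg) {
        throw new Error("text with no codepage");
    }
    return "cp" + state.cpg;
}

// Defined check: only a missing code page is rejected; 0 passes through as "cp0".
function encodingWithDefinedCheck(state: State): string {
    if (state.cpg === undefined) {
        throw new Error("text with no codepage");
    }
    return "cp" + state.cpg;
}
```

With the defined check, a document containing `\ansicpg0` would reach the `decode` callback with "cp0", and the caller could decide there how to handle it.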
Thank you for your quick answer!
Are you able to share the complete email or RTF?
Unfortunately, I cannot share the complete email or RTF and I don't have another example in which the RTF contains \ansicpg0.
Regarding the fix, I think the issue may be that the code page is actually set (to 0), but the check on line 120 is checking for a truthy value instead of a defined value. If the check was changed to check that `cpg` is defined, then your `decode` callback would get "cp0" as the encoding and you could then handle as you see fit. What do you think?
Thank you for your alternative suggestion of how to fix the issue. You're certainly right that the error should only be thrown if `cpg` is undefined, and that 0 should be treated like any other codepage number. So if the check in line 121 is changed so as to check that `cpg` is defined, I would consider this issue fixed.
Nevertheless, I still think that having an option to define a fallback codepage would in general be useful.
Hello,
rtf-stream-parser throws an error "text with no codepage" when I try to decode an email that contains RTF starting with
{\\rtf1\\ansi\\ansicpg0\\fromhtml1\\deff0{\\fonttbl\r\n{\\f0\\fswiss\\fcharset0 Arial;}\r\n{\\f1\\fmodern\\fcharset0 Courier New;}\r\n{\\f2\\fnil\\fcharset0 Symbol;}\r\n{\\f3\\fmodern\\fcharset0 Courier New;}}\r\n\\uc1\\pard\\plain\\deftab360 \\f0\\fs24\r\n{\\*\\htmltag0 <html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">}
To my understanding, this is due to \ansicpg0, as this is not a valid codepage. However, I suggest that in such cases rtf-stream-parser should not just throw an error and abort, but should fall back to a default codepage instead. A simple solution would be changing the following lines https://github.com/mazira/rtf-stream-parser/blob/3ec37609e256c0a0a91649f4145e44ed91e33003/src/ProcessTokens.ts#L111-L113 to
or simpler to
I chose `1252` as the default codepage. Ideally, though, the default codepage would not be hard-coded, but could be set as an option ...
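A configurable fallback could look roughly like the sketch below. This is only an illustration: the `FallbackOptions` interface, its `defaultCodepage` field, and `resolveEncoding` are hypothetical names, not part of rtf-stream-parser's API, and the sketch assumes that a codepage of 0 is treated as unusable.

```typescript
// Hypothetical option bag; `defaultCodepage` is not an existing library option.
interface FallbackOptions {
    defaultCodepage?: number;
}

// Returns an encoding name such as "cp1252".
// Assumption: a missing codepage and codepage 0 are both treated as unusable.
function resolveEncoding(cpg: number | undefined, options: FallbackOptions = {}): string {
    if (cpg === undefined || cpg === 0) {
        if (options.defaultCodepage === undefined) {
            // No fallback configured: keep the current behaviour and fail.
            throw new Error("text with no codepage");
        }
        return "cp" + options.defaultCodepage;
    }
    return "cp" + cpg;
}
```

Called as `resolveEncoding(0, { defaultCodepage: 1252 })`, this would fall back to "cp1252", while omitting the option keeps the existing error.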