joniles / rtfparserkit

Primary repository for RTF Parser Kit library
Apache License 2.0

Update encoding of 932 to MS932 to support NEC special characters #25

Closed: manoj535 closed this issue 3 years ago

manoj535 commented 3 years ago

With the current mapping, LOCALEID_MAPPING.put("932", "SJIS");, NEC special characters such as ㎜, ① and ② are not decoded properly. See https://en.wikipedia.org/wiki/JIS_X_0208#0x2D for the NEC special characters.

Since MS932 covers the SJIS characters as well as the NEC special characters, I changed the mapping for 932 to MS932: LOCALEID_MAPPING.put("932", "MS932");

This change fixes the issue with decoding of NEC special characters in rtfparser
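
To illustrate the difference described above, here is a minimal Java sketch. It is not code from rtfparserkit: the class name Cp932MappingDemo and its helper methods are hypothetical. It decodes the two bytes 0x87 0x6F, which sit in the NEC special character row, under both charset names. It assumes a JDK that ships the extended charsets, where "SJIS" and "MS932" are accepted as aliases.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of rtfparserkit.
class Cp932MappingDemo {

    // Decode raw bytes with the named charset (assumed to be available in this JDK).
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    // Render a string as a list of Unicode code points so the result
    // does not depend on the console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // 0x87 0x6F is the byte pair that appears as \'87\'6f in an RTF file.
        byte[] necBytes = {(byte) 0x87, (byte) 0x6F};

        // Under MS932 this pair maps to U+339C (the square "mm" sign).
        System.out.println("MS932: " + codePoints(decode(necBytes, "MS932")));

        // Under SJIS the pair is not expected to map to U+339C.
        System.out.println("SJIS:  " + codePoints(decode(necBytes, "SJIS")));
    }
}
```

Printing code points rather than the characters themselves keeps the comparison readable on consoles that cannot display Japanese text.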

joniles commented 3 years ago

Many thanks for the suggestion, I will get this into the code in the next few days. Would you be able to construct a sample RTF file which contains the relevant characters so I can add it as a test case?

manoj535 commented 3 years ago

Thanks for the response. Please find below the RTF string and the corresponding decoded text.

rtf = "{\rtf1\ansi\ansicpg932\deff0\deflang1033\deflangfe1041{\fonttbl{\f0\fnil\fcharset0 MS Sans Serif;}{\f1\froman\fprq1\fcharset128 MS UI Gothic;}} {\colortbl ;\red255\green0\blue0;\red0\green0\blue255;} \viewkind4\uc1\pard\cf1\lang1041\f0\fs17 BLC U=>L Splice \f1\fs18\'82\'c5U/W No.2 Dancer\'82\'a9\'82\'e7\'83\'56\'83\'8f\'94\'ad\'90\'b6\'81\'42Set\'8e\'9e\'82\'c9\'95\'5c\'91\'7710\'87\'6f\'82\'d9\'82\'c7\'83\'80\'81\'5b\'83\'6a\'83\'93\'83\'4f\'81\'40pallet\'92\'ea\'82\'ccRoll\cf2\f0\fs17 \par }"

decoded string = " BLC U=>L Splice でU/W No.2 Dancerからシワ発生。Set時に表層10㎜ほどムーニング pallet底のRoll"

Here ㎜ is the NEC special character.
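
As a rough illustration of how the \'hh escapes in the sample above turn into text, the sketch below collects the escaped bytes from a short fragment of that string and decodes them with MS932. This is a simplified stand-in and not the rtfparserkit parser: the class HexEscapeSketch and its decode method are hypothetical, it understands only \'hh escapes and plain ASCII text, and it ignores control words, groups and \ansicpg handling.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not part of rtfparserkit.
class HexEscapeSketch {

    // Convert \'hh escapes to bytes, copy other (ASCII) characters through,
    // then decode the whole byte sequence with the named charset.
    static String decode(String fragment, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < fragment.length()) {
            if (fragment.startsWith("\\'", i) && i + 4 <= fragment.length()) {
                // Two hex digits follow the backslash and apostrophe.
                bytes.write(Integer.parseInt(fragment.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // Assumption: everything else in the fragment is plain ASCII.
                bytes.write(fragment.charAt(i));
                i++;
            }
        }
        return new String(bytes.toByteArray(), Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Fragment taken from the sample RTF string; \'87\'6f is the NEC character.
        String fragment = "\\'95\\'5c\\'91\\'7710\\'87\\'6f\\'82\\'d9\\'82\\'c7";
        String text = decode(fragment, "MS932");
        // Print code points so the output is independent of console encoding.
        text.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

With MS932 the fragment should come out as the "表層10㎜ほど" portion of the decoded string quoted above.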

joniles commented 3 years ago

Thanks for that. I didn't merge your branch directly in the end, but the change and related test case are now in place. I've credited you in the release notes on GitHub, hope that's ok!

manoj535 commented 3 years ago

That's fine. Thanks for merging the changes.