joniles / rtfparserkit

Primary repository for RTF Parser Kit library
Apache License 2.0
104 stars, 42 forks

cpg command not superseding fcharset command #22

Closed: abLoftware closed this issue 4 years ago

abLoftware commented 4 years ago

Attachments: Japanese_UTF8.rtf.txt, Japanese_JIS.rtf.txt

It looks like \fcharset is not being overridden when \cpg is also provided. From the RTF Specification, version 1.9.1, p. 20: "If the \cpgN does appear, it supersedes the code page corresponding to the \fcharsetN."

Test cases attached: TestBug2.java.txt
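To illustrate the precedence rule quoted above, here is a minimal sketch of how a parser might pick the code page for a font-table entry. The class `FontCodePage` and its methods are hypothetical and are not rtfparserkit's API. The sketch assumes that when neither `\cpgN` nor a recognised `\fcharsetN` is present, the caller supplies a document-wide default (for example from `\ansicpgN`), and it maps only a small subset of `\fcharsetN` values.

```java
import java.nio.charset.Charset;
import java.util.Map;

/** Hypothetical helper (not part of rtfparserkit) showing \cpgN taking precedence over \fcharsetN. */
public final class FontCodePage {

    // A small subset of \fcharsetN values and the Windows code pages they imply.
    private static final Map<Integer, Integer> CHARSET_TO_CODE_PAGE = Map.of(
            0, 1252,    // ANSI
            128, 932,   // Shift JIS
            204, 1251,  // Cyrillic
            238, 1250); // Central European

    // Java charset names for the code pages above, plus 65001 (UTF-8).
    private static final Map<Integer, String> CODE_PAGE_TO_JAVA = Map.of(
            1250, "windows-1250",
            1251, "windows-1251",
            1252, "windows-1252",
            932, "windows-31j",
            65001, "UTF-8");

    private FontCodePage() {}

    /**
     * @param fcharset        value of \fcharsetN, or null if absent
     * @param cpg             value of \cpgN, or null if absent
     * @param documentDefault assumed fallback code page (e.g. taken from \ansicpgN)
     */
    public static int effectiveCodePage(Integer fcharset, Integer cpg, int documentDefault) {
        if (cpg != null) {
            return cpg; // \cpgN supersedes the code page implied by \fcharsetN
        }
        if (fcharset != null && CHARSET_TO_CODE_PAGE.containsKey(fcharset)) {
            return CHARSET_TO_CODE_PAGE.get(fcharset);
        }
        return documentDefault; // unknown or missing \fcharsetN
    }

    /** Maps a Windows code page number to a Java Charset for the subset above. */
    public static Charset toCharset(int codePage) {
        String name = CODE_PAGE_TO_JAVA.get(codePage);
        if (name == null) {
            throw new IllegalArgumentException("Unsupported code page: " + codePage);
        }
        return Charset.forName(name);
    }

    public static void main(String[] args) {
        // {\f0\fcharset128\cpg65001 ...}: \cpg wins, so UTF-8 rather than Shift JIS.
        System.out.println(effectiveCodePage(128, 65001, 1252)); // 65001
        // {\f0\fcharset128 ...}: no \cpg, so the \fcharset mapping applies.
        System.out.println(effectiveCodePage(128, null, 1252));  // 932
    }
}
```

The key point is simply the order of the checks: `\cpgN` is consulted first, and `\fcharsetN` only matters when no explicit code page is given.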

joniles commented 4 years ago

Thanks for the bug report. Could you provide the two samples as standalone RTF files which I can open with Microsoft Word or Wordpad? Thanks!

abLoftware commented 4 years ago

Hi Jon, thanks for taking a look at it! I've attached two RTF files, one for JIS and one for UTF-8.

Note that for the UTF-8 file, if you save it after opening it in WordPad, WordPad will strip out "\fcharset128\cpg65001" and convert the hex escapes to Unicode.
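The note above implies the UTF-8 sample stores its text as `\'hh` hex escapes, so the same bytes decode to different characters depending on whether code page 65001 (from `\cpg65001`) or 932 (from `\fcharset128`) is used. The sketch below assumes that layout; the class `HexEscapeDecoder` is hypothetical, not rtfparserkit's API, and the sample bytes (UTF-8 for "日本") are illustrative rather than taken from the attached files.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decodes a run of consecutive RTF \'hh escapes with a given charset. */
public final class HexEscapeDecoder {

    private HexEscapeDecoder() {}

    public static String decode(String run, Charset charset) {
        // Buffer all bytes first so multi-byte sequences (UTF-8, Shift JIS) decode as a unit.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < run.length()) {
            if (run.startsWith("\\'", i) && i + 4 <= run.length()) {
                bytes.write(Integer.parseInt(run.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                throw new IllegalArgumentException("Expected \\'hh at index " + i);
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // UTF-8 bytes for U+65E5 U+672C written as RTF hex escapes.
        String run = "\\'e6\\'97\\'a5\\'e6\\'9c\\'ac";
        System.out.println(decode(run, StandardCharsets.UTF_8));          // honouring \cpg65001
        System.out.println(decode(run, Charset.forName("windows-31j"))); // wrongly using \fcharset128
    }
}
```

Decoding each escape on its own would break multi-byte characters, which is why the sketch collects the whole run before converting it.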

From: Jon Iles notifications@github.com Sent: Friday, March 20, 2020 9:20 AM To: joniles/rtfparserkit rtfparserkit@noreply.github.com Cc: Andre Boutin ABoutin@loftware.com; Author author@noreply.github.com Subject: Re: [joniles/rtfparserkit] cpg command not superceding fcharset command (#22)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/joniles/rtfparserkit/issues/22#issuecomment-601696672, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AIEWFCMEYXI42O4XJDGWSGDRINUQXANCNFSM4LOTGMAA.

joniles commented 4 years ago

Hi Andre, unfortunately I can't see any attachments. Could you link them directly to the GitHub issue?

Thanks!

Jon

abLoftware commented 4 years ago

Hi Jon, I have attached them to the GitHub issue.

Thanks! Andre

From: Jon Iles notifications@github.com Sent: Saturday, March 21, 2020 11:54 AM To: joniles/rtfparserkit rtfparserkit@noreply.github.com Cc: Andre Boutin ABoutin@loftware.com; Author author@noreply.github.com Subject: Re: [joniles/rtfparserkit] cpg command not superceding fcharset command (#22)


joniles commented 4 years ago

I've had a chance to take a quick look. Before I make any changes to the code, I wanted to validate what Microsoft products made of the sample RTF files you provided.

This is what WordPad makes of the JIS file: [image]. Here's what WordPad makes of the UTF-8 file: [image]. Here's what Word makes of the JIS file: [image], and here's what Word makes of the UTF-8 file: [image].

Based on these results I'm inclined to think that the UTF8 version of the file isn't correct as it stands. If we can get to the point with the UTF8 file where it renders consistently when opened in a Microsoft product and uses the cpg command, I can make a stab at getting the parser to work with it appropriately.

abLoftware commented 4 years ago

I am able to get consistent results with both Word and WordPad, as long as I'm very careful not to make any changes in either, since they will rewrite how the file is stored.

Note that for each of the images below I had to add spaces so I could move the caret out of the way and keep it out of the image, which meant I had to be sure not to save the file when closing it; saving would change the RTF from how it was originally written. In fact, even though I swear I did not save the UTF-8 file, it still seems to have been rewritten on me, and I ended up with something similar to your Word UTF-8 result.

Here is each file in WordPad and Word, taking great care that the file is not modified:

WordPad, JIS: [image]
Word, JIS: [image]
WordPad, UTF-8: [image]
Word, UTF-8: [image]

From: Jon Iles notifications@github.com Sent: Thursday, March 26, 2020 9:52 AM To: joniles/rtfparserkit rtfparserkit@noreply.github.com Cc: Andre Boutin ABoutin@loftware.com; Author author@noreply.github.com Subject: Re: [joniles/rtfparserkit] cpg command not superceding fcharset command (#22)

Images from this reply:
WordPad, JIS file: https://user-images.githubusercontent.com/4912864/77653652-eb680980-6f67-11ea-83ac-408fe5925cfd.png
WordPad, UTF8 file: https://user-images.githubusercontent.com/4912864/77653728-0470ba80-6f68-11ea-97e7-9be15e1fb6cd.png
Word, JIS file: https://user-images.githubusercontent.com/4912864/77653950-48fc5600-6f68-11ea-822e-cea424349129.png
Word, UTF8 file: https://user-images.githubusercontent.com/4912864/77654047-6b8e6f00-6f68-11ea-82b3-591933e8bb48.png


joniles commented 4 years ago

Thanks for the reply. Unfortunately emailing responses back to this issue drops any embedded images or files. Can you add the images via the GitHub UI?

abLoftware commented 4 years ago

It should be updated now.

From: Jon Iles notifications@github.com Sent: Thursday, March 26, 2020 10:24 AM To: joniles/rtfparserkit rtfparserkit@noreply.github.com Cc: Andre Boutin ABoutin@loftware.com; Author author@noreply.github.com Subject: Re: [joniles/rtfparserkit] cpg command not superceding fcharset command (#22)


joniles commented 4 years ago

Hi! That was interesting: I got different results from WordPad on Windows 8.1 and Windows 10. With the Windows 10 version, both files rendered the same. Anyway, I've applied a fix and released a new version; hopefully that'll work for you!

abLoftware commented 4 years ago

Awesome! Thanks for looking into it!

Andre

From: Jon Iles notifications@github.com Sent: Monday, March 30, 2020 5:39 AM To: joniles/rtfparserkit rtfparserkit@noreply.github.com Cc: Andre Boutin ABoutin@loftware.com; Author author@noreply.github.com Subject: Re: [joniles/rtfparserkit] cpg command not superceding fcharset command (#22)
