graspee / polutils

Automatically exported from code.google.com/p/polutils
Apache License 2.0

JP Encoding Issue #10

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
http://www.ffxiah.com/forum/topic/34702/wrong-kanji/#2113354

The OP correctly points out that the JP text is incorrect for the item he 
mentioned, "Winter Stone".  I know of other items with incorrect text that 
were brought to my attention in the past.

Original issue reported on code.google.com by scr...@gmail.com on 5 Dec 2012 at 7:01

GoogleCodeExporter commented 9 years ago
I made all the code table data files by hand in a hex editor - it's a small 
miracle that there haven't been more errors (then again, I doubt there have 
been many Japanese users of POLUtils, so whatever errors exist would have gone 
unreported).

It should be easy to change 77D3 to 77F3 (or vice versa). The only trick will 
be finding out what the FFXI encoding bytes are, since that determines which 
data file to change; a search on Winter Stone plus some debug prints should 
make that easy to find out.

Original comment by tim.vanholder@gmail.com on 5 Dec 2012 at 9:51

GoogleCodeExporter commented 9 years ago
The table data files you are referring to, are those the ones in the 
ConversionTables folder?

Original comment by scr...@gmail.com on 6 Dec 2012 at 4:50

GoogleCodeExporter commented 9 years ago
Yes. They map the FFXI encoding to Unicode.
The FFXI encoding is basically Shift-JIS, with extensions for some of the 
special glyphs (like the element/day icons) and the autotrans stuff.
So those .dat files are basically Shift-JIS-to-Unicode tables. Table "0" can 
indicate (via value FFFE IIRC) that a byte is a multibyte starter, and then the 
other tables come into play.
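
To make the lookup scheme above concrete, here is a minimal Python sketch. It is not the POLUtils code: the in-memory layout (a 256-entry list for table "0" and one dict per lead byte) and the names `decode`, `table0` and `lead_tables` are assumptions for illustration only, and the FFFE marker is taken from the "IIRC" above. The toy tables contain a single two-byte entry.

```python
# Sketch of a two-level table lookup. The table layout here is hypothetical
# and does not reflect the on-disk format of the .dat files.
LEAD_BYTE_MARKER = 0xFFFE  # value recalled in the thread for "multibyte starter"


def decode(data: bytes, table0, lead_tables) -> str:
    """Decode bytes using table 0 plus one table per lead byte."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        cp = table0[b]
        if cp == LEAD_BYTE_MARKER:
            # Table 0 says this byte starts a two-byte sequence, so the
            # code point comes from the table belonging to this lead byte.
            if i + 1 >= len(data):
                raise ValueError("truncated multibyte sequence")
            cp = lead_tables[b][data[i + 1]]
            i += 2
        else:
            i += 1
        out.append(chr(cp))
    return "".join(out)


# Toy data: every byte maps to itself, except 0x90, which is a lead byte.
table0 = list(range(256))
table0[0x90] = LEAD_BYTE_MARKER
lead_tables = {0x90: {0xCE: 0x77F3}}  # illustrative entry for U+77F3

print(decode(b"A\x90\xce", table0, lead_tables))
```

With this shape, a wrong glyph would come down to one bad value in one of the per-lead-byte tables.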

I think I used http://msdn.microsoft.com/en-US/goglobal/cc305152 as a reference 
when making them. For example, 0x81 is a valid lead byte, so there's a 81xx 
table in POLUtils, based on http://msdn.microsoft.com/en-US/goglobal/gg638593.

So it would be a matter of going through those tables to find which Shift-JIS 
lead byte leads to U+77D3 and/or U+77F3 and then verifying the corresponding 
conversion table.
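
As a possible shortcut for the search described above, Python's built-in `shift_jis` codec can report the byte sequence for each code point. This is only a sketch: it assumes the FFXI encoding agrees with standard Shift-JIS for these characters, which the thread does not confirm, and `shift_jis_bytes` is a hypothetical helper name.

```python
# Sketch: ask the standard Shift-JIS codec which bytes encode a code point.
# Assumption: FFXI matches standard Shift-JIS for the characters in question.
def shift_jis_bytes(code_point: int):
    """Return the Shift-JIS bytes for a code point, or None if unencodable."""
    try:
        return chr(code_point).encode("shift_jis")
    except UnicodeEncodeError:
        return None


for cp in (0x77D3, 0x77F3):
    encoded = shift_jis_bytes(cp)
    if encoded is None:
        print(f"U+{cp:04X}: not encodable in standard Shift-JIS")
    else:
        print(f"U+{cp:04X}: bytes {encoded.hex().upper()}, "
              f"lead byte 0x{encoded[0]:02X}")
```

The lead byte reported for a code point would then suggest which table to inspect, by analogy with the 81xx table for lead byte 0x81.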

Original comment by tim.vanholder@gmail.com on 6 Dec 2012 at 8:35