nickbabcock / Pdoxcl2Sharp

A Paradox Interactive general file parser
MIT License

Encoding of files on Mac/Linux #6

Closed nickbabcock closed 11 years ago

nickbabcock commented 11 years ago

Paradox normally encodes their files in Windows-1252; however, I don't know whether this changes when the game is installed on Mac/Linux.

bucaneer commented 11 years ago

It is the same for Linux (CK2). A text editor may assume it is UTF-8, but only the Windows-1252 character set is recognised by the engine.

nickbabcock commented 11 years ago

> A text editor may assume it is UTF-8

Maybe you can clarify more. In Windows-1252, š is the single byte 0x9A, but in UTF-8 it is the two bytes 0xC5 0xA1 (its code point is U+0161). One fits squarely into a byte and the other doesn't. Thus, if I write š to a file (and this character does appear in the games), the file will contain one byte or two bytes of content, depending on the encoding. Why can a text editor assume it is UTF-8?
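To make the byte counts concrete, here is a minimal Python sketch (not code from this library) that encodes š both ways. Note that 0x161 is the Unicode code point (U+0161); the bytes UTF-8 actually writes for it are 0xC5 0xA1. The variable names are only for illustration.

```python
# Encode the same character under both encodings and compare the bytes.
ch = "š"  # U+0161, LATIN SMALL LETTER S WITH CARON

cp1252_bytes = ch.encode("cp1252")  # Windows-1252: one byte
utf8_bytes = ch.encode("utf-8")     # UTF-8: two bytes

# Show the code point next to the bytes each encoding produces.
print(hex(ord(ch)), cp1252_bytes.hex(), utf8_bytes.hex())
print(len(cp1252_bytes), len(utf8_bytes))
```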

bucaneer commented 11 years ago

Damn, that's what I get for only half-checking before writing. As long as only ASCII characters are involved (and this is the case for many text files used by the game), the two encodings are interchangeable, and editors in Linux (geany in my case) will tend to default to UTF-8 - this is what I meant. But yes, files that use a larger character set (e.g. savegames) are actually encoded in Windows-1252.
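The ASCII point can be checked with a short Python sketch. The sample strings are made up for illustration and are not taken from an actual game file: pure-ASCII text yields identical bytes under both encodings, while Windows-1252 bytes for š are not valid UTF-8.

```python
# ASCII-only text: both encodings produce exactly the same bytes.
ascii_text = 'name="Sweden"'
same_bytes = ascii_text.encode("cp1252") == ascii_text.encode("utf-8")

# Text with a non-ASCII character, written as Windows-1252.
raw = 'name="Miloš"'.encode("cp1252")

# Trying to read those bytes as UTF-8 fails, because a lone 0x9A
# byte is not a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# Decoding with the encoding that was used to write recovers the text.
round_trip = raw.decode("cp1252")
```

An editor that defaults to UTF-8 therefore shows ASCII-only files correctly and only runs into trouble once a byte of 0x80 or above appears.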

nickbabcock commented 11 years ago

Don't worry, I only know this because I've been burned multiple times. I'm closing this issue with the intention that all parsing will be done in Windows-1252, since that is the encoding all the Paradox files use.

Thank you for reporting the encoding on Linux.
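For anyone reading these files outside the library, the decision above amounts to always naming the encoding explicitly rather than relying on the platform default. The following is a rough Python sketch under that assumption, not the parser's actual implementation; `read_paradox_text` is a hypothetical helper name, and the temporary file stands in for a real game file.

```python
import os
import tempfile


def read_paradox_text(path):
    """Hypothetical helper: read a text file, always decoding as Windows-1252."""
    with open(path, encoding="cp1252", newline="") as f:
        return f.read()


# Write a small Windows-1252 sample file, then read it back.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write('name="Miloš"\n'.encode("cp1252"))
    tmp_path = tmp.name

text = read_paradox_text(tmp_path)
os.remove(tmp_path)
print(text)
```

One caveat with this sketch: Python's `cp1252` codec raises an error on the five byte values Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so passing `errors="replace"` to `open` may be worth considering if such bytes can occur.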