Change font and locale - Githubissues

imKota commented 4 years ago

pylivemaker version: master branch
Python version: 3.7.4
Operating System: mac os

Description

Hello, @pmrowla ! Tell me, is it possible to somehow change the font that is used by default to some other one, as well as change the locale to another?

pmrowla commented 4 years ago

Unfortunately, I can't help you when it comes to changing the default font. I know that it is possible to specify fonts in LiveMaker, since the original LM documentation has the tags to do it. But I have no samples of games that include their own fonts and that use the font tags, so I never bothered trying to reverse engineer/figure out the parts of the engine related to setting fonts in an LSB script section.

Also, the LiveMaker engine is hard coded to only use installed Windows system fonts which are flagged with support for Windows Shift-JIS (MS CP932), so as far as I know you cannot get a game to use any locale/encoding other than CP932.

It is at least possible to force LiveMaker 3 games to display half-width western characters instead of full-width by changing the value of PR_FONTCHANGEABLED for the in game text boxes, which should make your patched text look nicer in-game: https://pylivemaker.readthedocs.io/en/latest/usage.html#notes-for-translation-patches

imKota commented 4 years ago

> It is at least possible to force LiveMaker 3 games to display half-width western characters instead of full-width by changing the value of PR_FONTCHANGEABLED for the in game text boxes, which should make your patched text look nicer in-game: https://pylivemaker.readthedocs.io/en/latest/usage.html#notes-for-translation-patches

This is strange, but it did not help

pmrowla commented 4 years ago

Hmm, maybe that parameter doesn't affect Cyrillic characters (I've never tested that case)? Can you try patching in some text with Latin characters just to make sure whether or not that is the issue?

It may also be that you need to edit that parameter for a different one (or all) of the MesNew instances in that LSB file. Depending on the game, it's possible that command 36 is not actually the message box corresponding to your in-game text.

imKota commented 4 years ago

in file メッセージボックス作成.lsb lines 8, 36, 61 have already been set to 0

pmrowla commented 4 years ago

Yeah, so the font width parameter is working since the english text shows up correctly. It may just be that the engine is hard coded to only do the half-width adjustment for ASCII characters then. I'll try taking a look into the actual engine code at some point, but I'm guessing there isn't much that can be done about the problem.

imKota commented 4 years ago

@pmrowla any news?

> Yeah, so the font width parameter is working since the english text shows up correctly. It may just be that the engine is hard coded to only do the half-width adjustment for ASCII characters then. I'll try taking a look into the actual engine code at some point, but I'm guessing there isn't much that can be done about the problem.

pmrowla commented 4 years ago

@imKota I haven't spent too much time on reversing more of the engine code, but I'm pretty sure the answer is still that there's not much you can do about it.

imKota commented 4 years ago

@pmrowla

> @imKota I haven't spent too much time on reversing more of the engine code, but I'm pretty sure the answer is still that there's not much you can do about it.

It's strange.. If run through NTLEA, then the font is displayed normally.

pmrowla commented 4 years ago

I'm not familiar with how NTLEA works, but I'm assuming they are hooking things at the windows api level? So you could try patching whatever calls NTLEA is hooking in your game exe. But reversing mostly takes a lot of time that I don't have at the moment, as I am busy with other work. But my ghidra project is available on the wiki: https://github.com/pmrowla/pylivemaker/wiki so someone else is welcome to look into it as well

Stefan311 commented 4 years ago

I examined the game engine with IDA and noticed this: [screenshot: codepage] This constant is used in several places in the Delphi VCL library, and is passed once as a reference. The number 932 (3A4h) appears once more: [screenshot: codepage2] This constant is used in several font functions.

I would try setting both of these constants to another codepage, translating a game to that codepage, and seeing what happens. My problem is that the only game I have is HUGE, so I cannot do a full translation yet. Does one of you have a very small example project?

imKota commented 4 years ago

@Stefan311

> Does one of you have a very small example project?

https://vndb.org/v15032

pmrowla commented 4 years ago

@Stefan311 I have a small game that I made myself following the LM tutorial w/around 5 total lines that I use for testing different things in pylm (it's where the lsb's used for the automated tests come from). And it already includes a mix of JP and ascii text lines.

pylm-test.zip

Stefan311 commented 4 years ago

Works! [screenshot: umlaute] I changed the value at file position 1777396 (0x1B1EF4) to 1252 (0x4E4). That's the Western European code page. [screenshot: patch] I also changed novel.py to encode as CP1252 and bypass the CP932 checks.

class _TWdCharAdapter(construct.Adapter):
    # construct PaddedString only supports ascii and utf encodings

    def _decode(self, obj, ctx, path):
        # obj is the 2-byte uint read from the LSB; try CP932 first, then CP1252
        raw = obj.to_bytes(2, byteorder="big")
        try:
            ch = raw.decode("cp932")
        except UnicodeDecodeError:
            try:
                ch = raw.decode("cp1252")
            except UnicodeDecodeError:
                raise BadLnsError("{!r} is not a valid CP932 or CP1252 character".format(raw))
        # single-byte characters are stored with a leading NUL byte
        if ch.startswith("\x00"):
            ch = ch[1]
        return ch

    def _encode(self, obj, ctx, path):
        return int.from_bytes(obj.encode("cp1252"), byteorder="big")
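For anyone who wants to script the hex edit instead of doing it by hand, here is a minimal sketch. It assumes the code page constant is stored as a 32-bit little-endian integer (the thread does not state the width), and `patch_codepage` is a hypothetical helper, not part of pylivemaker. It refuses to write unless the old value is actually found at the given offset.

```python
import struct

OLD_CODEPAGE = 932   # CP932, the value the engine ships with
NEW_CODEPAGE = 1252  # Western European, as used in the test above


def patch_codepage(image, offset, old=OLD_CODEPAGE, new=NEW_CODEPAGE):
    """Return a copy of `image` with the constant at `offset` replaced.

    Assumption: the constant is a 32-bit little-endian integer.
    """
    (current,) = struct.unpack_from("<I", image, offset)
    if current != old:
        # Wrong offset or different engine version: do not touch anything.
        raise ValueError(
            "expected {} at offset {:#x}, found {}".format(old, offset, current)
        )
    patched = bytearray(image)
    struct.pack_into("<I", patched, offset, new)
    return bytes(patched)


# Demo on a synthetic buffer rather than a real game exe.
demo = bytes(8) + struct.pack("<I", OLD_CODEPAGE) + bytes(4)
patched_demo = patch_codepage(demo, 8)
```

For a real exe you would read the file as bytes, call the helper with the file offset for your engine version, and write the result to a new file while keeping a backup of the original.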

pmrowla commented 4 years ago

Awesome stuff @Stefan311. If we can get this working with utf-16 (I think this will be preferred over utf-8 due to how TWd char by char packing is done), we can make #44 a priority, and just distribute our patched version of the engine for every game patched w/pylm.

Stefan311 commented 4 years ago

I fear we are bound to this old code page crap. but... https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers There is a codepage for utf-8... 65001 Just trying...

pmrowla commented 4 years ago

also for future reference, this constant is at offset 0x001c30f4 (gvar_001C30F4) in the shared ghidra project

pmrowla commented 4 years ago

utf-8 might require some hacks in our struct handling, since LM assumes everything will fit in a single 2-byte wchar. for utf-16, we can split 4-byte codepoints across two TWdChar's, but since utf-8 is variable width it gets a bit more complicated
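As a rough illustration of the utf-16 idea, the sketch below splits text into 16-bit units, so that a character outside the Basic Multilingual Plane becomes two units (a surrogate pair) and could in principle occupy two TWdChar slots. `utf16_units` is a hypothetical helper name and this is not how pylivemaker currently packs text.

```python
from typing import List


def utf16_units(text: str) -> List[int]:
    """Split text into big-endian UTF-16 code units (one int per 2 bytes)."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], byteorder="big") for i in range(0, len(raw), 2)]


units_ascii = utf16_units("A")          # one unit
units_kana = utf16_units("こ")           # one unit
units_emoji = utf16_units("\U0001F600")  # two units (surrogate pair)
```

Every unit fits the 2-byte slot, which is why utf-16 is easier to map onto per-character packing than variable-width utf-8.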

Stefan311 commented 4 years ago

I see. Maybe code page 1200 / 1201? The M$ help says "available only to managed applications" but maybe...

Stefan311 commented 4 years ago

Code pages 1200 / 1201 / 65001 do not work :( Error message: "Install game again!" Seems we are stuck with classic code pages.

pmrowla commented 4 years ago

Just to clarify, when you get the error message for utf-8 (65001), is that after you replaced text and set the TWdChar encoding to utf-8?

because with just hexediting the constant to 65001 (but not manipulating any text), that does not crash for me.

I think it may still be possible to make it work, but just setting TWdChar encoding to utf-8 won't be enough, it will have to be manually packed to make sure that LM unpacks the bytes correctly

Stefan311 commented 4 years ago

I got the error message directly after the game starts. lmlsb does not work with "utf-8" as encoding, so I just used "utf-16be" for the 65001 test as well. Funny: if I use CP1252 in the EXE and utf-16be in the translation, the German characters are displayed, but the Japanese "name" text is changed to unreadable Latin characters.

Another idea: the code page constant is used in calls to MultiByteToWideChar. This function converts code-paged multibyte text to utf-16. So if we already store our text as utf-16 and just NOP out the MultiByteToWideChar call, could this work? Edit: of course, not only the call itself has to be NOPed out; the stack handling needs to be fixed up as well. Edit 2: I am currently reading the MultiByteToWideChar M$ help page. Maybe the dwFlags parameter needs to be changed to work properly with utf-8.

pmrowla commented 4 years ago

I am able to set it to 65001 and run with packed utf-8 text, although it's obviously not being decoded properly in LM with what I'm currently trying:

[screenshot: Capture]

basically TWdChar will have to be modified to only handle characters as 16-bit ints, and then at a higher level we have to convert them to/from text w/something like

                    raw = ch.encode("utf-8")
                    logger.info(f"packing {raw}")
                    if len(raw) <= 2:
                        logger.info(f"packing {raw} into one ch")
                        new_ch = int.from_bytes(raw, byteorder="big")
                        new_block.append(TWdChar(ch=new_ch, **d))
                    else:
                        logger.info(f"packing {raw} into double ch")
                        new_ch = int.from_bytes(raw[0:2], byteorder="big")
                        new_block.append(TWdChar(ch=new_ch, **d))
                        new_ch = int.from_bytes(raw[2:4], byteorder="big")
                        new_block.append(TWdChar(ch=new_ch, **d))

e: I think maybe we will have to pack things on full string/block level rather than per char?

Stefan311 commented 4 years ago

After patching the dwFlags parameter to 0, the game starts and displays something... Since the translation is still utf-16, there is a space between every Latin char, and the Japanese chars are still wrong. [screenshot: Untitled]

Stefan311 commented 4 years ago

What windows version do you use? This seems the thing where I failed first:

> Note: For UTF-8 or code page 54936 (GB18030, starting with Windows Vista), dwFlags must be set to either 0 or MB_ERR_INVALID_CHARS. Otherwise, the function fails with ERROR_INVALID_FLAGS.

For the internal pylm utf-8-to-byte encoding I am out. Assembler... no problem... but Python? I don't know much about Python.

pmrowla commented 4 years ago

I'm testing in windows 10

pmrowla commented 4 years ago

ok so with it set to 65001, and packing utf-8 by line, I'm able to get the following

[screenshot: Capture]

where the game is displaying the equivalent of

>>> 'こんにちは'.encode('utf-8').decode('cp932')
'縺薙ｓ縺ｫ縺｡縺ｯ'

I think that it may be possible to get utf-8 to work, but it's probably not something I will put too much time into (or at least not in the near future).

It is good to know that we can force other non-utf8 codepages to work, but that still doesn't help as far as making a general solution goes (just as an example, imKota would need a version of the engine and pylm that supports CP1251, whereas Stefan311 and LioMajor need 1252). Forcing CP932 isn't ideal, but it at least supports both latin + cyrillic alphabets, even if it lacks support for accented characters.

but obviously, help from anyone else w/time to RE and experiment would be great
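The code page trade-off described above can be checked quickly with Python's built-in codecs. This is only a sketch: `encodable` is a hypothetical helper, and Python's `cp932`/`cp1251`/`cp1252` codecs are assumed to match what Windows does for these particular characters.

```python
def encodable(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


latin = "Hello"
cyrillic = "Привет"
german = "Grüße"

coverage = {
    enc: (encodable(latin, enc), encodable(cyrillic, enc), encodable(german, enc))
    for enc in ("cp932", "cp1251", "cp1252")
}
```

CP932 accepts the plain Latin and Cyrillic samples but rejects the German one, while CP1251 and CP1252 each cover only one of the two non-ASCII samples.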

Stefan311 commented 4 years ago

So your exe still decodes cp932 while the translation is utf-8. Interesting. Are you sure you have set the 65001 correctly?

Could you share the utf-8 encoding patch on pylm?

pmrowla commented 4 years ago

I pushed my branch https://github.com/pmrowla/pylivemaker/compare/utf-8

Stefan311 commented 4 years ago

Something does not work in this branch

ID,Label,Context,Original text,Translated text
pylm:text:00000001.lsb:8:0,00000003,,"1237112435123951238512399
228246252196214220223
1071321085107732107510861074108610881102321087108632108810911089108910821080",
pylm:text:00000001.lsb:8:1,00000003,,879710511632102111114329910810599107,
pylm:text:00000001.lsb:8:2,00000003,,651021161011143211997105116,
pylm:text:00000001.lsb:8:3,00000003,,84101120116321151121011011003210297115116,
pylm:text:00000001.lsb:8:4,00000003,,841011201163211511210110110032115108111119,
pylm:text:00000001.lsb:8:5,00000003,,84101120116321151121011011003211011111410997108,

I gave up looking for a solution with UTF16. All that's left is UTF8 or classic code pages.

I would vote for classic code pages. Let's write a simple exe-patcher and add an option to choose the LSB encoding.

pmrowla commented 4 years ago

yeah the branch doesnt decode properly, I didn't add the stuff to extract/unpack from 16-bit ints into utf8. I was only testing inserting/packing utf8.
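For readers puzzled by the numeric strings in the CSV above: they look like the decimal Unicode code points of each character written back to back. That is a reading of the output, not something confirmed in this thread, but it is easy to check with a small sketch (`concat_codepoints` is a hypothetical helper):

```python
def concat_codepoints(text):
    """Join the decimal code point of every character with no separator."""
    return "".join(str(ord(ch)) for ch in text)


first_line = concat_codepoints("こんにちは")
second_line = concat_codepoints("äöüÄÖÜß")
click_line = concat_codepoints("Wait for click")
```

Under that reading, the first two numeric lines of the first CSV row correspond to the Japanese greeting and the German umlaut test string, and the second row is plain ASCII text.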

pmrowla commented 4 years ago

> I gave up looking for a solution with UTF16. All that's left is UTF8 or classic code pages.

> I would vote for classic code pages. Let's write a simple exe-patcher and add an option to choose the LSB encoding.

This is probably not something that can go into pylm proper unless we are sure it works for all the possible LM 2 + 3 interpreter versions (or until we have support for #44). But if you get stuff working in a fork for whichever specific engine versions you're looking at, I can link to it from the readme/docs/etc

edit: I added a list of things we will need for a general custom-codepage based solution in the other issue

Stefan311 commented 4 years ago

https://vndb.org/v15032

This game uses an older version (3.12.2.28), this engine version does not contain any code page constant. Seems the code page handling is a later development.

Is the 3.17.12.26 the last engine version?

pmrowla commented 4 years ago

Yeah, that’s the final release before the company shut down

Stefan311 commented 4 years ago

I machine translated "my" game to german, and found out this:

- Menus are also affected by the above patch.
- On systems with a native Japanese locale the patch does not work. It seems the LiveMaker engine uses two different display engines depending on the system locale.

Could you try your utf8 test again on an English-locale machine (or send me the test file)? I have not managed to get the utf-8 branch to work. (btw. this can be deleted)

On patching localized menu stuff I have a problem on the pylm side:

Patching 000001A1.lsb ...
  Translated 2 choices
  Failed to translate 0 choices
  Ignored 0 untranslated choices
Backing up original LSB.
Could not generate new LSB file: Error in path (building) -> commands -> items
no subconstruct matched: Calc AddArray(_tmp, "Es sollte aufhören")

I am currently stuck on this issue, I have no idea how to debug this. Could you please...? pylm_diff.txt 000001A1.lsb.zip menu.csv.zip

pmrowla commented 4 years ago

@Stefan311 if you are translating Calc operand fields to use non-CP932 characters (meaning menu text), you need to tell the struct for string literal operand data to use the proper codepage:

https://github.com/pmrowla/pylivemaker/blob/6b562c3a400595dc1da0e467a4c3348365be1333/livemaker/lsb/core.py#L492

this may have unintended side effects - i.e. you will probably have to translate every string literal in your patched LSB files and not just the ones you care about, assuming that japanese characters cannot be encoded w/whatever codepage you are using

pmrowla commented 4 years ago

wrt to the utf-8 branch, I never got it to work either

pmrowla commented 4 years ago

> On systems with native japanese locale the patch does not work. Seems the livemaker engine uses two different display engines depending on system locale.

I'm pretty sure this is expected behavior. Because of how windows codepages work, if you are patching a game to use a different locale, it will only work properly if windows is set to use the patch locale (either via the actual windows default locale setting, or via a locale emulator).

so to use CP1252 in a japanese configured windows installation, you'd have to use locale emulator set to CP1252
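A small sketch of why the code page used to store text has to match the one used to read it. This only uses Python's codecs to reinterpret bytes; it is an analogy for what happens when the engine or Windows picks the wrong code page, not a reproduction of the engine's code path. `reinterpret` is a hypothetical helper.

```python
def reinterpret(text, stored_as, read_as):
    """Encode text with one code page and decode the same bytes with another."""
    return text.encode(stored_as).decode(read_as, errors="replace")


# Japanese text stored as CP932 but read as CP1252 turns into Latin symbols.
kana_as_cp1252 = reinterpret("こ", "cp932", "cp1252")

# German text stored as CP1252 but read as CP932 no longer shows the umlaut.
umlaut_as_cp932 = reinterpret("ü", "cp1252", "cp932")
```

The first case matches the "unreadable latin characters" effect reported earlier in the thread.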

Stefan311 commented 4 years ago

It seems you haven't understood. Here are my test results; maybe they make it clearer. The game menu is still not translated.

Japanese locale system, game engine cp932, original content: window title correct, text correct, menu correct.

Japanese locale system, game engine cp1252, original content: window title incorrect, text correct, menu correct.

Japanese locale system, game engine cp932, translated content: window title correct, text incorrect, menu correct.

Japanese locale system, game engine cp1252, translated content: window title incorrect, text incorrect, menu correct.

English locale system, game engine cp932, original content: window title incorrect ("??????"), text correct, menu correct.

English locale system, game engine cp1252, original content: window title incorrect, text incorrect, menu incorrect.

English locale system, game engine cp932, translated content: window title incorrect ("??????"), text incorrect, menu correct.

English locale system, game engine cp1252, translated content: window title incorrect, text correct, menu incorrect.

So my assumptions:

- The window title is always affected by the game engine patch.
- The game text and menu are only affected if the system locale is different from cp932.

pmrowla commented 4 years ago

hmm I see.

I'm not really surprised that there are issues with regard to changing locale, since the engine itself has hardcoded CP932 strings in it, and we are still mixing CP932 and non-CP932 strings in LSBs.

Stefan311 commented 4 years ago

> this may have unintended side effects - i.e. you will probably have to translate every string literal in your patched LSB files and not just the ones you care about, assuming that japanese characters cannot be encoded w/whatever codepage you are using

    def _pascal_string_proxy(construct.Int32ul)
        try:
            return construct.PascalString(construct.Int32ul, "cp932")
        except:
            return construct.PascalString(construct.Int32ul, "cp1252")

    @classmethod
    def _struct(cls):
        return construct.Struct(
            "type" / construct.Enum(construct.Byte, ParamType),
            "value"
            / construct.Switch(
                construct.this.type,
                {
                    "Int": construct.Int32sl,
                    "Float": construct.ExprAdapter(
                        construct.Bytes(10),
                        lambda obj, ctx: numpy.frombuffer(obj.rjust(16, b"\x00"), dtype=numpy.longdouble),
                        lambda obj, ctx: numpy.longdouble(obj).tobytes()[-10:],
                    ),
                    "Flag": construct.Byte,
                    "Str": _pascal_string_proxy(construct.Int32ul),
                },
                # else 'Var' variable name type
                construct.Select(construct.PascalString(construct.Int32ul, "cp932"),),
            ),
        )

Is this kind of proxy method possible? To be honest, I still don't understand this whole "construct" thing. I start to really hate this esoteric python stuff.

pmrowla commented 4 years ago

construct is just a library for packing python types into binary structs. I think it should be possible to do it with an adapter

https://construct.readthedocs.io/en/latest/adapters.html

pmrowla commented 4 years ago

it could probably also just be a union? But I’m not actually sure how construct handles extracting/unpacking unions

Stefan311 commented 4 years ago

Works this way:

    def _struct(cls):
        macro = construct.PascalString(construct.Int32ul, "cp932")
        def _encode(obj, context, path):
            if obj == u"":
                return b""
            try:
                return obj.encode("cp932")
            except UnicodeEncodeError:
                return obj.encode("cp1252")
        macro._encode = _encode

        return construct.Struct(
            "type" / construct.Enum(construct.Byte, ParamType),
            "value"
            / construct.Switch(
                construct.this.type,
                {
                    "Int": construct.Int32sl,
                    "Float": construct.ExprAdapter(
                        construct.Bytes(10),
                        lambda obj, ctx: numpy.frombuffer(obj.rjust(16, b"\x00"), dtype=numpy.longdouble),
                        lambda obj, ctx: numpy.longdouble(obj).tobytes()[-10:],
                    ),
                    "Flag": construct.Byte,
                    "Str": macro,
                },
                # else 'Var' variable name type
                construct.Select(construct.PascalString(construct.Int32ul, "cp932"),),
            ),
        )
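The same fallback idea can be shown without construct. The sketch below is stdlib-only and assumes the string literal layout is a 32-bit little-endian length followed by the encoded bytes, which is what `PascalString(Int32ul, ...)` produces. `encode_with_fallback` and `pack_pascal_string` are hypothetical names.

```python
import struct


def encode_with_fallback(text, encodings=("cp932", "cp1252")):
    """Encode with the first code page that can represent the whole string."""
    for enc in encodings:
        try:
            return text.encode(enc)
        except UnicodeEncodeError:
            continue
    raise ValueError("{!r} cannot be encoded with any of {}".format(text, encodings))


def pack_pascal_string(text):
    """Length-prefixed string: 32-bit little-endian byte count, then the bytes."""
    raw = encode_with_fallback(text)
    return struct.pack("<I", len(raw)) + raw


packed_jp = pack_pascal_string("こ")
packed_de = pack_pascal_string("aufhören")
```

One caveat with this approach: the packed bytes do not record which code page was used, so decoding has to guess, and some CP1252 byte sequences are also valid CP932.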

pmrowla commented 4 years ago

So I looked more into the engine code, and I think it may be possible to get utf-8 to work for string literals (in TLiveParser expressions), but not for scenario scripts (in TpWord blocks).

For string literals in expressions, strings are packed and unpacked as full byte arrays (stored as delphi ANSI strings), so the fact that utf-8 is variable width (up to 4-bytes) is not an issue, and the windows MBCS->UTF-16 conversion functions should actually work as expected (if the codepage is edited to CP_UTF8).

However, for TpWord blocks, text is always parsed by individual character (glyphs), and never as full strings. And for TWdChar glyphs, they are always unpacked as 2-byte uints and stored in arrays of TWdChar class instances, and are never handled as string/byte arrays. When rendering scenario text, they call gdi32 functions to retrieve font glyphs per individual character (not as strings). The 2-byte uint is always what is fed into the MBCS->UTF-16 conversion functions, so we can't actually get away with packing 3 or 4-byte utf-8 codepoints across two TWdChars (the engine will always try to render them as two separate codepoints/glyphs). In theory we could potentially try only supporting the unicode range covered by 2-byte utf-8 codepoints. CJK text falls outside this range so it's not ideal, but for latin and cyrillic text we would be covered in this range.

So basically, it's theoretically possible to get partial utf-8 support w/the LM engine, but in practical terms we may just want to stick with DBCS codepages because of the TpWord scenario script limitation.
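To make the 2-byte limit concrete, here is a sketch that packs a single character's utf-8 bytes into one 16-bit value and reports which characters do not fit. `pack_utf8_char` is a hypothetical helper, not the code from the utf-8 branch.

```python
def pack_utf8_char(ch):
    """Return the utf-8 bytes of one character as a 16-bit int, or None if
    the character needs more than 2 bytes and cannot fit a single TWdChar."""
    raw = ch.encode("utf-8")
    if len(raw) > 2:
        return None
    return int.from_bytes(raw, byteorder="big")


packed = {ch: pack_utf8_char(ch) for ch in ("a", "ä", "я", "こ")}
```

ASCII, accented Latin and Cyrillic characters fit, while kana (3 bytes in utf-8) does not, which is the CJK limitation described above.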

Also there's a second codepage constant you probably need to hexedit (used in calls to IsDBCSLeadByteEx), it should be at offset 1776412 in the latest engine version

pmrowla commented 4 years ago

I'll probably play around with trying to get partial utf-8 support working over this weekend.

Stefan311 commented 4 years ago

I also messed around with utf-8 to menu items, but no luck at all. Neither Japanese characters nor German umlauts are displayed correctly. Maybe interesting for you:

- There is more than one locale=jap detection:
  - CODE:000D270C: function returns 1=jap, 0=others; seems to be used for the /config window.
  - CODE:001C06A6: seems to be a localized messagebox.
- The window title is always set as an ANSI string; Delphi uses the SetWindowTextA API call, so utf-8 is not possible for this. See CODE:0005CF8C @TApplication@SetTitle.
- I have observed the menus sometimes missing German umlauts, but this issue seems to be gone after changing the constant at 1776412 that you also found.
- To make it possible to start the game engine with the utf-8 code page, some more code patching of the MultiByteToWideChar API calls is required. If you say you do not need this, maybe this API changed again in Win10 (I still use Win7).

Finally:

I have translated "my" game to German for both texts and menus, and set the constants at 1776412 and 1777396 to 1252. I have played the game for a while and experienced no crashes or game glitches. Everything was working. The only issues are the main/load/save/options menus, which are still not translated and are now decoded incorrectly. But they are still functional.

pmrowla commented 4 years ago

Yeah, for the save/load/option menus you will have to translate the appropriate LSBs in ノベルシステム/システムメニュー

pmrowla commented 4 years ago

> to make starting the game engine possible with utf-8 code page, some more code patching to the MultiByteToWideChar API calls is required. If you say you do not require this, maybe this API is changed in Win10 again (I still use win7).

If this does turn out to be a win 10 vs win 7 issue (not sure yet), I would honestly say it's ok for pylm to not support win 7, given that it is completely end of lifed by microsoft and no longer even receiving security updates. But I'll probably also need to test it in win 8/8.1 as well since they are still microsoft supported versions.

pmrowla commented 4 years ago

Ok, so after some more investigation, when rendering text, they only use the ansi string version of windows gdi font calls (gdi32.GetGlyphOutlineA), so glyph lookups only work properly when the text codepage matches the system codepage. The MBCS->wide/utf-16 conversion functions are only used when passing text into windows messaging API calls, but not for font glyph/rendering related API calls.

The reason utf-8 partially worked for me is because I had the experimental/beta "set system locale to utf-8" Windows option enabled, but that's not something we can depend on.

so we are pretty much limited to DBCS codepages, and I don't think it's worth any more effort to try and hack in utf-8 support.

Stefan311 commented 4 years ago

So we are finally dropping the utf-8 thing. One more thing I would investigate: why does the game crash when running in Wine? It seems this is also a localisation thing; the game intro and start menu work, but the game crashes when the first message text should appear. Maybe a font is missing? Do you already know something about this topic?

pmrowla / pylivemaker

Change font and locale #14