satan53x / SExtractor

Extract and import text from GalGame scripts
GNU General Public License v3.0

Translated message.dat is the same as the original (KaGuYa message.dat v3) #106

Closed JakoDel closed 2 months ago

JakoDel commented 2 months ago

Hi, thanks for your work. This tool is fantastic and, as far as I can see, the only one that properly supports the v3 message.dat (well, at least I think it's v3, but everything I said applies to v2 as well). This one comes from https://vndb.org/v6414.

I got the transDic.output.json inside the "ctrl" folder (leaving the regex rules intact) by copying the original message.dat into an empty directory and clicking "extract". After that, I just:

  1. duplicated it
  2. renamed it to transDic.json
  3. replaced some sentences with random latin alphabet words without touching the json structure
  4. clicked on "extract" again.

This finally created a message.dat inside the "new" folder, but for some reason it's exactly the same as the original. Also, when creating the new message.dat, the log isn't showing my edits. Do you happen to know what I could be doing wrong? Thank you.

original_message.zip

transDic.output.json transDic.json

P.S. I'm not attaching the generated message.dat since, according to Beyond Compare 5, it's identical to the original bit for bit.

satan53x commented 2 months ago

You need to edit the value, not the key, for example: "「んにゃっ……?!」": "「Translated text……」",
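A minimal sketch of what "edit the value, not the key" means, assuming transDic.json is a flat JSON object that maps each original string to its translation. The helper name `apply_translations` and the sample data are hypothetical and not part of SExtractor:

```python
import json

def apply_translations(extracted: dict, translations: dict) -> dict:
    """Return a copy of `extracted` with values replaced and keys untouched.

    `translations` maps an original (key) string to its translated text.
    A key that is missing from `extracted` raises an error, because an
    edited key no longer matches any text found in the script.
    """
    result = dict(extracted)
    for original, translated in translations.items():
        if original not in result:
            raise KeyError(f"unknown original text: {original!r}")
        result[original] = translated
    return result

# Hypothetical sample in the assumed shape of transDic.output.json.
extracted = {"「んにゃっ……?!」": "「んにゃっ……?!」"}
updated = apply_translations(
    extracted, {"「んにゃっ……?!」": "「Translated text……」"}
)
print(json.dumps(updated, ensure_ascii=False, indent=2))
```

The key stays as the original Japanese text, which is presumably what the tool matches against the script, and only the value carries the translation.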

BTW, you could change the regex to skip voice strings: 00_skip=^[a-zA-Z\x00-\x09]
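To see what that pattern matches, here is a small sketch using Python's `re` module. The sample strings are made up for illustration (real voice identifiers in message.dat may look different), and `should_skip` is a hypothetical helper, not SExtractor code:

```python
import re

# Pattern quoted above: match strings whose first character is an ASCII
# letter or a control character in the range 0x00-0x09.
SKIP = re.compile(r"^[a-zA-Z\x00-\x09]")

def should_skip(text: str) -> bool:
    """True when `text` starts like a non-dialogue string such as a voice ID."""
    return SKIP.match(text) is not None

# Hypothetical sample strings.
samples = ["v_0001", "\x01data", "「んにゃっ……?!」"]
kept = [s for s in samples if not should_skip(s)]
print(kept)
```

Only the Japanese dialogue line survives the filter, since it starts with neither an ASCII letter nor a low control character.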

JakoDel commented 2 months ago

Thanks a ton, that worked! Also, yeah, I changed the extract format to name,msgRN so that the translator I use can tell who is talking; plus, it doesn't have any extra strings like voice. I haven't given it a try yet, but I assume in this case the text needs to be replaced directly.

JakoDel commented 1 month ago

Hi, I couldn't work on this for a couple of weeks. Now I've given it a proper try, and while the json { orig:"" } ExtractFormat seems to work fine after the few changes listed below, I'm getting an error when using the ExtractFormat I mentioned above (name,msgRN). It shows up right after the first dialogue.

What I've done:

  1. changed the first line of the regex rules to 00_skip=^[a-zA-Z\x00-\x09]
  2. changed self.NewEncodeName in src/var_extract.py to 'cp65001' (which is UTF-8; it didn't like cp1252 for some reason, but that may have been my fault), per #101
  3. extracted and translated a small portion (writing random Latin characters and symbols)
  4. created the new message.dat
  5. changed the .inf file inside the game folder so that "Current" points to the local directory, i.e. ./ (left setuptype at 0, but it probably doesn't matter), per #101

What am I doing wrong here? Thank you, mate.

errorscreen

all.orig.json all.trans.json newmessage.zip

^^^ The resulting message.dat is also much bigger when using the name,msgRN format: 3770 KB, versus 2695 KB for the original and 2741 KB with the other format.

transDic.output.json transDic.json newmessage.zip

satan53x commented 1 month ago

#101

That approach requires changing the charset passed to the CreateFont function in the game exe. But you don't need to do that: the original charset, cp932 (Shift-JIS), is compatible with ASCII, so it can display English. Just choose cp932 and "Encoding applies to BIN" when importing.
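The ASCII compatibility can be checked with Python's built-in codecs. This is only an illustrative sketch: `encodes_like_ascii` is a hypothetical helper, and it says nothing about how the game itself renders text:

```python
def encodes_like_ascii(text: str, encoding: str = "cp932") -> bool:
    """True when `text` encodes to the same bytes in ASCII and in `encoding`."""
    try:
        return text.encode(encoding) == text.encode("ascii")
    except UnicodeEncodeError:
        # The text contains something ASCII (or `encoding`) cannot represent.
        return False

english = "Hello, world!"
japanese = "こんにちは"

print(encodes_like_ascii(english))   # plain English is byte-identical
print(len(japanese.encode("cp932")))  # kana take two bytes each in cp932
```

Because plain English text produces the same bytes either way, a game that reads cp932 can show it without any change to the executable.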

JakoDel commented 1 month ago

Yup, that works. I apologize for the confusion, and thank you. While I may be doing English now, I would also like to translate a few games into my "real" native language, so that technique will surely come in handy in the future.

JakoDel commented 1 month ago

Adding to my previous reply: I'm getting this error when importing: 'cp932' codec can't encode character '\u2014' in position 0: illegal multibyte sequence. Ignoring it doesn't seem to cause any issues in the game, though, so I'll see how it goes with a full translation.

satan53x commented 1 month ago

'cp932' codec can't encode character '\u2014' in position 0: illegal multibyte sequence

After this there should be a red error message showing which text is wrong. You need to edit that text so that it can be encoded in cp932. You can search for \u2014 in an editor such as VSCode with the regex option enabled.
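As an alternative to searching by hand, a short script can list every translated string that cp932 cannot encode. This is a sketch under the assumption that the translations are available as a flat key/value mapping. `find_unencodable` and the sample entries are hypothetical:

```python
def find_unencodable(translations: dict, encoding: str = "cp932") -> list:
    """Return (key, bad_characters) pairs for values `encoding` cannot encode."""
    problems = []
    for original, translated in translations.items():
        bad = []
        for ch in translated:
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                if ch not in bad:
                    bad.append(ch)
        if bad:
            problems.append((original, bad))
    return problems

# Hypothetical sample entries; "\u2014" is the character from the error above.
sample = {
    "line1": "Plain ASCII text is fine.",
    "line2": "Wait\u2014what?",
}
for key, bad in find_unencodable(sample):
    print(key, [f"U+{ord(c):04X}" for c in bad])
```

Each reported character can then be replaced with something cp932 supports, for example an ASCII hyphen.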

JakoDel commented 1 month ago

Thank you, I've done as you said and now there are no more errors. My last issue is figuring out how to make the text fit. Unfortunately, the game doesn't let you change the font (maybe the font is specified inside params.dat?), and by default it's so big that I need to insert \n pretty often, and the text goes out of bounds when it exceeds 3 lines. I saw that there are some settings in the config.ini file, but 'toFullWidth' only made it worse. If you've got any suggestions, I'd appreciate them.

satan53x commented 1 month ago

If it's name,msg json, you can use tools/limit_maxlen_json_gt.py to re-split every message. You may need to change the variable MaxLen to around 52 when the text is all half-width. The drawback is that it will break words apart, so you may want to edit the Python script so that it only adds \n after a space, keeping words intact.
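One way to break only at spaces is Python's standard `textwrap` module. The sketch below is not the code in tools/limit_maxlen_json_gt.py. The function name `rewrap`, the default width of 52, and the newline marker are assumptions for illustration:

```python
import textwrap

def rewrap(message: str, max_len: int = 52, newline: str = "\n") -> str:
    """Re-split `message` into lines of at most `max_len` characters,
    breaking only at whitespace so that words stay intact."""
    # Drop existing line breaks and collapse runs of whitespace first.
    flat = " ".join(message.replace(newline, " ").split())
    lines = textwrap.wrap(
        flat,
        width=max_len,
        break_long_words=False,  # never cut inside a word
        break_on_hyphens=False,  # keep hyphenated words together
    )
    return newline.join(lines)

text = "This is a fairly long line of translated dialogue that would not fit in the text box."
print(rewrap(text))
```

A single word longer than `max_len` is left on its own line rather than cut, so very long tokens can still overflow the text box.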

JakoDel commented 1 month ago

Done (limit_maxlen_json_gt.zip), and now it's working flawlessly, thank you again. Let me know if you want me to open a proper merge request, provided this edit isn't harmful to Chinese translations.

satan53x commented 1 month ago

No need, thx. CJK text has no spaces, so it cannot be re-split by space; the script would treat a whole paragraph or sentence as one word.
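The difference is easy to see in a few lines of Python. This sketch is purely illustrative: `split_for_wrapping` is a hypothetical helper, and the fixed-width fallback is an assumption about how text without spaces might be handled, not a description of the repository's script:

```python
def split_for_wrapping(text: str, max_len: int) -> list:
    """Split on spaces when the text has any; otherwise fall back to
    cutting every `max_len` characters, since there is no word boundary
    to break on."""
    if " " in text:
        return text.split(" ")
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

english = "The quick brown fox"
cjk = "これは空白のない日本語の文章です"

print(len(english.split(" ")))      # several words
print(len(cjk.split(" ")))          # the whole sentence is one "word"
print(split_for_wrapping(cjk, 8))   # fixed-width pieces instead
```

A script meant to serve both kinds of text would need this sort of branch, which is why a space-only splitter suits English translations but not CJK ones.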