satan53x / SExtractor

Extract and import text from GalGame scripts
GNU General Public License v3.0
241 stars 16 forks

Problem exporting name:message format scripts from GIGA's NeXAS engine #117

Closed NoneMore closed 6 days ago

NoneMore commented 4 weeks ago

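NeXAS engine scripts are not encrypted. The file header indicates where the text begins, and the end of the text is also clearly marked. However, the engine's own logic for reading character names from a script is quite complex, so trying to extract name:message format scripts with the existing preset regexes is essentially unusable.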

From what I have observed, there are roughly three ways a character name is associated with its text:

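1) For the protagonist, the name is given after a separator (00) before his first line of dialogue (that is, text wrapped in 「」, which is 81 75 and 81 76 in Shift-JIS). For example:
{ "name": " 駿介 ", "message": "「女の子が、泣いてる?」" }
The binary script is:
00 20 8F 78 89 EE 20 00 81 75 ... ... 81 76
Here 8F 78 89 EE is the protagonist's name to be extracted, and 20 is a space, which is displayed in the game. When his dialogue appears again later, the script no longer gives the name and the previously declared name is reused automatically, as in:
00 81 75 ...... 81 76 00
In the game this dialogue still shows the protagonist's name, but the preset regex cannot extract it.

2) For other characters, the name is likewise given before their first line of dialogue, and the separator is immediately followed by a voice playback control code. For example:
{ "name": "女の子の泣き声", "message": "「うっ……うっ……えくっ、えくっ」" }
The binary script is:
00 8F 97 82 CC 8E 71 82 CC 8B 83 82 AB 90 BA 00 40 76 30 30 30 30 30 30 30 81 75 ... ... 81 76 00
Here 8F 97 82 CC 8E 71 82 CC 8B 83 82 AB 90 BA is the name to be extracted, and 40 76 30 30 30 30 30 30 30 is the control code @v0000000. The first three digits of the control code identify the character, here "000". From then on, the previously declared name is always reused based on the control code, unless the name needs to change, in which case it is declared again before the dialogue.

3) Similar to 2), but with a special voice control code. For example, the character name "黑猫" (Black Cat) corresponds to @vCat.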
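The three rules above can be sketched in Python roughly as follows. This is not the script from script_jp.zip, only a minimal illustration under several assumptions: the input is already the NUL-separated text area of the bin file, the string directly before a dialogue line counts as a name declaration only if it passes the hypothetical `looks_like_name` heuristic, and a non-numeric voice code such as @vCat is keyed by its leading letters. All function names and the extra `voice` field are made up for this example.

```python
import re

# Optional voice control code (e.g. @v0000000 or @vCat), then an opening 「 or 『.
DIALOGUE_RE = re.compile(r"^(?:@v([0-9A-Za-z]+))?([「『].*)$", re.S)
SENTENCE_ENDINGS = ("。", "…", "!", "?", "!", "?")

def split_strings(blob):
    """Split the NUL-separated text area and decode each piece as Shift-JIS."""
    return [part.decode("cp932") for part in blob.split(b"\x00") if part]

def looks_like_name(text):
    """Hypothetical heuristic: short and not ending like a narration sentence."""
    stripped = text.strip()
    return 0 < len(stripped) <= 12 and not stripped.endswith(SENTENCE_ENDINGS)

def voice_key(token):
    """First three digits identify the character; otherwise use the leading letters."""
    if token[:3].isdigit():
        return token[:3]
    return re.match(r"[A-Za-z]*", token).group(0)

def assign_names(strings):
    records = []
    protagonist = None   # last name declared before an unvoiced line (case 1)
    by_voice = {}        # voice key -> last declared name (cases 2 and 3)
    prev = None          # string seen just before the current one, if not dialogue
    for text in strings:
        match = DIALOGUE_RE.match(text)
        if not match:
            prev = text
            continue
        token, message = match.group(1), match.group(2)
        declared = prev if prev is not None and looks_like_name(prev) else None
        if token is None:
            if declared is not None:
                protagonist = declared
            name = protagonist
        else:
            key = voice_key(token)
            if declared is not None:
                by_voice[key] = declared
            name = by_voice.get(key)
        # The control code is kept separately so it can be written back later.
        records.append({"name": name, "voice": token, "message": message})
        prev = None
    return records
```

With this sketch, a later unvoiced 「…」 line picks up the protagonist's name again, and a later @v000… line picks up whatever name was last declared for voice key "000". Narration strings are skipped here and would need their own handling.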

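Based on the logic above, I have already written a script that extracts JSON from the bin files, and it works well. Because the control codes are preserved, writing the text back into the script should not be too difficult either. However, since I am not familiar with this project, I can only upload the script together with test samples for reference: script_jp.zip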

satan53x commented 3 weeks ago

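OK, I'll give it a try next time I come across a NeXAS game. I can't test without a game, but I tried the extraction script and it seems to miss some narration. The names are all there, though. I also tried SE: this bin's format is indeed different from before, and some lines have no name in front of them, so pre_name needs to be removed.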

NoneMore commented 3 weeks ago

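> OK, I'll give it a try next time I come across a NeXAS game. I can't test without a game, but I tried the extraction script and it seems to miss some narration. The names are all there, though. I also tried SE: this bin's format is indeed different from before, and some lines have no name in front of them, so pre_name needs to be removed.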

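The narration is missing because, when format_text_to_json iterates over each line, I forgot to add the line as narration when no quotation mark is found. Adding an else branch afterwards fixes it, although it will also pull in some extra items such as image file names or character names. Those could be filtered out by adjusting the condition further, but I feel that would make writing back troublesome, so it is not worth handling. In addition, some later scripts use hollow quotation marks ("『") for dialogue in flashbacks, so these also need to be taken into account when detecting dialogue markers.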
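A minimal Python sketch of the fix described above. The real format_text_to_json is not shown in this thread, so `format_line` and the shape of the returned records are hypothetical stand-ins for the branch in question.

```python
# Dialogue may open with the usual 「 or, in flashbacks, with the hollow 『.
OPENING_QUOTES = ("「", "『")

def format_line(line, current_name):
    """Return a dialogue record if an opening quote is found, else a narration record."""
    if any(quote in line for quote in OPENING_QUOTES):
        return {"name": current_name, "message": line}
    else:
        # This is the branch that was missing: lines without a quote are narration.
        # It will also catch image file names and similar non-dialogue strings.
        return {"message": line}
```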

koukdw commented 3 weeks ago

Hi, I can't speak Chinese so I'll write in English; hopefully you understand. I reverse-engineered the script format 2 years ago; my research is here: https://github.com/koukdw/Aquarium_tools/blob/main/research/fileformats.md

It might be a bit challenging to build something that supports every script, given how Entergram continually modifies NeXAS. The simplest case should be pretty easy to handle by parsing the script, however.

The goal should be to find the function index of SetMessage(int char_no, int face_id, string name, string message) via some heuristic (parameter count, types, call count, and checking that the function index is within an acceptable range) and then to parse the script for the name and message parameters.

Here's the simplest case of how the script adds parameters for a future function call (the index will differ depending on the engine version, because any function added before SetMessage increases its index number):

code:
...
00 00 00 00 00 00 00 00   SET_R0        => [R0] = 0
05 00 00 00 00 00 00 00   PARAM_INT     => char_no = 0
00 00 00 00 01 00 00 00   SET_R0        => [R0] = 1
04 00 00 00 00 00 00 00   PUSH          => push [R0] to the stack
00 00 00 00 FF FF FF FF   SET_R0        => [R0] = -1
06 00 00 00 01 00 00 00   POP           => pop stack into R1. [R1] = 1
0b 00 00 00 00 00 00 00   MUL           => [R0] = [R0] * [R1]  (that's just unoptimized code to negate a number)
05 00 00 00 00 00 00 00   PARAM_INT     => face_id = -1
00 00 00 00 06 00 00 00   SET_R0        => [R0] = 6    // index into string_table
05 00 00 00 01 00 00 00   PARAM_STRING  => name = " 駿介 "
00 00 00 00 07 00 00 00   SET_R0        => [R0] = 7    // index into string_table
05 00 00 00 01 00 00 00   PARAM_STRING  => message = "「女の子が、泣いてる?」"
07 00 00 00 6B 00 04 00   CALL          => Call function at index 0x6b (SetMessage), 4 parameters
...

string_table: 
[0]:  ""
[1]:  "最初に戻りました"
[2]:  "――女の子がお姫様にあこがれるように、@n男の子はお姫様を助ける騎士にあこがれる。"
[3]:  "そんな@r憧憬@どうけい@が、幼い冒険心を刺激したのかもしれない。"
[4]:  "女の子の泣き声"
[5]:  "@v0000000「うっ……うっ……えくっ、えくっ」"
[6]:  " 駿介 "
[7]:  "「女の子が、泣いてる?」"
...

Even this might not be enough, because sometimes they concatenate strings or perform other operations. In later versions they also added two other SetMessage functions (for the chat mode, for example).
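For the simplest case in the listing above, the approach can be sketched in Python as follows. The opcode numbers are read off the listing (each instruction taken as a 4-byte little-endian opcode followed by a 4-byte operand); everything else is an assumption made for illustration: the function names, treating PARAM operand 1 as a string and anything else as an int, treating POP as always targeting R1, and picking SetMessage purely by the (int, int, string, string) signature and call count. It ignores string concatenation and the extra SetMessage variants mentioned above.

```python
import struct
from collections import Counter

# Opcode values as they appear in the listing above.
SET_R0, PUSH, PARAM, POP, CALL, MUL = 0x00, 0x04, 0x05, 0x06, 0x07, 0x0B

def iter_calls(code, string_table):
    """Yield (function_index, [(kind, value), ...]) for every CALL in the bytecode.

    len(code) must be a multiple of 8, otherwise struct.iter_unpack raises.
    """
    r0 = r1 = 0
    stack, params = [], []
    for opcode, operand in struct.iter_unpack("<Ii", code):
        if opcode == SET_R0:
            r0 = operand
        elif opcode == PUSH:
            stack.append(r0)
        elif opcode == POP and stack:
            r1 = stack.pop()          # assumption: POP always targets R1
        elif opcode == MUL:
            r0 *= r1                  # no 32-bit wraparound in this sketch
        elif opcode == PARAM:
            if operand == 1:          # string parameter: R0 indexes the string table
                text = string_table[r0] if 0 <= r0 < len(string_table) else None
                params.append(("str", text))
            else:
                params.append(("int", r0))
        elif opcode == CALL:
            index, argc = operand & 0xFFFF, (operand >> 16) & 0xFFFF
            args = params[-argc:] if argc else []
            params = []
            yield index, args
        # Any other opcode is ignored here.

def find_set_message(calls):
    """Guess the SetMessage index: the most-called function taking (int, int, str, str)."""
    counts = Counter(
        index for index, args in calls
        if [kind for kind, _ in args] == ["int", "int", "str", "str"]
    )
    return counts.most_common(1)[0][0] if counts else None

def extract_messages(code, string_table):
    calls = list(iter_calls(code, string_table))
    target = find_set_message(calls)
    return [
        {"name": args[2][1], "message": args[3][1]}
        for index, args in calls
        if index == target and len(args) == 4
    ]
```

A real tool would probably also want the range check on the function index that the heuristic above mentions, since another function could share the same signature.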

Cosetto commented 1 week ago

@koukdw Do you know which bytes determine the chatframe type in Aonatsu? image image I have unpacked the AlphaROM protection from the exe, in case you need it. AonatsuU.zip

koukdw commented 1 week ago

@Cosetto I don't need the unpacked version. Giga published non-DRM versions of most of their exes before closing down: https://web.archive.org/web/20230318123243/http://www.web-giga.com/support/trouble/index.html

The chatframe type depends on the line count global variable. Check the SetChatMessage function in the Aonatsu Line Chinese trial with dnSpy; it's basically that function, but written in C#. image https://steamdb.info/app/2185800/ (link to the game trial)

Global variables have 0x40000000 added to the index of the variable. According to my screenshot, line_cnt is the very first variable, so its index is 0. Adding 0x40000000 gives the value I searched for:

 00 00 00 00 00 00 00 40  SET_R0
 08 00 00 00 00 00 00 00  LOAD the value of the variable at index located in R0 in R0
 0e 00 00 00 02 00 00 00  STORE into the variable at index 2 (cnt) the value in R0
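As a rough Python sketch of that search: the helpers below build the SET_R0 + LOAD byte pattern for a given global variable index and scan the bytecode for it. The function names are hypothetical, and the scan assumes the code buffer starts on an instruction boundary with 8-byte instructions, as in the listings above.

```python
import struct

GLOBAL_FLAG = 0x40000000      # added to the index of a global variable
SET_R0, LOAD = 0x00, 0x08     # opcode values as shown in the listing above

def load_global_pattern(index):
    """Bytes for: SET_R0 (GLOBAL_FLAG + index), then LOAD."""
    return struct.pack("<II", SET_R0, GLOBAL_FLAG + index) + struct.pack("<II", LOAD, 0)

def find_global_loads(code, index):
    """Return byte offsets of every instruction pair that loads the given global."""
    pattern = load_global_pattern(index)
    return [
        offset
        for offset in range(0, len(code) - len(pattern) + 1, 8)
        if code[offset:offset + len(pattern)] == pattern
    ]
```

For line_cnt (index 0) the pattern is exactly the first two instructions shown above.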

image

But that's very hard to edit without tools. Any instruction you add may break the program, because the jump and branch instructions use an index that is an absolute position from the start of the function. If you add an instruction, every instruction after it is shifted, so existing jump targets no longer point where they should.
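A tool would therefore have to patch the jump targets whenever it inserts an instruction. A minimal Python sketch of that idea follows; the jump and branch opcode values are not given in this thread, so they are passed in by the caller, and `insert_instruction` is a hypothetical helper working on a decoded list of (opcode, operand) pairs for a single function.

```python
def insert_instruction(instructions, position, new_instruction, jump_opcodes):
    """Insert new_instruction at `position` and shift absolute jump targets.

    instructions: list of (opcode, operand) pairs for one function.
    jump_opcodes: set of opcodes whose operand is an absolute instruction
                  index from the start of the function (caller-supplied).
    """
    patched = []
    for opcode, operand in instructions:
        # Targets at or after the insertion point move down by one slot, so a
        # jump keeps pointing at the same original instruction as before.
        if opcode in jump_opcodes and operand >= position:
            operand += 1
        patched.append((opcode, operand))
    patched.insert(position, new_instruction)
    return patched
```

Whether a jump that targets the insertion point itself should land on the new instruction or on the original one depends on the edit being made; this sketch keeps it on the original.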