satan53x / SExtractor

Extract and import text from GalGame scripts
GNU General Public License v3.0
230 stars 15 forks

Extract Scenario from Cyberworks #46

Closed sakurahana90 closed 5 months ago

sakurahana90 commented 7 months ago

I tried to extract the game scenario using the default settings and the default regex for Cyberworks JIS:

10_search=^(?P<name>【.+?】)
15_search=^(?P<unfinish>.+?)[\xFE]{0,1}$

It throws this error: "UnicodeDecodeError: 'cp932' codec can't decode byte 0x84 in position 1935: illegal multibyte sequence decoding with 'cp932' codec failed"

Then, I tried to change "?" with "?" Screenshot_15

It ran to the end, but every output file contains only "{}"

Screenshot_14

Did I use a wrong setting somewhere?

satan53x commented 7 months ago

Some Cyberworks games use UTF-16 encoding. Try choosing regex None to extract. If that doesn't work either, upload some a0 files and I will check them.
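A quick way to see which encoding a script file might use is to try each candidate decoder in turn. The following is only a rough sketch: the helper name `first_working_encoding` is hypothetical and not part of SExtractor, and UTF-16 decoding accepts almost any even-length byte string, so it is listed last and a hit on it is a hint rather than proof.

```python
def first_working_encoding(data: bytes, candidates=("cp932", "utf-16-le")):
    """Return the first candidate encoding that decodes `data` without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this codec rejects the bytes, try the next one
        return name
    return None


# Sample data standing in for the contents of a script file.
sjis_sample = "【秀平】こんにちは".encode("cp932")
utf16_sample = "【秀平】こんにちは".encode("utf-16-le")

print(first_working_encoding(sjis_sample))
print(first_working_encoding(utf16_sample))
```

If neither codec succeeds, the file is probably encrypted or uses a different structure, which is what turned out to be the case later in this thread.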

sakurahana90 commented 7 months ago

regex none also didn't work unfortunately...

here's some script text.zip

satan53x commented 7 months ago

Cannot download the zip, error 404.

sakurahana90 commented 7 months ago

I tried to upload but still 404 not found, how about google drive link ?

satan53x commented 7 months ago

I checked it; it's another Cyberworks format, without encryption. You can use VNT to extract it.

sakurahana90 commented 7 months ago

I also tried VNT to extract, but when I put the text back into the script and repacked, it didn't work and the game crashed on boot. Is there a limit on the number of characters in a string?

satan53x commented 7 months ago

Generally speaking, Cyberworks does not have a character limit, but it's possible that your version of the engine does. You can try extracting it and adding a few words in the first sentence. If it crashes, try modifying the text without increasing the character count to determine if that's the issue. (Both addition and modification are done using Japanese to eliminate any potential interference factors.)

sakurahana90 commented 7 months ago

I'm stuck. I tried adding a few characters and also tried changing one character; both had the same result, the game crashed. I also tried unpacking and repacking with no change to the script, and the game played normally. That at least confirms nothing is wrong with the unpack and repack tool.

I am using VNT to extract and reinsert the script. Are unencrypted scripts not supported in SExtractor?

satan53x commented 7 months ago

Yes, it's not supported. Not only is it unencrypted, but the file structure also differs.

also tried to change one character

What language is it changed to? Will it still be Japanese after the modification? Don't use English for now.

sakurahana90 commented 7 months ago

Yes, I used Japanese. To be exact, I just copied one character from the string and pasted it into the same string, or replaced one character with another character from the same string.

I also compared the extracted strings with the strings shown in the game and found that some strings didn't get extracted; perhaps that's one of the reasons too.

Well, I have to be patient then. Thank you for all the answer, I appreciate it really.

satan53x commented 7 months ago

That's odd. What's the name of the game?

sakurahana90 commented 7 months ago

It's an old game from Tinkerbell https://vndb.org/v1036

In case you want to check it out, here are the game files (this is the official patch and includes the scenario archive):

yumekoi.zip

satan53x commented 7 months ago

Support for the old version of Cyberworks has been added. The recommended regex is as follows:

00_skip=^$
10_search=^(?P<name>【.+?】)[\xFE]{0,1}$
11_search=^(?P<name>【.+?】)(.+?)[\xFE]{0,1}$
15_search=^(?P<unfinish>.+?)[\xFE]{0,1}$
extraData=readJIS,noTextLen
structure=paragraph

Mainly, the addition of the noTextLen parameter is needed.
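To get a feel for what each rule captures, the regexes above can be tried on a few sample lines. This is only an illustration under assumptions: it applies the patterns to already decoded strings and tries them in ascending key order, which may differ from how SExtractor applies them internally, and the `classify` helper is hypothetical.

```python
import re

# Patterns copied from the recommended config above.
RULES = {
    "00_skip": r"^$",
    "10_search": r"^(?P<name>【.+?】)[\xFE]{0,1}$",
    "11_search": r"^(?P<name>【.+?】)(.+?)[\xFE]{0,1}$",
    "15_search": r"^(?P<unfinish>.+?)[\xFE]{0,1}$",
}


def classify(line: str):
    """Return (rule key, captured groups) for the first rule that matches."""
    for key in sorted(RULES):  # assumption: lower-numbered rules win
        match = re.search(RULES[key], line)
        if match:
            return key, match.groups()
    return None


for sample in ["", "【秀平】", "【秀平】おはよう", "今日は晴れ。", "テキスト\xfe"]:
    print(repr(sample), "->", classify(sample))
```

A line that is only a bracketed name hits 10_search, a name followed by text hits 11_search, and everything else falls through to 15_search, with a trailing \xFE left out of the captured text.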

satan53x commented 7 months ago

image

Also, the text length is not limited. If you use CSystemArc.exe to repack, pay attention to the file version shown during unpacking. Your game is version 22.

sakurahana90 commented 7 months ago

thank you so much for this, it's getting closer.

I tried the recommended regex and the process went smoothly, but there's a problem: each generated file is only 1 KB (I'm extracting multiple files), and it contains only a single line of text.

Screenshot_18

satan53x commented 7 months ago

image I can extract it normally. I'm not sure what problem you're encountering.

satan53x commented 7 months ago

What does your console print when extracting?

sakurahana90 commented 7 months ago

Screenshot_19

I also tried extracting the script with GARBro, but got the same result. Maybe my OS is corrupt or something...

satan53x commented 7 months ago

Can you pack the entire SExtractor folder and upload it?

sakurahana90 commented 7 months ago

I finally found the problem: my AV seems to have deleted Injector Xenos.exe. I found out after I re-downloaded SExtractor and unzipped it; there was a notification that something had been detected and deleted. After I turned the AV off, I could see the correctly generated files.

Screenshot_20

Thank you so much for the help!

satan53x commented 7 months ago

It's imported with GBK encoding in Cyberworks engine. If you only need Japanese and English, select Encoding applies to BIN and choose cp932 at the bottom right.
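Before importing with cp932, it can help to check whether the translated text contains characters that cp932 cannot represent. A minimal sketch, assuming the translation is available as a Python string; `unencodable_chars` is a hypothetical helper, not an SExtractor function.

```python
def unencodable_chars(text: str, encoding: str = "cp932"):
    """List the distinct characters in `text` that `encoding` cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


print(unencodable_chars("「Huh? Isn't that the same thing?」"))
print(unencodable_chars("café"))
```

Plain English and Japanese text should produce an empty list; accented Latin letters such as é are reported because cp932 has no code for them.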

sakurahana90 commented 7 months ago

I've tried it and the game still won't boot up. Here's my method:

  1. Unpack dat archive with cyberworks tools
  2. Run SExtractor to extract the script
  3. Copy one json from orig folder to trans, for example a000002.json
  4. Edit json file in trans folder
  5. Run SExtractor to import the translated lines with "select Encoding applies to BIN and choose cp932 at the bottom right"; the resulting script is generated in the new folder
  6. Move the generated script out of that folder and overwrite the old script
  7. Delete the 4 folders (ctrl, new, orig, trans)
  8. Run pack.bat with version 22

Is there something wrong with the way I'm using the tools? I also tried changing the administrative locale of my PC to Japanese, but nothing changed.

sakurahana90 commented 7 months ago

How silly of me. I didn't realize there were also changes in Arc01.dat, so I didn't copy it to the game folder. After I copied it too, the game booted up.
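A simple way to avoid missing a changed archive after repacking is to compare file hashes between the game folder and the repack output. The sketch below is only an illustration: the folder layout, the `*.dat` pattern, and the `changed_archives` helper are assumptions, and the demo uses throwaway files rather than real archives.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_archives(original_dir, repacked_dir, pattern="*.dat"):
    """Names of archives in repacked_dir that are new or differ from original_dir."""
    changed = []
    for new_file in sorted(Path(repacked_dir).glob(pattern)):
        old_file = Path(original_dir) / new_file.name
        if not old_file.exists() or file_digest(old_file) != file_digest(new_file):
            changed.append(new_file.name)
    return changed


# Demo with temporary stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    game, repacked = Path(tmp, "game"), Path(tmp, "repacked")
    game.mkdir()
    repacked.mkdir()
    (game / "Arc00.dat").write_bytes(b"same")
    (repacked / "Arc00.dat").write_bytes(b"same")
    (game / "Arc01.dat").write_bytes(b"old scenario")
    (repacked / "Arc01.dat").write_bytes(b"new scenario")
    demo_changed = changed_archives(game, repacked)

print(demo_changed)
```

Every file name it reports needs to be copied into the game folder, not just the archive that holds the scripts.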

Screenshot_21

Thank you very much for the help, I'm so happy that I finally can translate this game.

sakurahana90 commented 7 months ago

If I may add something I found: some lines didn't get extracted into the JSON. I found this by comparing with the JSON I extracted with VNT. If I add the missing dialogue to the JSON, nothing changes and the game keeps rendering the original line.

Screenshot_22

Some symbols also turn into a random character. For example, in the dialogue in the picture, 「Huh? Isn't that the same thing?」, the symbol 」 changed into a dot, but I don't think this is much of a problem, at least for me.

Edited: I found a second problem. The first and last scripts of the archive contain the game's choice text. If I translate them, the choices don't work and can't be clicked in game, but when I use the original Japanese file, they work again.

Screenshot_23

I don't know if I edited the file incorrectly, but I did it the same way as the rest.

satan53x commented 7 months ago

00_skip=^$
10_search=^(?P<name>【.+?】)[\xFE]{0,1}$
15_search=^([\S\s]+?)[\xFE]{0,1}$
extraData=readJIS,noTextLen
structure=paragraph

I checked, and it seems that not all 【】 brackets contain speaker names; names also appear inside narration (so delete 11_search). Additionally, there are control bytes inside the text, so . needs to be changed to [\S\s] so that they can be matched. (The byte is 0x0A, decimal 10, which happens to be \n, and . cannot match \n.)
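The difference between . and [\S\s] can be reproduced in isolation. This sketch uses Python's re module on a decoded string, which is an assumption about how the rule is applied, but the matching behaviour of . around \n is the same.

```python
import re

# Bracket content as it appears in the extracted JSON: it contains "\n".
text = "【\nさとなか しゅうへい\u0005里中 秀平】。"

dot_rule = re.compile(r"^(.+?)[\xFE]{0,1}$")
any_rule = re.compile(r"^([\S\s]+?)[\xFE]{0,1}$")

dot_match = dot_rule.search(text)   # "." stops at the newline, so no match
any_match = any_rule.search(text)   # "[\S\s]" also matches the newline

print(dot_match)
print(any_match.group(1) == text)
```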

"message": "舌を突き出して汗をダラダラかいてるこいつは、【\nさとなか しゅうへい\u0005里中 秀平】。"

The control bytes belong to the phonetic transcription (reading) of the name. You can delete the phonetic transcription. If you want to keep it, you need to modify the length bytes so they match the length of the text. ( is the start bytes; \n (\u000A, decimal 10) and \u0005 are the text lengths.) It's recommended to delete it and keep just the name, because your translation and the phonetic symbols will not match.

"message": "舌を突き出して汗をダラダラかいてるこいつは、【里中 秀平】。"
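In the example above, the reading has 10 characters and the name has 5, which lines up with \n (decimal 10) and \u0005 being length prefixes counted in characters. Removing the reading could be automated roughly as follows. This is a sketch based on that single example: the `strip_reading` helper is hypothetical, it works on the JSON string as shown here, and it ignores the start bytes, which are not visible in this thread.

```python
def strip_reading(message: str) -> str:
    """Replace 【<len><reading><len><name>】 with 【<name>】, leaving other text alone."""
    out = []
    i = 0
    while i < len(message):
        ch = message[i]
        out.append(ch)
        i += 1
        if ch != "【" or i >= len(message):
            continue
        n_reading = ord(message[i])            # assumed: reading length in characters
        reading_end = i + 1 + n_reading        # index of the second length prefix
        if n_reading >= 0x20 or reading_end >= len(message):
            continue                           # ordinary bracket without a length prefix
        n_name = ord(message[reading_end])     # assumed: name length in characters
        name_end = reading_end + 1 + n_name    # index where 】 is expected
        if n_name >= 0x20 or name_end >= len(message) or message[name_end] != "】":
            continue
        out.append(message[reading_end + 1:name_end])
        i = name_end                           # resume at the closing bracket
    return "".join(out)


original = "舌を突き出して汗をダラダラかいてるこいつは、【\nさとなか しゅうへい\u0005里中 秀平】。"
print(strip_reading(original))
```

Brackets whose first character is not a control character are left untouched, so ordinary speaker names pass through unchanged.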

HOKORISAMA commented 7 months ago

Hi bro, I was also trying to extract a Cyberworks game, https://vndb.org/v1078. Everything works fine, but the game shows the original Japanese text rather than English. I think there is some problem with exporting the .a0 file. Also, could you merge these new regex patterns into your tool, since they work for old Cyberworks games? I have also tried VNTextPatch, but the files exported from VNTextPatch crash the game after showing a single translated line.

satan53x commented 7 months ago

everything works fine but the game is showing the original japanese text rather than English--

  1. Does the game have any DLC? If so, it will prioritize reading the DLC. DLC is generally located in subfolders rather than in the root directory.
  2. Was the import successful? The console should only print warnings in yellow or red if there are any. If it still doesn't work, pack the Extract Dir and upload it.

sakurahana90 commented 7 months ago

Thank you so much for the pointers. It's going great with the script. Perhaps what remains is how to translate the choice lines, since translating them normally ends up with unclickable choices in the game. For now I use the original script as filler.

HOKORISAMA commented 7 months ago

input.zip Here are the files, and the import was successful. 000004.a0 is the file that displays the text at the opening of the game.

satan53x commented 7 months ago

image

The JSON extracted in the folder you sent seems to be incorrect. I'm not sure why, as our regular expressions should be the same, right? (Choose __Custom0 and paste the regex there. It saves the last regex after extraction.)

HOKORISAMA commented 7 months ago

After checking "Encoding applies to BIN", it gives this error:

image

When I downloaded your new repository and ran run.bat, I got an error that looked like a syntax error. After that SExtractor started fine, but the JSON files are still the same. Should I try a fresh copy of the SExtractor repository?

satan53x commented 7 months ago

For TXT Encoding, choose cp932; it's the Windows variant of Shift-JIS.

HOKORISAMA commented 7 months ago

The error on the first launch of SExtractor:

E:\SE EXTRACTOR\SExtractor-main\src\var_extract.py:53: SyntaxWarning: invalid escape sequence '\.'
  symbolPattern = '[\.~ \\u3000-\\u303F\\uFF00-\\uFF65\\u2000-\\u206F\\u2600-\\u27FF]' #重新分割匹配字符
New Config mainDirPath .
New Config engineCode 0
New Config outputFormat 0
New Config outputPartMode 0
New Config mergeDirPath .
New Config mergeSkipReg ^[a-zA-Z0-9{]
New Config collectSep +
New Config regIndex 0
New Config encodeIndex 0
New Config maxCountPerLine 512
New Config splitParaSep \r\n
New Config cutoff False
New Config cutoffCopy True
New Config splitAuto False
New Config ignoreSameLineCount False
New Config ignoreNotMaxCount False
New Config fixedMaxPerLine False
New Config pureText False
New Config transReplace True
New Config preReplace False
New Config skipIgnoreCtrl False
New Config skipIgnoreUnfinish False
New Config ignoreEmptyFile True
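For context, the first line of that output is a warning rather than an error: since Python 3.12, an unrecognized escape such as \. in a normal string literal is reported as a SyntaxWarning (earlier versions used a DeprecationWarning, which is hidden by default), and the program keeps running. The sketch below shows one possible way to silence it with a raw string; it is not a patch taken from the SExtractor repository.

```python
import re
import warnings

# The literal exactly as it appears in the warning above (kept as source text).
LEGACY_SOURCE = r"'[\.~ \\u3000-\\u303F\\uFF00-\\uFF65\\u2000-\\u206F\\u2600-\\u27FF]'"

# Raw-string spelling of the same pattern; it compiles without a warning.
RAW_PATTERN = r"[\.~ \u3000-\u303F\uFF00-\uFF65\u2000-\u206F\u2600-\u27FF]"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Compile the legacy literal in isolation so its warning can be recorded.
    legacy_pattern = eval(compile(LEGACY_SOURCE, "<demo>", "eval"))

warning_names = [w.category.__name__ for w in caught]
print(warning_names)
print(legacy_pattern == RAW_PATTERN)

symbol_re = re.compile(RAW_PATTERN)
print(bool(symbol_re.fullmatch("。")), bool(symbol_re.fullmatch("a")))
```

Both spellings produce the same pattern text, so switching to a raw string should not change what the regex matches.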

HOKORISAMA commented 7 months ago

Everything WORKED out. THANKS.

satan53x commented 7 months ago

What's your Python version? 3.9 or above is required; 3.11 is recommended.

HOKORISAMA commented 7 months ago

Python 3.12

satan53x commented 7 months ago

That's odd. Don't know why.

HOKORISAMA commented 7 months ago

Everything is working out fine now, Thanks.

HOKORISAMA commented 7 months ago

I just wanted to ask whether you can fix CSystemArc.exe, as it does not extract ARC00.dat and gives an error like this:

E:\SE EXTRACTOR\Cyberworks>.\CSystemArc.exe readconfig .\Arc00.dat config.xml
UnpackItems: item not start with 'S'
UnpackItems: item not start with 'S'
Found invalid data while decoding.

satan53x commented 7 months ago

There's also some symbol turned into a random character, for example the dialogue in the picture 「Huh? Isn't that the same thing?」, the symbol 」 changed into dot but I don't think this is quite problematic, at least for me.

Perhaps because the game reads text as full-width characters, English text placed between Japanese full-width symbols needs an even (double-byte) length. For example, in 「Huh? Isn't that the same thing?」 the English text length is 31, so just try adding a space: 「Huh? Isn't that the same thing ?」
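If this guess is right, the padding can be automated. The sketch below counts the characters that cp932 encodes as a single byte and, when the count is odd, inserts a space before the last of them, as in the example above. The `pad_single_byte_run` helper is hypothetical, and whether the engine really needs this padding is an untested assumption.

```python
def pad_single_byte_run(text: str, encoding: str = "cp932") -> str:
    """Insert one space so the number of single-byte characters becomes even."""
    single = [i for i, ch in enumerate(text) if len(ch.encode(encoding)) == 1]
    if len(single) % 2 == 0:
        return text  # already even, nothing to do
    last = single[-1]
    return text[:last] + " " + text[last:]


line = "「Huh? Isn't that the same thing?」"
print(pad_single_byte_run(line))
```

The brackets 「 and 」 take two bytes each in cp932, so only the 31 English characters are counted.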

satan53x commented 7 months ago

I just wanted to ask whether you can fix CSystemArc.exe, as it does not extract ARC00.dat and gives an error like this:

E:\SE EXTRACTOR\Cyberworks>.\CSystemArc.exe readconfig .\Arc00.dat config.xml
UnpackItems: item not start with 'S'
UnpackItems: item not start with 'S'
Found invalid data while decoding.

The tool bundled with SE is a modified version that is used to read UTF-16. For Shift-JIS, you can just use the original version. https://github.com/satan53x/SExtractor/blob/main/tools/Cyberworks/README.md

HOKORISAMA commented 7 months ago

That version also gives the error "Found invalid data while decoding."

satan53x commented 7 months ago

Are you sure? The original version works normally for me. https://github.com/arcusmaximus/CSystemTools/releases/tag/1.1

HOKORISAMA commented 7 months ago

Try this--file Arc00.zip

satan53x commented 7 months ago

https://github.com/satan53x/SExtractor/blob/main/tools/Cyberworks I have modified the original version; use CSystemArc_JIS.exe.

HOKORISAMA commented 7 months ago

Thank you very much. I'll check it and let you know how it works.

HOKORISAMA commented 7 months ago

SExtractor is unable to extract the following Cyberworks script: gaiden.zip

image

satan53x commented 7 months ago

The structure is slightly different, with 4 bytes of 00 at the beginning of each line. Maybe this regex, which adds \x00{4}, can extract it.

00_skip=^$
10_search=^\x00{4}(?P<name>【.+?】)
15_search=^\x00{4}(?P<unfinish>[\S\s]+?)[\xFE]{0,1}$
extraData=readJIS,noTextLen
structure=paragraph

But your text has already been translated using the Shift-JIS tunnel method. The Shift-JIS tunnel uses illegal Shift-JIS byte sequences such as 81 01 to represent extended characters, and these cannot be decoded to text. So you should extract the original Japanese script, not the translated one.
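Whether a script already contains such undecodable sequences can be checked before extraction. A minimal sketch, assuming the script bytes are otherwise plain cp932 text; `first_undecodable_offset` is a hypothetical helper.

```python
def first_undecodable_offset(data: bytes, encoding: str = "cp932"):
    """Return the byte offset of the first sequence that cannot be decoded, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start
    return None


clean = "テスト".encode("cp932")
tunneled = clean + b"\x81\x01"  # 81 01 is not a valid Shift-JIS character

print(first_undecodable_offset(clean))
print(first_undecodable_offset(tunneled))
```

A non-None result on a script that should be plain Japanese text suggests it has already been patched and the untouched original should be used instead.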

HOKORISAMA commented 7 months ago

Okay

HOKORISAMA commented 7 months ago

But it's still not extracting any text.

image