vn-tools / arc_unpacker

CLI tool for extracting images and sounds from visual novels.
GNU General Public License v3.0

Game request: [BISHOP]Sansha Mendan #48

Closed thajunk closed 8 years ago

thajunk commented 8 years ago

https://vndb.org/v6357

Hello, my ultimate goal with this game is to be able to extract and inject the UI and script, but I am very new to this. I understand that you don't do injection requests, but if you could get the script decoded and extracted, that would be more than enough. If you can't help, any tips or guides that have helped you in the past would be appreciated as well.

rr- commented 8 years ago

@thajunk from what I see, the scripts for this game are located in bsxx.dat. I'm afraid I can't be of much help when it comes to decoding this specific file. I'll add support for media, e.g. .bsa archives and .bsg images (the audio is stored as OGG.)

As for the script file: it looks like it's divided into a few parts: opcodes (game logic expressed as 4-byte integers), followed by some unknown structures, followed by all the text. The text is encoded as UTF-16 little endian. The file is unencrypted. Proof: you can pass the file through iconv. Sample output of iconv -f UTF-16LE -c bsxx.dat:

ใฃ๐Ÿใ€ๆ„ๅ›ณใ‚’็†่งฃใ—ใ€ๆ€งๆ‚ชใช็ฌ‘ใฟใงใ†ใชใšใใ€‚ไฟบใŸใกใฏ่ฆ–็ทšใ‚’้ƒจๅฎคใซๆˆปใ—ใŸใ€‚ใ€Œใ ใ‚โ€ฆโ€ฆใชใ„ใฃโ€ฆโ€ฆ๏ผใ€€ใ“ใ“ใซใฏ้š ใ—ใฆใชใ„ใฃใฆใ„ใ†ใฎใฃโ€ฆโ€ฆ๏ผŸใ€ไธ€ๅบฆ่ฆ‹ใŸใฏใšใฎใƒญใƒƒใ‚ซใƒผใ‚’ไฝ•ๅบฆใ‚‚่ชฟใน็›ดใ—ใ€ๆ‚„็„ถใจใ—ใŸ้ก”ใงใคใถใ‚„ใใ€‚ใ‹ใชใ‚Šๅˆ‡็พฝ่ฉฐใพใฃใฆใ„ใ‚‹ใ‚ˆใ†ใ ใ€‚ใ€Œใ‚โ€ฆโ€ฆใ“ใฎ้ตโ€ฆโ€ฆ๏ผ ใ“ใฎ้ตใฏใพใ ใ€ไฝฟใฃใฆใชใ„ใฃโ€ฆโ€ฆใ€้ตๆŸใ‚’่ฆ‹ใคใ‚ใฆๅ‹•ใใ‚’ๆญขใ‚ใ‚‹ใจใ€ๅˆ้ŸณใฏๆœŸๅพ…ใฎใ“ใ‚‚ใฃใŸ็›ฎใง้ƒจๅฑ‹ใฎไธญใ‚’่ฆ‹ๆธกใ—ใŸใ€‚ใ€Œใ‚ใฎๆœบโ€ฆโ€ฆ๏ผใ€ใพใ ๆ‰‹ใ‚’ใคใ‘ใฆใ„ใชใ‹ใฃใŸใ€้ƒจๅฑ‹ใฎ้š…ใฎไบ‹ๅ‹™ๆœบใซๆ€ฅใ„ใง้ง†ใ‘ๅฏ„ใฃใฆใ„ใใ€‚ใ€Œใฉใฎใ€ๅผ•ใๅ‡บใ—โ€ฆโ€ฆใ‚ใฃใ€ใ“ใ“ใฃ๏ผใ€ใ‚ฌใƒใƒฃใ‚ฌใƒใƒฃใจ้Ÿณใ‚’ ใ‹ใ›ใฆใ€ๆœบใฎไธ€็•ชไธ‹ใซใ‚ใ‚‹ๅคงใใชๅผ•ใๅ‡บใ—ใธ้ตใ‚’ๅทฎใ—่พผใ‚€ใ€‚ใ€Œ้–‹ใ„ใŸใฃ๏ผใ€้ตใฎ้–‹ใ้Ÿณใซใ€ๅˆ้Ÿณใฏๆญ“ๅ–œใฎๅซใณใ‚’ใ‚ใ’ใŸใ€‚ใ€Œใ“ใ‚Œใ ใ‚ใ€้–“้•ใ„ใชใ„ใฃ๏ผใ€ใ‚ฌใƒฉใƒชใจๅผ•ใๅ‡บใ—ใ‚’้–‹ใ‘ใ€ใใฎไธญใ‹ใ‚‰็™บ่ฆ‹ใ—ใŸใ‚ซใƒกใƒฉใจใƒกใƒขใƒชใƒผใ‚ซใƒผใƒ‰ใ‚’ๅ–ใ‚ŠไธŠใ’ใฆใ€ๅฐ่บใ‚Šใ™ใ‚‹ใ‚ˆใ†ใซ่บซไฝ“ใ‚’ ใ‚‰ใ™ใ€‚ใƒ•ใƒ•โ€ฆโ€ฆใŠๅฎ็™บ่ฆ‹ใ€ใŠใ‚ใงใจใ•ใ‚“ใ€‚ใใ‚Œใ˜ใ‚ƒใ€ๅคฉๅ›ฝใ‹ใ‚‰ๅœฐ็„ใธๅ •ใกใฆใ‚‚ใ‚‰ใ†ใ‹โ€ฆโ€ฆ๏ผไฟบใฏๆˆ็พŽใซ็›ฎใงๅˆๅ›ณใ‚’้€ใ‚‹ใจใ€้ƒจๅฎคใฎๆˆธใซๆ‰‹ใ‚’ไผธใฐใ—ใŸใ€‚ใ€ŒใŠใ„ใŠใ„ใ€ๅญฆ็”Ÿไผš้•ทใ•ใ‚“ใ€‚้ƒจๅฎค่’ใ‚‰ใ—ใชใ‚“ใ–ใ€ใฉใ†ใ„ใ†ใคใ‚‚ใ‚Šใ ๏ผŸใ€ใ€Œใƒ’ใƒƒ๏ผ๏ผŸใ€€\Aใ•ใ‚“ใฃโ€ฆโ€ฆใฉใ†ใ—ใฆใ€ใ“ใ“ใซโ€ฆโ€ฆใฃ ๏ผŸใ€ใ€Œใƒใ‚คใ€ใƒใƒผใ‚บ๐Ÿใ€่’ใ‚‰ใ•ใ‚ŒใŸ้ƒจๅฎคใจๅ‡ใ‚Šใคใๅˆ้Ÿณใ‚’ใ€ไฟบใฎ่ƒŒๅพŒใ‹ใ‚‰็พใ‚ŒใŸๆˆ็พŽใŒๆบๅธฏใง้€ฃๅ†™ใ™ใ‚‹ใ€‚ใ€Œใ‚ใ†ใ†ใฃใ€ใ‚ใ‚ใฃใ€ใ‚ใฃโ€ฆโ€ฆใŸใŸใ€้ซ˜ๆดฅๅ…ˆ็”Ÿใƒƒโ€ฆโ€ฆ๏ผ๏ผŸใ€ใ‚ซใ‚ทใƒฃใƒƒใ€ใ‚ซใ‚ทใƒฃใƒƒใจๆบๅธฏใฎใ‚ซใƒกใƒฉใฎ้ŸณใŒ้Ÿฟใไธญใ€ๅˆ้Ÿณใฏๅ‘†็„ถใจใใฎๅ ดใง็ซ‹ใกๅฐฝใใ—ใŸใ€‚ใ€ŒใŠใƒผใŠใƒผใ€‚ใš ใ„ใถใ‚“ๆดพๆ‰‹ใซ่’ใ‚‰ใ—ใฆใใ‚ŒใŸใ‚‚ใ‚“ใ ใชใ€‚ใŠใพใ‘ใซใ€ใใฎๆ‰‹ใซๆŒใฃใฆใ„ใ‚‹ใƒขใƒŽโ€ฆโ€ฆใใ‚Œใ€ใฉใ†ใ™ใ‚‹ใคใ‚‚ใ‚Šใ ใฃใŸใ‚“ใ 
๏ผŸใ€ใ€ŒๅคฉๅŸŽใ•ใ‚“โ€ฆโ€ฆใพใ•ใ‹็›—ใ‚€ใคใ‚‚ใ‚Šใ ใฃใŸใฎ๏ผŸใ€€ใใ‚Œใซใ€ใฉใ†ใ—ใฆ้ƒจๅฎคใซๅ…ฅใ‚ŒใŸใฎใ‹ใ—ใ‚‰โ€ฆโ€ฆ๏ผŸใ€€้ตใฏ็งใŒไฟ็ฎกใ—ใฆใ„ใ‚‹ใฏใšใชใฎใซโ€ฆโ€ฆใ€ใ€Œใฏใ†ใ†โ€ฆโ€ฆใ†ใฃใ€ ใ†โ€ฆโ€ฆใ€ใ‚ใพใ‚Šใฎๆ€ฅๅฑ•้–‹ใซๅˆ้Ÿณใฏ้ ญใŒใคใ„ใฆใ„ใ‹ใชใ„ๆง˜ๅญใงใ€ไฝ•ใ‚‚่จ€ใˆใšใ€ใŸใ ใŸใ ่บซไฝ“ใ‚’้œ‡ใ‚ใ›ใฆใ„ใ‚‹ใ€‚ไฟบใฏใใ‚“ใชๅˆ้Ÿณใซใ€ๅ‹ใก่ช‡ใฃใŸ็ฌ‘ใฟใ‚’ๅ‘ใ‘ใŸใ€‚ใ€Œใตใตใ‚“ใ€ๆฎ‹ๅฟตใ ใฃใŸใชใ€‚ใŠๅ‰ใฎ่€ƒใˆใชใ‚“ใ–ใ€ใ™ในใฆใŠ่ฆ‹้€šใ—ใ ใฃใŸใ‚“ใ ใ‚ˆใ€ใ€Œใˆใฃโ€ฆโ€ฆ๏ผŸใ€ๅพ—ๆ„ๆฐ—ใชไฟบใฎๅฐ่ฉžใซ ๆˆ็พŽใŒ้–“ใฎๆŠœใ‘ใŸๅฃฐใ‚’ๆŒŸใ‚“ใงใใ‚‹ใ€‚ใ€Œใงใ‚‚ใ€\Bใใ‚“ใ€‚ๅˆฅใซใใ“ใพใง่ชญใ‚ใฆใŸใ‚ใ‘ใ˜ใ‚ƒโ€ฆโ€ฆใ€\sใ€Œใƒใ‚ซใฃโ€ฆโ€ฆไฝ™่จˆใชใ“ใจ่จ€ใ†ใชใ€‚ใ“ใ†ใ„ใ†ใฎใฏใชใ€็›ธๆ‰‹ใŒ่‡ชๅˆ†ใ‚ˆใ‚ŠไธŠๆ‰‹ใ ใฃใฆใ€ๅฐ‘ใ—ใงใ‚‚ๆ€ใ‚ใ›ใฆใŠใ„ใŸๆ–นใŒใ„ใ„ใ‚“ใ ใ‚ˆใ€ไฝ™่จˆใชใƒ„ใƒƒใ‚ณใƒŸใ‚’ๅ…ฅใ‚Œใ‚‹ๆˆ็พŽใซใ€ๆ…Œใฆใฆๅฐๅฃฐใงๆณจๆ„ ใ‚‹ใ€‚\sใ€Œใ‚ใฃใ€ใใฃใ‹ใ€\sใ€Œใ€Žใ‚ใฃใ€ใใฃใ‹ใ€ใ˜ใ‚ƒใชใ„ใฃใฆใฎโ€ฆโ€ฆใ€ใ€Œใตใตใตใฃใ€ๆฎ‹ๅฟตใ ใฃใŸใญๅคฉๅŸŽใ•ใ‚“ใ€‚ๅฝผใฏใ“ใ†่ฆ‹ใˆใฆใ€ใ™ใฃใ”ใ„ใ‚ญใƒฌ่€…ใชใ‚“ใ ใ‹ใ‚‰โ™ชใ€ๆ”นใ‚ใฆๅˆ้Ÿณใซ่ƒธใ‚’ๅผตใ‚‹ๆˆ็พŽใ€‚ใ›ใฃใ‹ใใฎๆฑบใ‚ใ‚ทใƒผใƒณใŒใ€ๅฐใชใ—ใซใชใฃใŸๆฐ—ใ‚‚ใ™ใ‚‹ใŒโ€ฆโ€ฆใ€Œใ…โ€ฆโ€ฆโ€ฆโ€ฆใ…ใ…ใฃโ€ฆโ€ฆใใ‚“ใชโ€ฆ ใ…ใฃใ€ใ…ใ…ใฃโ€ฆโ€ฆโ€ฆโ€ฆโ€ฆโ€ฆใ€ไปŠใฎๅˆ้Ÿณใซใฏใ€ใใ‚“ใชใ‚„ใ‚Šๅ–ใ‚Šใ‚’ๆฐ—ใซ็•™ใ‚ใ‚‹ไฝ™่ฃ•ใชใฉใ€ใพใฃใŸใใชใ„ใ‚ˆใ†ใ ใ€‚ไธกๆ‰‹ใซ่จผๆ‹ ๅ“ใ‚’ๆŒใฃใŸใพใพใ€ใŸใ ่จˆ็”ปใŒๅคฑๆ•—ใ—ใŸใ“ใจใ‚’ๅ˜†ใใ€ๆ‚”ใ—ๆถ™ใ‚’็›ฎใซๆตฎใ‹ในใชใŒใ‚‰่‚ฉใ‚’ๆบใ‚‰ใ™ใฐใ‹ใ‚Šใ ใฃใŸใ€‚ใ€Œใใ‚Œใซใ—ใฆใ‚‚ใ€ใ‚ทใƒงใƒƒใ‚ฏ๏ฝžโ€ฆโ€ฆใพใ•ใ‹ใ€ๅคฉๅŸŽใ• 
ใŒๆณฅๆฃ’ใชใ‚“ใ‹ใ™ใ‚‹ใชใ‚“ใฆโ€ฆโ€ฆใ€ใ€Œใ—ใ‹ใ‚‚ใ€็งใฎๆœบใ‹ใ‚‰โ€ฆโ€ฆใ†ใฃใ†ใฃใ€ๅ…ˆ็”Ÿใ€ๅนปๆป…ใ—ใกใ‚ƒใฃใŸใ‚ˆใ€ใ€Œใ†ใ…โ€ฆโ€ฆใชใ€ไฝ•ใ‚’่จ€ใฃใฆใ‚‹ใ‚“ใงใ™ใ‹ใฃโ€ฆโ€ฆ็งใ€ๆณฅๆฃ’ใชใ‚“ใ‹ใ˜ใ‚ƒโ€ฆโ€ฆ๏ผใ€€ใฒใ€ใฒใฉใ„ใงใ™โ€ฆโ€ฆ้ซ˜ๆดฅๅ…ˆ็”Ÿใ‚‚ใ€ใ‚ใ‹ใฃใฆใ‚‹ใใ›ใซโ€ฆโ€ฆใ€ๅผฑใ€…ใ—ใ„ๅฃฐใงๅ่ซ–ใ™ใ‚‹ใ€‚ใพใ ๅฎŒๅ…จใซใฏใ€ๅฟƒใฏๆŠ˜ใ‚Œใฆ ใชใ„ใ‚‰ใ—ใ„ใ€‚ใ€Œใ˜ใ‚ƒใ‚ใ€ใ“ใฎๅ†™็œŸใฏไฝ•ใชใฎ๏ผŸใ€ๆˆ็พŽใฏๆ„ๅœฐๆ‚ชใ็ฌ‘ใ„ใ€ๆบๅธฏใฎใƒ‡ใ‚ฃใ‚นใƒ—ใƒฌใ‚คใซๆ˜ ใฃใŸๅˆ้Ÿณใฎๅงฟใ‚’่ฆ‹ใ›ใณใ‚‰ใ‹ใ™ใ€‚ใ€Œใ‚ณใƒฌใ€ใฉใ†่ฆ‹ใฆใ‚‚ๆณฅๆฃ’ใซใ—ใ‹่ฆ‹ใˆใชใ„ใ‚ˆใญ๏ผŸใ€€ไป–ใฎไบบใŒ่ฆ‹ใŸใ‚‰ใ€ใฉใ†ๆ€ใ†ใ‹ใ—ใ‚‰๏ผŸใ€ใ€Œใกใ€้•ใ„ใพใ™ใฃ๏ผใ€€็งใฏๆณฅๆฃ’ใ˜ใ‚ƒใ‚ใ‚Šใพใ›ใ‚“ใฃโ€ฆโ€ฆ ใ€ใ€Œใงใ‚‚ใ€ใใ‚Œใ‚’ๅˆคๆ–ญใ™ใ‚‹ใฎใฏใ€ไป–ใฎๅ…ˆ็”Ÿๆ–นใ ใ‹ใ‚‰ใโ€ฆโ€ฆใ€ใ€Œใˆใฃโ€ฆโ€ฆ๏ผŸใ€ใ€Œใƒ•ใƒ•โ€ฆโ€ฆ็งใ ใฃใฆ็ฉไพฟใซๆธˆใพใ›ใŸใ„ใ‘ใ‚Œใฉโ€ฆโ€ฆใ‚„ใฃใฑใ‚Šใ€็ซ‹ๅ ดไธŠใ€ใ‚ณใƒฌใฏ่ทๅ“กไผš่ญฐใงๅ ฑๅ‘Šใ™ใ‚‹็พฉๅ‹™ใŒใ‚ใ‚‹ใฎใ‚ˆใญใ‡๏ฝžโ€ฆโ€ฆใ€ใ€Œใƒใƒใƒƒใ€ใ“ใ„ใคใฏ็ต‚ใ‚ใฃใŸใชใ€‚ๅญฆ็”Ÿไผš้•ทใŒๆณฅๆฃ’ใ‹ใ€‚ๅœๅญฆโ€ฆโ€ฆใ„ใ‚„ใ„ใ‚„ใ€้€€ ๅ‡ฆๅˆ†ใ ใฃใฆๅ…ใ‚Œใชใ„ใ ใ‚ใ†ใชใ๏ผŸใ€ใ€Œใใ‚“ใชใฃโ€ฆโ€ฆใŸใ€้€€ๅญฆใฃใฆใฃโ€ฆโ€ฆใฉใ†ใ—ใฆใ€ใใ‚“ใชโ€ฆโ€ฆใฒใ€่ขซๅฎณ่€…ใฏใ€็งใฎๆ–นใชใฎใซโ€ฆโ€ฆใ€ๅˆ้Ÿณใฎ้ก”ใฏใฟใ‚‹ใฟใ‚‹่ก€ใฎๆฐ—ใ‚’ๅคฑใ„ใ€ๅ“€ใ‚Œใชใปใฉใซ้’ใ–ใ‚ใฆใ„ใฃใŸใ€‚ใ€Œใƒ•ใƒ•ใƒƒใ€็›ดๆŽฅใ€ไฟบใ‚’็‹™ใ†ใ‚“ใ˜ใ‚ƒใชใใ€่จผๆ‹ ๅ“ใซ็š„ใ‚’็ตžใฃใŸใฎใฏใ„ใ„็€็œผ็‚น ใฃใŸใžโ€ฆโ€ฆใ‘ใฉใ€่ฉฐใ‚ใŒ็”˜ใ‹ใฃใŸใ‚ˆใ†ใ ใชใ€ใ€Œใพใฃใ€ใใ‚Œใฏใ•ใฆใŠใโ€ฆโ€ฆใ ใ€‚ใŠๅ‰ใ€่ฆšๆ‚Ÿใฏใงใใฆใ„ใ‚‹ใ‚“ใ ใ‚ใ†ใช๏ผŸใ€ใ€Œโ€ฆโ€ฆ่ฆšๆ‚Ÿใฃโ€ฆโ€ฆ๏ผŸใ€ใ€Œๅ‰ใซใ‚‚่จ€ใฃใŸใ‚ใ€‚ไฟบใซ้€†ใ‚‰ใˆใฐใ€ใฉใ†ใ„ใ†ใ‚ณใƒˆใซใชใ‚‹ใ‹โ€ฆโ€ฆใ€ใ€Œใƒƒโ€ฆโ€ฆโ€ฆโ€ฆ๏ผใ€ๅˆ้Ÿณใฏใƒ“ใ‚ฏใƒƒใจ้ก”ใ‚’ไธŠใ’ใฆ็›ฎใ‚’่ฆ‹้–‹ใ„ใŸใ€‚ใ€Œไฟบใฏๆœ‰่จ€ๅฎŸ ใฎ็”ทใชใ‚“ใ 
ใ€‚็ด„ๆŸ้€šใ‚Šใ€ไฟบใซ้€†ใ‚‰ใฃใŸ็ฝฐใจใ—ใฆใ€ๅ†™็œŸใจๅ‹•็”ปใฏๅ…จ้ƒจใƒใƒฉใพใ‹ใ›ใฆใ‚‚ใ‚‰ใ†ใœ๏ผŸใ€ใ€Œใใฃใ€ใใ‚“ใชใฃโ€ฆโ€ฆ๏ผ๏ผŸใ€ใ€Œใ‚ฏใ‚ฏใ‚ฏใƒƒโ€ฆโ€ฆใ“ใ‚Šใ‚ƒใ‚ใ‚‚ใฎใ™ใ”ใ„ใ‚ณใƒˆใซใชใ‚‹ใžใ€‚ใฉใ†ใ ใ€ๆƒณๅƒใงใใ‚‹ใ‹๏ผŸใ€ใ€Œใชใชใ€ใชใ€ไฝ•ใŒโ€ฆโ€ฆใงใ™ใ‹๏ผŸใ€ใ€Œๆฑบใพใฃใฆใ‚‹ใ ใ‚ใ€‚ๅญฆๅœ’ไธญใฎ็”ทๅญใซ ใŠๅ‰ใฎใ‚จใƒญๅ‹•็”ปใ‚„ใ‚จใƒญๅ†™็œŸใŒใƒใƒฉใพใ‹ใ‚Œใ‚‹ใ‚“ใ ใž๏ผŸใ€ใ€Œๆฌฒๆƒ…ใ—ใŸใใ„ใคใ‚‰ใŒใ€ๆ˜Žๆ—ฅใ‹ใ‚‰ๆฏŽๆ—ฅใ€ใŠๅ‰ใ‚’็Šฏใ—ใซๆฎบๅˆฐใ™ใ‚‹ใ‚“ใ ใ€‚ใ‚ฏใ‚ฏใƒƒใ€ใƒใƒใƒƒโ€ฆโ€ฆใ“ใ„ใคใฏ่ฆ‹็‰ฉใ ใœใฃ๏ผใ€ใ€Œใƒ’ใ‚ฃใƒƒโ€ฆโ€ฆ๏ผ๏ผŸใ€ๅคงใ’ใ•ใจ่จ€ใ†ใ‹ใ€ใ‹ใชใ‚Š่’ๅ”็„ก็จฝใช่„…ใ—ใ ใŒใ€ใŸใ ใงใ•ใˆ็ด ็›ดใชๅˆ้Ÿณใฏใ€่ฟฝใ„่ฉฐ
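
For readers who would rather locate the text section programmatically than eyeball iconv output, here is a minimal Python sketch. It is not part of arc_unpacker; the function names, the character ranges and the `min_chars` threshold are all assumptions chosen for illustration. It also assumes the text sits on 2-byte alignment, which has not been verified for bsxx.dat.

```python
import struct


def is_text_unit(unit: int) -> bool:
    """Heuristic (assumed ranges): code units typical of Japanese game text."""
    return (
        0x3000 <= unit <= 0x30FF      # CJK punctuation, hiragana, katakana
        or 0x4E00 <= unit <= 0x9FFF   # common kanji
        or 0xFF00 <= unit <= 0xFFEF   # full-width forms
        or 0x20 <= unit <= 0x7E       # printable ASCII
    )


def find_text_runs(data: bytes, min_chars: int = 8):
    """Yield (byte_offset, text) for runs of plausible UTF-16LE text.

    Only even byte offsets are examined, so text stored on odd offsets
    would be missed.
    """
    n = len(data) // 2
    start = None
    for i in range(n + 1):
        # The extra iteration (i == n) flushes a run that reaches end of data.
        unit = struct.unpack_from("<H", data, i * 2)[0] if i < n else None
        if unit is not None and is_text_unit(unit):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_chars:
                yield start * 2, data[start * 2 : i * 2].decode("utf-16-le")
            start = None
```

Reading the file with `open("bsxx.dat", "rb").read()` and printing the first few offsets from `find_text_runs` should give a rough idea of where the text begins. Short strings below `min_chars` are skipped on purpose, to keep stray opcode bytes from showing up as text.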

Sample visual histogram of the file from which I drew my conclusion about its layout:

[Image: visual histogram of bsxx.dat (alt text: sm)]
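
The image itself is not reproduced here, and the tool that produced it is not named. One assumed way to get a comparable picture of the file's layout is to compute simple statistics per fixed-size block and look for where they shift. The sketch below uses the fraction of zero bytes and the Shannon entropy; `block_profile` and the 4096-byte block size are illustrative choices, not something taken from the thread.

```python
import math
from collections import Counter


def block_profile(data: bytes, block_size: int = 4096):
    """Return a list of (offset, zero_fraction, entropy_bits) for each block."""
    rows = []
    for offset in range(0, len(data), block_size):
        block = data[offset : offset + block_size]
        counts = Counter(block)
        total = len(block)
        # Shannon entropy in bits per byte: 0.0 for a constant block, 8.0 for uniform bytes.
        entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
        rows.append((offset, counts.get(0, 0) / total, entropy))
    return rows
```

If the file really is opcodes, then unknown structures, then UTF-16LE text, the three regions would be expected to show different zero fractions and entropies, since small 4-byte integers contain many zero bytes and Japanese UTF-16 text contains few. That remains a guess until checked against the actual file.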

So your job would be to discover the nature of the structures within this file in detail. Then you'd need to work out what each opcode means so that you can convert the binary code into something more high-level. At the very least, if all you want is to translate the text, you'd need to know what references the text in the latter section, so that when you recompile the script file with your text, the game still knows which string is located where and doesn't crash. To do this, you can use a debugger and set up breakpoints:

  1. CreateFileA (or the legacy OpenFile) to discover where bsxx.dat is opened
  2. ReadFile to discover where its handle is actually used (the game probably reads all the file into memory at once)
  3. Hardware breakpoints on the memory where the file was read (specifically, the text section) to discover what kind of code actually accesses the text
  4. This is where it all begins - now it's time to poke around, investigate the stack and the heap and compare it against hex dump of bsxx.dat, step in and out etc. This is where you get to know what actually triggered the text read, and how you could use this information to recompile the script.

The fourth step is different for each game and is a very time-consuming process. You could combine this approach with creating a histogram of the most commonly used bytes to make educated guesses about which opcode might be used to draw text etc.
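
As a rough illustration of the histogram idea, the Python sketch below counts 4-byte values instead of single bytes, since the opcodes are described above as 4-byte integers. Little-endian byte order is an assumption (it matches the text encoding, but has not been confirmed for the opcode section), and `word_histogram` and its parameters are hypothetical names.

```python
import struct
from collections import Counter
from typing import Optional


def word_histogram(data: bytes, start: int = 0, end: Optional[int] = None, top: int = 20):
    """Return the `top` most common 4-byte little-endian values in data[start:end]."""
    end = len(data) if end is None else end
    # Trim the range to a whole number of 4-byte words.
    end -= (end - start) % 4
    words = struct.iter_unpack("<I", data[start:end])
    counts = Counter(value for (value,) in words)
    return counts.most_common(top)
```

The `start` and `end` arguments are meant to restrict the count to the opcode section once its boundaries are known. The frequent values are only candidates: operands are mixed in with the opcodes, so a common value may just be a common argument such as 0 or 1.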

rr- commented 8 years ago

As promised, I've implemented BSA and BSC archives as well as BSG image files - this is as far as we can go staying in the realm of arc_unpacker's responsibilities.

thajunk commented 8 years ago

Thank you so much, you've been a great help. Your tips are very useful. And also thanks for the BSA unpacking, you have already gone above what I expected. :)

rr- commented 7 years ago

@thajunk not sure if you're interested in binaries but nonetheless, the builds seem to have been broken for about two weeks. The build 0.10.349 should work fine.