pret / pokecrystal

Disassembly of Pokémon Crystal
https://pret.github.io/pokecrystal/

Reverse-engineer the Crystal "base" data inserted by tools/stadium.c #752

Closed Rangi42 closed 2 years ago

Rangi42 commented 4 years ago

This is 24 bytes of data starting with ASCII "base" that come right before the Stadium 2 checksums in Crystal. It's not present in G/S.

One version of this data is in the US 1.0, US 1.1, and AU ROMs; another is in the EU ROMs; another is in the US 1.0 and 1.1 debug ROMs. Some of the early Crystal builds have still other versions, or no such data.

https://github.com/pret/pokecrystal/blob/master/tools/stadium.c#L27-L37

The original xtal, xtal_revise, and xtal_AUSTRALIA pmcenv each have a crystal_base0.bin file; and xtal_euro pmcenv has a crystal_base1.bin file. These are particular ROM builds without the base or Stadium data at the end.

It may not be possible to reverse-engineer, in which case tools/stadium.c will continue to insert it from hard-coded values.

Rangi42 commented 4 years ago

Here's a CSV of all the .bin and .gb build files that have this base data, as well as the three crystal_base0.bin files that have all zeros there: https://pastebin.com/UdCn9fDz

Rangi42 commented 4 years ago

All 21 unique base data sequences (each beginning with 62 61 73 65, ASCII "base"):

Rangi42 commented 4 years ago

After the "base", every sequence has 01, followed by 00 in all but two unique sequences, which have 01 there instead: the final EU Crystal sequence, and the sequence 62 61 73 65 01 01 77 c4 00 10 00 00 00 0c a3 3c 10 ff ff 07 00 00 00 14 used by three builds: xtal_euro/pmcenv/CRYSTAL_ps3_spa_010724d.bin, xtal_euro/pmcenv/CRYSTAL_ps3_spa_010725d.bin, and xtal_euro/pmcenv/CRYSTAL_ps3_spa_010726d.bin.

All of them have ff ff as the 18th and 19th bytes.

Rangi42 commented 4 years ago

This is in Stadium 2 (https://github.com/pret/pokestadium/blob/master/stadiumgs/main.s):

    .db POKEMON_CRYSTAL
    .db "PM_CRYSTAL",0,"base"
    .db NON_JAPAN
    .db 0
    .dh 0x5e1c
    .dw 0x000c0000
    .dw CrystalUSAChecksums - Base0x23A5000
    .dh 0x000b
    .align 4

However, other ROMs have their manufacturer codes where "base" is (e.g. AAUE for POKEMON_GLD US/EU).

Also note that the data at the end of 0e.bin:

CrystalUSAChecksums:    ; Crystal USA
.incbin "gameboy/0e.bin"

is similar but not identical to the checksum data for US Crystal.

mid-kid commented 4 years ago

I've personally written this off as unsolvable, at least merely using officially released data.

I should point out that this data only exists in US/AU and EU Crystal; it doesn't exist in any language of Gold and Silver, nor in Japanese Crystal! Furthermore, only the Japanese version of Crystal and the "base" version of US Crystal were included in Stadium 2. So, whatever this is, it was added at a rather late stage of development (i.e. translation), and might or might not have something to do with Stadium.

Since this issue is blatantly referring to the leaks, and this, unlike the Stadium checksums, probably can't be solved without taking a peek at the plethora of builds in there (I should point out that it seemingly doesn't contain any tools that generate this data), I'll point out that the Stadium 2 header refers to the ROM located at xtal/pmcenv/crystal_base0.bin, which contains "base" in the header where the usual "BXTE"/"BYTE" would go. There's a different xtal_euro/pmcenv/crystal_base1.bin for the European builds. The relationship between these ROMs and the base data at the end of the release ROMs is unclear, but it should be noted that this ROM appears in xtal_revise/pmcenv/crystal_base0.bin as well, completely untouched, in an otherwise clean build directory, indicating that it might've been necessary during some build stage.

Rangi42 commented 4 years ago

Yeah, I was searching for any files that could even resemble a base-data-generating tool, but only turned up the builds containing that data, so went ahead and documented what we do know about it.

Rangi42 commented 3 years ago

The data format is: ASCII "base", then 01, then a version byte 00 or 01, then a two-byte CRC, then 16 more bytes. The CRC is taken across all 24 bytes, using placeholder 00 00 for the two bytes that will be the CRC.
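For reference, that layout can be unpacked with a few lines of Python. This is only a sketch: the names BaseData, always_one and parse_base_data are made up for illustration, and the high-byte-first CRC order is inferred from the 77 c4 bytes in the ps3_spa sequence quoted earlier in this issue.

```python
import struct
from typing import NamedTuple


class BaseData(NamedTuple):
    """Hypothetical container for the 24-byte "base" block."""
    always_one: int  # 01 in every sequence listed in this issue
    version: int     # 00 or 01
    crc: int         # two-byte CRC, assumed to be stored high byte first
    payload: bytes   # the remaining 16 bytes


def parse_base_data(raw: bytes) -> BaseData:
    """Split a 24-byte base block into its fields."""
    if len(raw) != 24:
        raise ValueError("base data must be exactly 24 bytes")
    magic, always_one, version, crc = struct.unpack(">4sBBH", raw[:8])
    if magic != b"base":
        raise ValueError('base data must start with ASCII "base"')
    return BaseData(always_one, version, crc, raw[8:])


# The sequence shared by the three CRYSTAL_ps3_spa builds.
spa = parse_base_data(bytes.fromhex(
    "62 61 73 65 01 01 77 c4 00 10 00 00 00 0c a3 3c "
    "10 ff ff 07 00 00 00 14"
))
print(spa.version, hex(spa.crc), spa.payload.hex())
```

The ff ff pair noted earlier (18th and 19th bytes of the whole block) ends up at payload[9:11] in this representation.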

This can be found with RevEng, searching across all 21 ROMs' base data:

$ ./reveng -w 16 -s 6261736501000000401000aa40bbe73800ffff2707396a069f04 626173650100000040100022003ae33800ffff07071068068207 62617365010000004010a0ba40bbe73800ffff2707f97a064e08 62617365010000004010a0ba40bbe73800ffff2707f97b0ef520 6261736501000000c01101aa413af73e18ffff0f0719e8073121 6261736501000000c4d101aa41bbf73e18ffff6f0739ea07d428 6261736501000000c05100aa41baf73c18ffff4f07396807473f 6261736501000000c4d101aa41bbf73e18ffff6f0739ea27f679 626173650100000040100022403ae33800ffff07071068060d7c 6261736501000000401000aa403ae73800ffff0707196806b083 6261736501000000c05100aa41bbf73c18ffff6f07396a074c89 62617365010000004010c036c0bfe73800ffff27bfff7b5e4d95 6261736501000000c4d101aa41baf73e18ffff4f0739e807df9e 62617365010000004010e0fec0bbe73800ffff27bfff7b5e4ca5 6261736501000000c4d101aa413af73e18ffff4f0719e807fbaf 6261736501000000401000aa40bae73800ffff070739680694b2 626173650100000040110022003af33818ffff0f071068076bbf 6261736501000000f4ffc736dbffff7ffcffffefbfffff7f0ed7 62617365010000004010a0ba40bbe73800ffff2707b96a06b5f6 626173650101000000100000000ca33800ffff0700000014cf1e 626173650101000000100000000ca33c10ffff0700000014c477
width=16  poly=0xe1c3  init=0x7b35  refin=true  refout=true  xorout=0x0000  check=0xc0f6  residue=0x0000  name=(none)

(The checksum bytes are appended to the end of each hex string, swapped for endianness.)

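This is the same CRC_POLY value as the Stadium checksums, 0xC387; but CRC_INIT is 0xACDE instead of 0xFEFE. (RevEng prints both in unreflected form: 0xe1c3 and 0x7b35 are 0xC387 and 0xACDE with their 16 bits reversed, which is what refin=true and refout=true indicate.)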
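A minimal Python sketch of that CRC, assuming a plain bit-at-a-time, right-shifting implementation (stadium.c itself may be table-driven). CRC_INIT_BASE, CRC_INIT_STADIUM, crc16 and base_data_crc are names chosen here for illustration; the constants are the ones given above.

```python
CRC_POLY = 0xC387          # reflected polynomial, shared with the Stadium checksums
CRC_INIT_BASE = 0xACDE     # initial value for the base data (name is an assumption)
CRC_INIT_STADIUM = 0xFEFE  # initial value for the Stadium checksums


def crc16(data: bytes, init: int) -> int:
    """Reflected (LSB-first) CRC-16 with no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC_POLY if crc & 1 else 0)
    return crc


def base_data_crc(base: bytes) -> int:
    """CRC across all 24 bytes, with 00 00 standing in for the CRC field."""
    zeroed = base[:6] + b"\x00\x00" + base[8:]
    return crc16(zeroed, CRC_INIT_BASE)


# US 1.0 / US 1.1 / AU base data with the CRC field zeroed.
us = bytes.fromhex("626173650100000040110022003af33818ffff0f07106807")
print(f"{base_data_crc(us):04x}")  # bf6b
```

Appending the CRC low byte first (6b bf, as in the RevEng string) and running crc16 over the result gives 0, matching the residue=0x0000 that RevEng reports.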

Rangi42 commented 3 years ago

The remaining 16 bytes of base data could be further parity/checksum bits for the 128 ROM banks. (Although note that the US 1.0, US 1.1, and AU ROMs have identical values despite different contents.)

The bytes 40 11 00 22 00 3A F3 38 18 FF FF 0F 07 10 68 07 correspond to bits 00000010 10001000 00000000 01000100 00000000 01011100 11001111 00011100 00011000 11111111 11111111 11110000 11100000 00001000 00010110 11100000 (with each byte reversed so bit 0 is first and 7 is last). In this order, the long run of twenty 1s corresponds to banks $48-$5B, which contain the "Pics 1"-"Pics 20" sections. The empty banks $75, $76, $79, and $7A also all have 1s.
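The byte-to-bank mapping described above can be sketched like this, assuming bit 0 of byte 0 is bank $00 and bit 7 of byte 15 is bank $7F. The function name banks_from_bitfield is hypothetical.

```python
def banks_from_bitfield(bitfield: bytes) -> list:
    """Return the bank numbers whose bit is set, reading each byte LSB first."""
    return [
        byte_index * 8 + bit
        for byte_index, byte in enumerate(bitfield)
        for bit in range(8)
        if byte >> bit & 1
    ]


# The 16 bytes from the US 1.0 / US 1.1 / AU base data.
us_bits = bytes.fromhex("40 11 00 22 00 3A F3 38 18 FF FF 0F 07 10 68 07")
banks = banks_from_bitfield(us_bits)
print(" ".join(f"{bank:02x}" for bank in banks))
```

With this bit order the run of twenty 1s decodes to banks $48-$5B, and $75, $76, $79, and $7A are in the list as well.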

Banks with 1s in US 1.0 and their sections:

mid-kid commented 3 years ago

First thing that popped into my mind when I saw your comment was comparing the banks, partially because I've heard Stadium loads banks selectively (which is why Crystal load times are slower or something), and because those are all banks that have no reason to change between localization revisions, so here's a little script:

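#!/usr/bin/env python3
# Print the numbers of the 16 KiB banks that are byte-identical in both ROMs.

BANK_SIZE = 0x4000

with open("crystal_base0.bin", "rb") as f:
    base = f.read()
with open("BYTE00-0.gb", "rb") as f:
    rel = f.read()

for bank in range(0x80):
    start = bank * BANK_SIZE
    # Compare the whole bank at once instead of byte by byte.
    if base[start:start + BANK_SIZE] == rel[start:start + BANK_SIZE]:
        print("%02x" % bank, end=" ")
print()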

And the most important bit, the output:

06 08 0c 19 1d 29 2b 2c 2d 30 31 34 35 36 37 3b 3c 3d 43 44 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58 59 5a 5b 60 61 62 6c 73 75 76 78 79 7a

See any similarities with your list?

Rangi42 commented 3 years ago

mid-kid, hi! Brilliant! I didn't know that about Stadium. Unfortunately crystal_base0.bin isn't included in pokestadiumgs.n64, so we can't use it to derive those bytes. Still, it's great to know what the data means, and maybe once Stadium is disassembled further it will turn out to be possible.

(Edit: Stadium has some of Crystal's banks, in compressed form, as well as checksums for both Crystal Japan rev 0 and Crystal USA.)

Rangi42 commented 3 years ago

Bank $66 of crystal_base0.bin has the same two half-bank checksums as pokecrystal.gbc, but does not have identical content and does not get a 1 bit. So stadium.c can't just store the base checksums and use them to derive the 128 bank bits.

mid-kid commented 2 years ago

Reopening since this was never solved - tools/stadium.c still doesn't generate correct Stadium data due to the lack of generation for this bitfield. There are ways to work around the checksum conflicts, for example by using stronger checksums for the bitfield, or by hardcoding the "initial" bitfield and only setting extra 1 bits but never unsetting them.

Rangi42 commented 2 years ago

I think this can remain closed. We solved and documented what the bits actually mean. Generating them from the built ROM instead of hard-coding them would add extra complexity and would still require even more hard-coded data (the expected checksums of crystal_base0.bin, using multiple algorithms or a new algorithm without coincidental collisions for any of the ROMs we support building). And as far as I know the checksums don't affect Stadium 2 compatibility, so there's no value in generating accurate ones for a customized ROM.

mid-kid commented 2 years ago

Since we still don't have any conclusive answer as to how this data affects Stadium, I'd prefer generating the data correctly. If anything, it helps codify the research in this issue, and explain the data better. Since there's no way to recover some of the necessary crystal_base0.bin banks in any legal manner, I think I prefer hardcoding the initial bitmap; it wouldn't be incredibly complex anyway.

Keeping this issue open as a personal reminder. I'll probably do it myself at some point.

Rangi42 commented 2 years ago

Instead of using the half-bank checksums, we can checksum each whole bank (using the 0xFEFE init value), and compare with a hard-coded array of the base ROM's 128 bank checksums. That avoids any collisions, at least for the ROMs we generate so far (US, AU, debug).
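Roughly, the whole-bank approach could look like the following Python sketch. It is not the linked verification script; bank_checksums and bitfield_from_checksums are hypothetical names, and the hard-coded array of base checksums is left as a parameter because its values are not reproduced here.

```python
BANK_SIZE = 0x4000
NUM_BANKS = 128
CRC_POLY = 0xC387
CRC_INIT = 0xFEFE


def crc16(data: bytes, init: int = CRC_INIT) -> int:
    """Reflected CRC-16, bit at a time (slow in pure Python, fine for a one-off check)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC_POLY if crc & 1 else 0)
    return crc


def bank_checksums(rom: bytes, bank_size: int = BANK_SIZE) -> list:
    """Checksum each of the first 128 banks; data past that is never read."""
    return [
        crc16(rom[bank * bank_size:(bank + 1) * bank_size])
        for bank in range(NUM_BANKS)
    ]


def bitfield_from_checksums(rom_sums: list, base_sums: list) -> bytes:
    """Set bit (bank % 8) of byte (bank // 8) when a bank matches the base ROM."""
    bits = bytearray(NUM_BANKS // 8)
    for bank, (rom_sum, base_sum) in enumerate(zip(rom_sums, base_sums)):
        if rom_sum == base_sum:
            bits[bank // 8] |= 1 << (bank % 8)
    return bytes(bits)
```

Usage would be something like bitfield_from_checksums(bank_checksums(rom), BASE_CHECKSUMS), where BASE_CHECKSUMS stands for the 128 hard-coded values taken from the base ROM.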

Verification script: https://pastebin.com/xmcJ2GgL (Let's make sure it works for the EU ROMs too though.)

(The .c file can also comment that the base checksums came from crystal_base0.bin; we already mention the original debug ROMs' names in the readme.)

Rangi42 commented 2 years ago

https://pastebin.com/yrweqJw3 (This always prepends "base", 1, 0, not "base", 1, 1 as the EU versions will need. Hopefully that can be conveniently checked for somehow; at worst we can pass a --eu or -1 flag.)

Also I want to refactor this to not depend so much on the runtime ROM filesize. The stadium data is always going to assume 128 banks, so we just need to avoid reading file data past that.

Rangi42 commented 2 years ago

There's a problem with the so-far-unused EU base data (needed for ES, DE, FR, and IT i18n). The actual ROMs, and the current hard-coded stadium.c bytes, have:

    0x00, 0x10, 0x00, 0x00, 0x00, 0x0C, 0xA3, 0x38, 0x00, 0xFF, 0xFF, 0x07, 0x00, 0x00, 0x00, 0x14

But the calculated values are:

    0x00, 0x10, 0x00, 0x00, 0x00, 0x08, 0xA3, 0x38, 0x00, 0xFF, 0xFF, 0x07, 0x00, 0x00, 0x00, 0x04

Two banks are different from the base ones that we'd expect to be identical.

Could they have been patched after the base data was generated, before release? Edit: Never mind, they use crystal_base1.bin.
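To see which two banks those are, the two byte sequences can be XORed and decoded bit by bit. A small Python sketch, assuming the same LSB-first bank order as the rest of this issue; differing_banks is a name made up here.

```python
actual = bytes([0x00, 0x10, 0x00, 0x00, 0x00, 0x0C, 0xA3, 0x38,
                0x00, 0xFF, 0xFF, 0x07, 0x00, 0x00, 0x00, 0x14])
calculated = bytes([0x00, 0x10, 0x00, 0x00, 0x00, 0x08, 0xA3, 0x38,
                    0x00, 0xFF, 0xFF, 0x07, 0x00, 0x00, 0x00, 0x04])


def differing_banks(a: bytes, b: bytes) -> list:
    """Return the bank numbers whose bits differ between two bitfields."""
    return [
        index * 8 + bit
        for index, (x, y) in enumerate(zip(a, b))
        for bit in range(8)
        if (x ^ y) >> bit & 1
    ]


print([f"${bank:02X}" for bank in differing_banks(actual, calculated)])
# ['$2A', '$7C']
```

Both bits are set in the actual ROM data and clear in the calculated data, so these two banks are flagged in the release ROMs but do not come out as matching in the calculation quoted above.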