shawngmc / game-extraction-toolbox

Python tools for extracting ROMs from games and investigating files
MIT License
74 stars 7 forks

Capcom Arcade Stadium 1 and 2 #18

Open shawngmc opened 2 years ago

shawngmc commented 2 years ago

While I've added placeholders, CAS 1 and 2 aren't yet implemented. Investigation notes go here.

Analysis

KPKA

These appear to be KPKA archives. The extraction is very likely correct, since most extracted files begin with a 3-4 character all-caps header indicating the file type.
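A quick way to sanity-check an extraction along these lines is to tally the leading tags of the extracted files. The following is only a sketch: the function names `guess_magic` and `tally_magics` are hypothetical, and it assumes the extracted files sit flat in a single folder.

```python
from collections import Counter
from pathlib import Path
from typing import Optional


def guess_magic(data: bytes) -> Optional[str]:
    """Return a leading 3-4 character all-caps ASCII tag, or None if absent."""
    for length in (4, 3):
        tag = data[:length]
        # bytes.isalpha()/isupper() only consider ASCII letters.
        if len(tag) == length and tag.isalpha() and tag.isupper():
            return tag.decode("ascii")
    return None


def tally_magics(folder: str) -> Counter:
    """Count leading tags across every file in an extracted folder."""
    counts = Counter()
    for path in Path(folder).iterdir():
        if path.is_file():
            with path.open("rb") as handle:
                counts[guess_magic(handle.read(4))] += 1
    return counts
```

If most files land under a handful of recognizable tags and only a few under `None`, that supports the extraction being correct.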

Example KPKA (1556708)

PSB

In the example above, 1556708_1_331.dat appears to start with ".psb"; however, the rest of the file doesn't match what we know about the PSB file type.

TheNametag commented 8 months ago

The depot version says it is supported, but when I attempt to use cas1_old I get an error message that the metadata.json is not found. Nor do I see it in the source files.

RealRelativeEase commented 7 months ago

> The depot version says it is supported, but when I attempt to use cas1_old I get an error message that the metadata.json is not found. Nor do I see it in the source files.

I didn't get any of the DLC games, but gextoolbox extracted a folder titled 1943u.zip from the depot, and it plays fine in an emulator.

xperia64 commented 7 months ago

I did some digging into the new format, and the protection on these files is fairly simple. There are two steps, and while neither is great, ironically it's the second one that significantly weakens the first.

Each zip file has been mangled into its own container of sorts. First, it is semi-randomly padded with 0xFF and a few bytes that look like control bytes, which seem to replace or further compress the data in some way (e.g. some repeating bytes may be missing when compared to the older or console releases). Second, every 16-byte chunk of the file is XOR'd with a fixed value that is unique per file.

While working with these files, I extracted them using QuickBMS, which provides nicer file extensions at the moment.

I have not determined how the keys are supposed to be derived, but for now they can be recovered directly from most of the files I've looked at:

  1. Locate a large .dat file that corresponds to a MAME zip.
  2. Look for repeating or mostly repeating 16-byte sequences in the .dat file in a hex editor.
     2.1. Larger titles appear to have these sequences at or near the top of the file.
     2.2. The sequence will always be 16-byte aligned.
     2.3. Some sequences may be trickier to figure out than others. Consider 00000002.dat from Savage Bees (Exed Exes): here, the key can be derived from offset 0x2240. Note the partially repeated data before and after the key.
  3. Take the 16-byte sequence you found and XOR each byte with 0xFF. The result is the key.
  4. For each byte in the file, XOR that byte with key[offset % 16], where offset is the byte's position in the file.
  5. Save the modified data. You should notice some plaintext filenames and pkzip structures, along with some junk data.
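The steps above can be sketched in Python roughly as follows. This is not the toolbox's implementation: `guess_key` and `xor_with_key` are hypothetical names. It assumes the most frequent 16-byte aligned block is XOR'd 0xFF padding, which is presumably why XORing it with 0xFF gives the key. For trickier files such as the Exed Exes example, that heuristic may pick the wrong block, and the key would still need to be read out by hand.

```python
from collections import Counter

BLOCK = 16  # the XOR key repeats every 16 bytes


def guess_key(data: bytes) -> bytes:
    """Guess the key from the most common 16-byte aligned block (steps 2-3).

    Assumption: that block is encrypted 0xFF padding, so block ^ 0xFF == key.
    """
    blocks = Counter(
        data[i:i + BLOCK] for i in range(0, len(data) - BLOCK + 1, BLOCK)
    )
    if not blocks:
        raise ValueError("need at least one full 16-byte block")
    most_common, _count = blocks.most_common(1)[0]
    return bytes(b ^ 0xFF for b in most_common)


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte with key[offset % len(key)] (step 4)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

A possible use would be `plain = xor_with_key(raw, guess_key(raw))` on the contents of one .dat file, followed by writing `plain` back out (step 5). Because XOR is its own inverse, the same function both scrambles and unscrambles.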

Descrambling the extra junk data is where I'm currently stuck. Junk data will frequently begin or end with a non-0xFF byte, which presumably tells the descrambler what to do, but I haven't worked out what that is.
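To see where the junk sits relative to the pkzip structures, one option is to list the offsets of the standard PKZIP signatures in the XOR'd output. This is only a sketch for inspection and does not descramble anything. `find_zip_signatures` is a hypothetical helper, and it assumes the signatures themselves survive the padding step intact.

```python
from typing import List, Tuple

# Standard PKZIP record signatures.
SIGNATURES = {
    b"PK\x03\x04": "local file header",
    b"PK\x01\x02": "central directory header",
    b"PK\x05\x06": "end of central directory",
}


def find_zip_signatures(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, description) for each PKZIP signature, sorted by offset."""
    hits = []
    for magic, label in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, label))
            start = data.find(magic, start + 1)
    return sorted(hits)
```

Comparing these offsets against the same game's zip from an older or console release might help show how many bytes the padding and control bytes add between records.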