riidefi / mkw

Decompilation of Mario Kart Wii

Translation unit detection #65

Open wait-wtf opened 3 years ago

wait-wtf commented 3 years ago

Most of the unresearched code currently sits in a handful of large assembly blobs. These blobs contain lots of unrelated pieces of code. We need to improve structuring.

A basic improvement is to recover the original translation unit (TU) slices and generate one C file of inline assembly per TU.

The CodeWarrior build system leaks some information on TU structure. Examples:

riidefi commented 3 years ago

Some more clues:

riptl commented 2 years ago

Resuming work on this. To begin with, I'm going to export all symbols, XREFs, etc., from @stblr's Ghidra using https://github.com/r0metheus/GhiDump. This should get us off the ground with the sdata2 float dedup heuristic.
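For anyone following along, here is a rough sketch of how a dedup-based split could work. It rests on an assumption that is not spelled out in this thread: that the compiler pools identical constants within one TU, so a repeated bit pattern in .sdata2 suggests a new TU has started. The function name, the base address, and the fixed 4-byte word size are all hypothetical. In particular, treating every entry as a 4-byte word is a simplification (8-byte doubles would produce false positives), and the true boundary may lie anywhere between the first occurrence and the repeat, so this only places each cut at the latest possible point.

```python
import struct


def split_sdata2_by_duplicates(data, base, width=4):
    """Split a .sdata2 blob into candidate TU slices (stop is exclusive).

    Assumption: a constant is never stored twice within a single TU, so
    seeing the same word again means a TU boundary was crossed.
    """
    slices = []
    seen = set()
    start = 0
    for off in range(0, len(data) - width + 1, width):
        word = data[off:off + width]
        if word in seen:
            # Duplicate found: close the current slice just before it.
            slices.append((base + start, base + off))
            start = off
            seen = set()
        seen.add(word)
    if start < len(data):
        slices.append((base + start, base + len(data)))
    return slices


# Hypothetical input: four big-endian floats, with 1.0 appearing twice.
example = struct.pack(">4f", 1.0, 2.0, 1.0, 3.0)
example_slices = split_sdata2_by_duplicates(example, base=0x1000)
```

Each slice is only a lower bound on the number of TUs: two adjacent TUs that happen to share no constants would not be separated by this approach.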

riptl commented 2 years ago

First attempt at translation unit detection using the sdata2 heuristic has been successful (well, kinda?).

Each line of the output file has the format:

<SDATA2_START>..<SDATA2_STOP> <TEXT_START>..<TEXT_STOP>

Please note that each detected text span is only a lower bound on the TU's extent. The actual TUs are always larger in practice.
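A minimal sketch of reading that format, in case it helps with consuming the file. It assumes the addresses are written in hexadecimal and that the two fields are separated by whitespace; neither is confirmed here, and the `TuGuess` type and function names are made up for illustration.

```python
from typing import NamedTuple


class TuGuess(NamedTuple):
    """One detected TU: an sdata2 span and the minimum text span."""
    sdata2_start: int
    sdata2_stop: int
    text_start: int
    text_stop: int


def parse_span(field):
    """Parse 'START..STOP' into two integers (assumed to be hex)."""
    start, stop = field.split("..")
    return int(start, 16), int(stop, 16)


def parse_line(line):
    """Parse '<sdata2 span> <text span>' into a TuGuess."""
    sdata2_field, text_field = line.split()
    return TuGuess(*parse_span(sdata2_field), *parse_span(text_field))


# Hypothetical line; the addresses are not taken from the real output.
guess = parse_line("00001000..00001010 00002000..00002200")
```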

sdata_detect_attempt.txt

riidefi commented 2 years ago

Nice work! I think for the time being, we can fairly easily do .text splits using the symbol map. If the script could then autogenerate the data splits, that would be really convenient.
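As a rough illustration of combining the two sources: a minimum text span can be widened to the nearest function boundaries taken from the symbol map. This is only a sketch. It assumes a sorted list of function start addresses has already been extracted from the map and that the stop address is exclusive; `snap_to_functions` and the sample addresses are hypothetical.

```python
import bisect


def snap_to_functions(text_start, text_stop, func_starts):
    """Widen [text_start, text_stop) outward to function boundaries.

    func_starts must be sorted in ascending order. If no boundary exists
    on one side, that side of the span is left unchanged.
    """
    # Last function start at or before text_start.
    i = bisect.bisect_right(func_starts, text_start) - 1
    snapped_start = func_starts[i] if i >= 0 else text_start
    # First function start at or after text_stop.
    j = bisect.bisect_left(func_starts, text_stop)
    snapped_stop = func_starts[j] if j < len(func_starts) else text_stop
    return snapped_start, snapped_stop


# Hypothetical function starts and a span that cuts through two functions.
funcs = [0x100, 0x140, 0x200, 0x280]
snapped = snap_to_functions(0x150, 0x210, funcs)
```

The result is still a lower bound on the TU, since functions that reference no sdata2 constants would not be pulled into the span by this step.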