kotcrab / ghidra-allegrex

Ghidra processor module adding support for the Allegrex CPU (PSP)
Apache License 2.0

Try to recover sections from segment only binaries #29

Open Mc-muffin opened 1 year ago

Mc-muffin commented 1 year ago

This is a proper issue for the suggestion I made in issue #28; I'll copy-paste it here for convenience:

"Would also be cool if we could recover the NID related sections (.lib.ent, .lib.stub, .rodata.sceModuleInfo, .rodata.sceResident and .rodata.sceNid) so we can use the NIDresolver script on these programs too.

Recovering any other section (say .rodata, .ctors, .dtors, .eh_frame, etc.) would be entirely optional"

Now, I gathered some info that could be useful for this task, so bear with me: the easiest section to recover would be .rodata.sceModuleInfo, because its address is stored in the p_paddr field of segment 0 and it always has this structure:

// size is 52 bytes (decimal)
struct PspModuleInfo {
    uint flags;
    char name[28];
    void * gp;
    void * exports;
    void * exp_end;
    void * imports;
    void * imp_end;
};
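Since Allegrex pointers are 32 bits wide, a host-side tool running on a 64-bit machine can't simply overlay the struct above onto the file bytes. Here is a minimal sketch, assuming the 52 bytes have already been located through segment 0's p_paddr and that the data is little-endian. It decodes each field at a fixed offset. The names PspModuleInfoRaw, read_le32 and decode_module_info are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PSP_MODULE_INFO_SIZE 52

/* Hypothetical host-side view of PspModuleInfo: pointers are stored as
 * uint32_t because Allegrex addresses are 32 bits wide on every host. */
typedef struct {
    uint32_t flags;
    char     name[29];   /* 28 bytes from the file plus a guaranteed NUL */
    uint32_t gp;
    uint32_t exports;    /* start of .lib.ent */
    uint32_t exp_end;    /* end of .lib.ent */
    uint32_t imports;    /* start of .lib.stub */
    uint32_t imp_end;    /* end of .lib.stub */
} PspModuleInfoRaw;

/* Read a little-endian 32-bit value without alignment assumptions. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 52-byte on-disk layout; returns 0 on success, -1 if buf is too short. */
static int decode_module_info(const uint8_t *buf, size_t len, PspModuleInfoRaw *out) {
    if (len < PSP_MODULE_INFO_SIZE)
        return -1;
    out->flags = read_le32(buf + 0);
    memcpy(out->name, buf + 4, 28);
    out->name[28] = '\0';
    out->gp      = read_le32(buf + 32);
    out->exports = read_le32(buf + 36);
    out->exp_end = read_le32(buf + 40);
    out->imports = read_le32(buf + 44);
    out->imp_end = read_le32(buf + 48);
    return 0;
}
```

The bounds of .lib.ent and .lib.stub then follow directly as [exports, exp_end) and [imports, imp_end).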

With this info we can also figure out where .lib.ent and .lib.stub are: the exports field points to the start of .lib.ent and exp_end points to its end; similarly, imports points to the start of .lib.stub and imp_end points to its end.

Strictly speaking, both .lib.ent and .lib.stub are surrounded by small 4-byte sections that mark their top and bottom. These marker sections are named by appending .top or .btm to the name of the parent section, like so:

.lib.ent.top  (size: 4 bytes)
.lib.ent      (variable size)
.lib.ent.btm  (size: 4 bytes)
.lib.stub.top (size: 4 bytes)
.lib.stub     (variable size)
.lib.stub.btm (size: 4 bytes)

but I'm not sure recovering the .top and .btm sections is really worth it.

Anyway, .lib.ent holds the exports info and .lib.stub holds the imports info; both are basically arrays of the following structs:

// for lib.ent - size is 16 (decimal)
struct PspModuleExport {
    char * name;
    uint flags;
    uchar entry_len;
    uchar var_count;
    ushort func_count;
    uint * exports;
};

// for lib.stub - size is 20 (decimal)
struct PspModuleImport {
    char * name;
    uint flags;
    uchar entry_size;
    uchar var_count;
    ushort func_count;
    uint * nids;
    uint * funcs;
};
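Each entry records its own length in 32-bit words (entry_len / entry_size), so a walker can advance by that stride instead of assuming a fixed size. The sketch below does this for .lib.stub. It assumes the bytes between imports and imp_end have already been read from the file and are little-endian. StubEntry, le32_at and walk_lib_stub are hypothetical names. Rejecting entries shorter than 5 words is an added assumption, and it also keeps a zeroed entry from looping forever.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one .lib.stub entry (PspModuleImport). */
typedef struct {
    uint32_t name;
    uint32_t flags;
    uint8_t  entry_size;   /* entry length in 32-bit words (5 for the 20-byte layout) */
    uint8_t  var_count;
    uint16_t func_count;
    uint32_t nids;
    uint32_t funcs;
} StubEntry;

static uint32_t le32_at(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the raw bytes of .lib.stub (imports..imp_end), decoding up to max entries.
 * Returns the number of entries decoded, or -1 if an entry is malformed. */
static int walk_lib_stub(const uint8_t *sec, size_t len, StubEntry *out, int max) {
    size_t off = 0;
    int n = 0;
    while (off < len && n < max) {
        const uint8_t *p = sec + off;
        if (len - off < 20)
            return -1;                 /* not enough bytes for the base 20-byte layout */
        StubEntry e;
        e.name       = le32_at(p);
        e.flags      = le32_at(p + 4);
        e.entry_size = p[8];
        e.var_count  = p[9];
        e.func_count = (uint16_t)(p[10] | (p[11] << 8));
        e.nids       = le32_at(p + 12);
        e.funcs      = le32_at(p + 16);
        if (e.entry_size < 5)
            return -1;                 /* assumption: shorter entries are corrupt; also avoids a zero stride */
        out[n++] = e;
        off += (size_t)e.entry_size * 4;   /* larger entries carry extra fields that are skipped */
    }
    return n;
}
```

The same loop should presumably work for .lib.ent by reading entry_len and using the 16-byte (4-word) layout as the minimum.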

After those, we're only missing .rodata.sceResident and .rodata.sceNid, which need some parsing to recover. For .rodata.sceNid we need to parse .lib.stub: summing var_count and func_count over every element of .lib.stub gives the size of .rodata.sceNid (in ints), and its start is the lowest value of the nids field.
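The rule above can be sketched in C as follows, assuming the .lib.stub entries were already decoded. ImportCounts, NidRange and sce_nid_range are hypothetical names. Skipping entries that import nothing, so that a null nids pointer can't become the minimum, is an added assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a decoded .lib.stub entry. */
typedef struct {
    uint8_t  var_count;
    uint16_t func_count;
    uint32_t nids;        /* address of this module's NID array */
} ImportCounts;

/* Recovered .rodata.sceNid bounds: starts at `start`, spans `words` 32-bit NIDs. */
typedef struct {
    uint32_t start;
    uint32_t words;
} NidRange;

static NidRange sce_nid_range(const ImportCounts *imps, size_t n) {
    NidRange r = {0, 0};
    int have_start = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t count = (uint32_t)imps[i].var_count + imps[i].func_count;
        if (count == 0)
            continue;   /* assumption: entries that import nothing are ignored */
        r.words += count;
        if (!have_start || imps[i].nids < r.start) {
            r.start = imps[i].nids;
            have_start = 1;
        }
    }
    return r;   /* section end is r.start + 4 * r.words */
}
```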

Lastly, .rodata.sceResident, this one has 2 parts:

Anyway, some other sections can probably be figured out too (like .sceStub.text), but I guess these are the most useful ones. Sorry for the long issue text :P I should also mention that in the case of Danganronpa 2 the segments were kinda in order, but I guess we can't trust that to be the case for every game.

Nemoumbra commented 1 year ago

I have just completed manually parsing the binary for Lego Harry Potter (5-7 years). I managed to recover the following sections:

Then I guessed the sections .shstrtab (both the e_shstrndx value and the sh_type matched) and .symtab (it's the only section of type SYMTAB; the Allegrex plugin agrees with me here), but Ghidra is kind of choking on them. I guess the tool that was used to strip the section names actually ZeroMemory-ed the section and set sh_entsize to zero. That doesn't explain, though, why the address of .symtab is also red. Here's a screenshot; the entsize there is fine. Ghidra 10.3.1 thinks that the address is not in program memory. It even threw an uncaught exception at me 3 times.

(SwingExceptionHandler) Error: Uncaught Exception!
NullPointerException - Cannot invoke "ghidra.program.model.address.AddressRange.getMinAddress()" because "rangeContaining" is null
java.lang.NullPointerException: Cannot invoke "ghidra.program.model.address.AddressRange.getMinAddress()" because "rangeContaining" is null

I don't know what caused this, but I'm considering reporting the uncaught exception if it happens on the latest Ghidra too.

The section .sceStub.text doesn't really exist here, since the syscall stubs are spread across sections [SECTION7; SECTION26], 20 sections in total. I suspect that's one section for each module the game imports (there are 20 entries in .lib.stub). I've verified this hypothesis on a few modules and it seems to hold here. I personally don't know how to call these ELF sections.
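One way to check this hypothesis programmatically is to compute the address range that each .lib.stub entry's stubs should occupy and compare it with each unnamed section. This is a minimal sketch, assuming each import stub is 8 bytes (two instructions) and that funcs points at the module's first stub. ImportStubInfo, AddrRange, stub_ranges and find_matching_range are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define STUB_SIZE 8u   /* assumption: one stub = two 32-bit instructions */

/* Hypothetical subset of a decoded .lib.stub entry. */
typedef struct {
    uint16_t func_count;
    uint32_t funcs;        /* address of the module's first stub */
} ImportStubInfo;

typedef struct {
    uint32_t start;        /* inclusive */
    uint32_t end;          /* exclusive */
} AddrRange;

/* Expected stub range per importing module; returns how many were written to out. */
static size_t stub_ranges(const ImportStubInfo *imps, size_t n, AddrRange *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (imps[i].func_count == 0)
            continue;      /* variable-only imports have no function stubs */
        out[k].start = imps[i].funcs;
        out[k].end   = imps[i].funcs + STUB_SIZE * imps[i].func_count;
        k++;
    }
    return k;
}

/* Index of the range that exactly matches a section [addr, addr + size), or -1. */
static int find_matching_range(const AddrRange *r, size_t n, uint32_t addr, uint32_t size) {
    for (size_t i = 0; i < n; i++)
        if (r[i].start == addr && r[i].end == addr + size)
            return (int)i;
    return -1;
}
```

If every SECTIONn in the suspected block returns a distinct index, that would support the one-section-per-imported-module reading.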

Nemoumbra commented 1 year ago

There's also a funny side-effect of the name stripping... Ghidra shows... (!!) 1491 (!!) strings in the Strings View and all of them are empty! They are also useless as I can't jump to any of them (nOt In PrOgRaM mEmOrY).


Now then, regarding the actual issue we're discussing...

1) The Allegrex plugin should probably bundle the required structs together with Elf32_Ehdr, Elf_ProgramHeaderType_Allegrex and the others. I propose the following definitions (these are based on PPSSPP's source code):

struct PspModuleInfo {
    ushort moduleAttrs; /* 0x0000 User Mode, 0x1000 Kernel Mode */
    ushort moduleVersion;
    char name[28]; /* 28 bytes of module name, packed with 0's */
    void * gp; /* ptr to MIPS GOT data  (global offset table) */
    void * libent; /* ptr to .lib.ent section */
    void * libentend; /* ptr to end of .lib.ent section */
    void * libstub; /* ptr to .lib.stub section */
    void * libstubend; /* ptr to end of .lib.stub section */
};
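This definition is layout-compatible with the earlier one: on the little-endian PSP, the old 32-bit flags field holds moduleAttrs in its low half and moduleVersion in its high half. Below is a small sketch of that split. The names are hypothetical, and it tests the 0x1000 kernel-mode bit mentioned in the comment above as a bit flag.

```c
#include <stdint.h>

#define PSP_MODULE_KERNEL 0x1000u   /* from the comment above: 0x1000 = kernel mode */

/* Hypothetical pair holding the two halves of the old 32-bit flags field. */
typedef struct {
    uint16_t attrs;     /* moduleAttrs */
    uint16_t version;   /* moduleVersion */
} ModuleAttrVersion;

/* Split a little-endian 32-bit flags value into (moduleAttrs, moduleVersion). */
static ModuleAttrVersion split_module_flags(uint32_t flags) {
    ModuleAttrVersion r;
    r.attrs   = (uint16_t)(flags & 0xFFFFu);   /* bytes 0-1 in the file */
    r.version = (uint16_t)(flags >> 16);       /* bytes 2-3 in the file */
    return r;
}

static int is_kernel_module(uint16_t attrs) {
    return (attrs & PSP_MODULE_KERNEL) != 0;
}
```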

struct PspLibStubEntry {
    char * name;
    ushort version;
    ushort flags;
    byte size; /* The size of this struct in ints */
    byte numVars; /* The number of imported variables */
    ushort numFuncs; /* The number of imported functions */
    void * nidData; /* The pointer to the nids in .rodata.sceNids */
    void * firstSymAddr; /* The pointer to the first stub function in .sceStub.text */
};

struct PspLibEntEntry {
    char * name; /* May be NULL */
    ushort version;
    ushort flags;
    byte size; /* The size of this struct in ints */
    byte numVars; /* The number of exported variables */
    ushort numFuncs; /* The number of exported functions */
    void * sceResidentPtr; /* The pointer to the nids in .rodata.sceResident */
};

2) The script from https://github.com/pspdev/psp-ghidra-scripts doesn't work right now. I've examined it and I think I can rewrite it (in short, the author calls subtract() on a scalar, thinking it's an address; to be honest, though, the script needs some refactoring no matter what). I also don't know whether the NID database is accurate: does it know all the functions PPSSPP recognizes, and vice versa?

3) In my opinion, the Allegrex plugin should eventually learn to resolve NIDs itself. I would love to see it even recognize the exported functions and variables from .lib.ent. If you're interested, have a look at this (navigate to "For PSP") and the source code for PPSSPP. Here's my take on this:


Of course, I had to define

struct SceModuleThreadParameter {
    SceUInt32 numParams; /* The number of thread parameters */
    SceUInt32 initPriority; /* The initial priority of the entry thread */
    SceSize stackSize; /* The stack size of the entry thread */
    SceUInt32 attr; /* The attributes of the entry thread */
};

Technically, we can include the typedefs for these types too, but I wouldn't mind seeing just uints there.
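If the typedefs were included, a minimal version could look like the sketch below. It assumes that SceUInt32 and SceSize are both 32-bit unsigned integers on the PSP, so the struct keeps the same 16-byte layout as four plain uints.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed typedefs: both treated as 32-bit unsigned integers. */
typedef uint32_t SceUInt32;
typedef uint32_t SceSize;

struct SceModuleThreadParameter {
    SceUInt32 numParams;     /* The number of thread parameters */
    SceUInt32 initPriority;  /* The initial priority of the entry thread */
    SceSize   stackSize;     /* The stack size of the entry thread */
    SceUInt32 attr;          /* The attributes of the entry thread */
};

/* Same 16-byte layout as four plain uints (checked at compile time, C11). */
_Static_assert(sizeof(struct SceModuleThreadParameter) == 16, "unexpected size");
_Static_assert(offsetof(struct SceModuleThreadParameter, stackSize) == 8, "unexpected offset");
```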

Nemoumbra commented 1 year ago

Ok, I've just checked Loco Roco 2. This is what the sections are called there:


I guess that solves the problem from before:

I personally don't know how to call these ELF sections.

Mc-muffin commented 1 year ago

Lego Harry Potter is a different case. I think it's good to keep it here, but I'm just pointing that out: Lego Harry Potter has sections, but they are nameless, whereas this issue was created for games where only segments are present (so you'd have 3 or 4 segments total instead of a bunch of unnamed sections).

I also like the idea of having this plugin resolve NIDs, but one thing at a time :P The NID resolver script is currently broken, but there's a PR that fixes it for the time being: pspdev/psp-ghidra-scripts#15

Nemoumbra commented 1 year ago

but one thing at a time

I completely agree with you! This is why I believe we should probably start with the games where the sections were not merged, then move on to harder cases like Danganronpa. Thank you for mentioning the PR that fixes the NID resolver script. I've tried it and everything works fine (other than the struct definitions; I don't like them). I guess I'll make a PR with updates once the current one is accepted.