uxmal / reko

Reko is a binary decompiler.
https://uxmal.github.io/reko
GNU General Public License v2.0

Adding PalmOS 68k Support #1301

Closed pinchies closed 7 months ago

pinchies commented 7 months ago

As discussed on Discord, I would be interested in learning what would be required to add full support for decompiling PalmOS .PRC executables. These are "mostly" 68K and use a system call Trap API similar to the one on MacOS.

uxmal commented 7 months ago

Commit 3418ed3b7ddad3ef045286cba93d10d14a8db94e contains the promised scaffolding. Reko has new PalmOSPlatform and PrcLoader classes that load files with the .prc extension. I wasn't able to tell whether there is a magic number in .prc files, so for now Reko relies on the file extension to identify these files. Ideas for more robustness are welcome.

The next challenges are to build up the data segment and to load the code#0 resource correctly. The documentation given says that this resource is sometimes generated similarly to the corresponding resource on MacOS classic. The code from Reko's MacOS support can be adapted to load the function pointers into memory. Inter-segment calls will need to be resolved by implementing a ResolveIndirectCall(RtlCall instr) method in PalmOSPlatform, very similar to its counterpart in MacOS/MacOSClassic.cs.

I noticed the following fragment in code#1:

00100018 486E FFF4 pea -$000C(a6)
0010001C 486E FFF8 pea -$0008(a6)
00100020 486E FFFC pea -$0004(a6)
00100024 4E4F trap #$0F
00100026 A08F illegal #$A08F
00100028 3800 move.w d0,d4

There is an A-line illegal instruction at address 00100026 preceded by a trap #$0F. To make progress we need to know:

Finally, it would be nice to know what the default calling convention used by C programs on PalmOS is.

Let me know if you want to sink your teeth into these research questions. Right now, Reko can just open .prc files but probably can't retrieve much useful info yet.

uxmal commented 7 months ago

I can answer the first query: on PalmOS, you're expected to generate a trap #$0F for all system calls. The 16-bit word following the trap is not actually executed, but is used as an inline argument indicating which service to use. I will see if I can make Reko recognize this later this week.
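As a rough illustration of the convention described above, the following Python sketch walks a code buffer one 16-bit word at a time and reports each trap #$0F together with the word that follows it. The function name `find_system_calls` and the `base` parameter are made up for this example and are not part of Reko. A plain word-by-word walk is an assumption that can misfire when an instruction operand happens to equal 4E4F, so a real implementation would rely on a proper disassembler.

```python
import struct

TRAP_0F = 0x4E4F  # 68k encoding of "trap #$0F"


def find_system_calls(code: bytes, base: int = 0):
    """Return (address, selector) pairs for each trap #$0F in `code`.

    Hypothetical helper: it steps through the buffer in 16-bit words and
    treats the word after every trap #$0F as an inline selector, not as
    an instruction.
    """
    calls = []
    offset = 0
    while offset + 4 <= len(code):
        opcode, selector = struct.unpack_from(">HH", code, offset)
        if opcode == TRAP_0F:
            calls.append((base + offset, selector))
            offset += 4  # skip the trap and its inline selector word
        else:
            offset += 2
    return calls


# The fragment from code#1 quoted earlier in this thread.
fragment = bytes.fromhex("486EFFF4486EFFF8486EFFFC4E4FA08F3800")
for address, selector in find_system_calls(fragment, base=0x00100018):
    print(f"{address:08X}: trap #$0F, selector ${selector:04X}")
```

With the selector consumed as data, the word at 00100026 no longer needs to be decoded as an A-line illegal instruction.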

uxmal commented 7 months ago

Simple PalmOS support is there. I'm closing this as completed, with the understanding that more features can be added.

pinchies commented 7 months ago

Incredibly impressed with what you were able to integrate in such a short amount of time. I'm going to start by outlining things that could be added and fixed in this comment, and then we can chat about it and go from there. Edits incoming.

1) Debug symbols - function names. Often there is a string directly after an RTS (4E75) instruction, which corresponds to the name of the function. The format is 1 byte equal to 128 + the length of the function name, followed by the function name in ASCII, followed by 2 zero bytes if the name length is odd, or 3 zero bytes if the name length is even. Having function names properly inserted would go a long way towards making the decompiled code highly readable.

2) Missing procedure entry points. A significant number of functions are being missed in the code. They always start with LINK (4E 56), and always have a 00 byte preceding them too, so they are easy to pick out manually, but this is time consuming.
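To make the symbol layout in item 1 concrete, here is a minimal Python sketch that decodes a name placed directly after an RTS. It follows only the format as described in this comment; the helper name `read_symbol_after_rts` is hypothetical, and the printable-ASCII check is an added assumption to reduce accidental matches.

```python
RTS = b"\x4e\x75"  # 68k encoding of RTS


def read_symbol_after_rts(code: bytes, rts_offset: int):
    """Return (name, end_offset) for a symbol following an RTS, or None.

    Layout assumed from the description above: one byte holding
    128 + name length, the ASCII name, then 2 zero bytes if the length
    is odd or 3 zero bytes if it is even.
    """
    if code[rts_offset:rts_offset + 2] != RTS:
        return None
    pos = rts_offset + 2
    if pos >= len(code) or code[pos] <= 0x80:
        return None
    length = code[pos] - 0x80
    name = code[pos + 1:pos + 1 + length]
    if len(name) != length or not all(0x20 < b < 0x7F for b in name):
        return None
    padding = 2 if length % 2 == 1 else 3
    end = pos + 1 + length + padding
    if code[pos + 1 + length:end] != b"\x00" * padding:
        return None
    return name.decode("ascii"), end


# Example: RTS, tag byte 0x80 + 9, "PilotMain", two zero bytes, then LINK.
sample = RTS + bytes([0x80 + 9]) + b"PilotMain" + b"\x00\x00" + b"\x4e\x56"
print(read_symbol_after_rts(sample, 0))
```

In both cases the tag byte, name, and padding add up to an even number of bytes, so the next instruction stays word-aligned, with a zero byte immediately before it. That matches the 00 preceding LINK noted in item 2.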

uxmal commented 7 months ago

  1. The debug symbols look similar, or maybe even identical, to how MacsBug symbols are implemented on MacOS Classic. It might make sense to port that over to PalmOS. Look for MacsBugSymbolScanner.cs in the MacOS classic folder. If the symbols are indeed formatted the same way, it might make sense to use the same class. The problem is that I don't want to add a dependency on MacOS in the PalmOS project, or vice versa, and these bits don't really belong in the M68k project. Perhaps I need to add a new src/Libraries/MacsBug/MacsBug.csproj to host this common code.
  2. To find missing procedures, you can use Reko's shingle scanner. Once you have loaded the program image into the Reko GUI, but before you have started the decompilation process, you can ask Reko to perform "shingle scanning", which looks inside the "gaps" to see if it can make sense of them as code. To turn on this optional feature, select the executable in the Project Browser, go to the Edit > Properties menu, and bring up the property pages for the executable. If you then start the decompilation, you will collect more procedures.

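Separately from the shingle scanner, the manual pattern mentioned earlier in this thread (a LINK opcode 4E 56 with a 00 byte directly before it) can be approximated with a short scan. This Python sketch is only an illustration: `find_link_prologues` is a hypothetical name, it is not how Reko locates procedures, and the byte pattern can also occur in data, so hits are candidates to verify and not confirmed entry points.

```python
def find_link_prologues(code: bytes, base: int = 0):
    """Return addresses of word-aligned 4E 56 opcodes preceded by a 00 byte.

    Hypothetical helper that mirrors the manual search described in this
    thread; every hit is only a candidate procedure entry.
    """
    hits = []
    for offset in range(2, len(code) - 1, 2):
        if (code[offset] == 0x4E and code[offset + 1] == 0x56
                and code[offset - 1] == 0x00):
            hits.append(base + offset)
    return hits


# RTS, symbol "foo!" (even length, so 3 zero bytes), then LINK A6,#0.
sample = bytes.fromhex("4E7584666F6F210000004E560000")
print([hex(a) for a in find_link_prologues(sample, base=0x1000)])
```

Candidate addresses found this way could then be marked as procedure entries by hand in the Reko GUI.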
pinchies commented 7 months ago

Super, thanks for those details. Leave it with me and I’ll have a play! 🙌

999pingGG commented 3 months ago

How can I make it decompile all the "code" resources/segments? If needed, I can modify the code, but I don't know where to look.


uxmal commented 2 months ago

@999pingGG: By default, Reko searches for executable code using a recursive algorithm. Starting at known entry points in the binary, it disassembles the instructions in a linear fashion until it encounters jump, branch, and call instructions in the program. Direct jumps, like bra $1234, are followed, but indirect jumps, like jmp (a1), are not, unless Reko can determine what value is in register a1. If the only way to reach the code in code#2, etc., is via an indirect jump, Reko will in general not be able to find it.
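The recursive search described above can be sketched as a toy model in Python. Nothing here is Reko's actual code: the `program` dictionary, the instruction kinds, and the function name `recursive_scan` are all invented for illustration, and an indirect jump is modelled as a jump whose target is `None`.

```python
def recursive_scan(program, entry_points):
    """Return the set of addresses reachable from `entry_points`.

    `program` maps address -> (kind, size, target), where kind is one of
    "plain", "branch", "call", "jump", or "return" (a made-up encoding).
    A target of None stands for an indirect transfer with unknown value.
    """
    visited = set()
    worklist = list(entry_points)
    while worklist:
        addr = worklist.pop()
        while addr in program and addr not in visited:
            visited.add(addr)
            kind, size, target = program[addr]
            if kind in ("branch", "call", "jump") and target is not None:
                worklist.append(target)  # follow known targets later
            if kind in ("jump", "return"):
                break  # no fall-through after these
            addr += size
    return visited


toy = {
    0x00: ("plain", 2, None),
    0x02: ("call", 4, 0x10),
    0x06: ("jump", 2, None),    # indirect jump: target unknown
    0x10: ("plain", 2, None),
    0x12: ("return", 2, None),
    0x20: ("plain", 2, None),   # only reachable through the indirect jump
    0x22: ("return", 2, None),
}
print(sorted(recursive_scan(toy, [0x00])))
print(sorted(recursive_scan(toy, [0x00, 0x20])))
```

In this model the code at 0x20 is missed when scanning from 0x00 alone, and is found once 0x20 is supplied as an additional known entry point.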

There are two workarounds. In the Memory View you can select a memory address and then run the Mark procedure entry command. By doing this you're telling Reko that you know that the marked address is the start of a procedure. Reko will then use its recursive algorithm starting at the marked address.

The other approach is to use a heuristic known as "shingle scanning". To use it, right after opening the binary in Reko, select the program in the Project browser (on the left) and select the menu command File > Properties...

When the property sheet shows up, select the Scanning tab and check the Shingle heuristic option. Finally, continue decompiling as before. This heuristic explores the "gaps" between functions more aggressively and can often cause false positives (i.e. misidentifying data or other non-code bytes as part of the program).
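To give a feel for why this kind of heuristic produces false positives, here is a toy Python model of the general idea: try to decode at every offset in a gap and keep the offsets whose decoding chain reaches a return without hitting an unknown opcode. This is not Reko's implementation. The three-opcode instruction set in `SIZES` is entirely made up, and real 68k decoding is far more involved.

```python
# Made-up toy instruction set: opcode byte -> instruction size in bytes.
SIZES = {0x01: 1, 0x02: 2, 0xC3: 1}
RET = 0xC3  # the toy "return" opcode


def shingle_scan(gap: bytes):
    """Return the offsets in `gap` that decode cleanly up to a return."""
    valid = {}
    # Work backwards so each offset can reuse the verdict of its successor.
    for start in range(len(gap) - 1, -1, -1):
        opcode = gap[start]
        size = SIZES.get(opcode)
        if size is None or start + size > len(gap):
            valid[start] = False
        elif opcode == RET:
            valid[start] = True
        else:
            valid[start] = valid.get(start + size, False)
    return [offset for offset in range(len(gap)) if valid[offset]]


gap = bytes([0x02, 0x01, 0xC3, 0xFF, 0x01, 0xC3])
print(shingle_scan(gap))
```

In this example offset 1 is accepted even though it is the operand byte of the two-byte instruction at offset 0. Overlapping decodes like that are the sort of false positive the heuristic can introduce.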

By judiciously combining the Mark procedure entry approach with shingle scanning, you should be able to retrieve all of the executable code. Make sure you save your project afterwards (use the File > Save menu item) so you don't lose work.

@999pingGG: if you have any questions feel free to pose them in a separate, new issue. I rarely look at closed issues, and only encountered your comment here by chance!