pfalcon / ScratchABit

Easily retargetable and hackable interactive disassembler with IDAPython-compatible plugin API
GNU General Public License v3.0

Enable more Capstone-supported archs #39

Open: pfalcon opened this issue 6 years ago

pfalcon commented 6 years ago

With 2.0, Capstone-based ARM support went online, and Capstone supports several more architectures. To be fair, enabling ARM support took a good deal of effort (and it isn't really complete), but the cornerstone was supporting a second ISA (Thumb alongside ARM) for code in the same address space. Beyond that, Capstone seems to offer a pretty weak semantic characterization of instructions, so a bunch of that needs to be handled in an arch-specific manner in the ScratchABit plugin.
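Supporting a second ISA in one address space means every code address a disassembler queues must carry its ISA with it. On ARM, interworking branches such as BX take the target from a register whose bit 0 selects Thumb state, while the actual instruction address has that bit cleared. A minimal sketch, assuming a hypothetical `split_arm_target()` helper that is not part of ScratchABit or Capstone:

```python
from enum import Enum


class Isa(Enum):
    ARM = "arm"
    THUMB = "thumb"


def split_arm_target(value):
    """Hypothetical helper: split an interworking branch target (e.g. the
    register value used by BX) into (code address, ISA). Bit 0 set means
    Thumb; the real instruction address has that bit cleared."""
    if value & 1:
        return value & ~1, Isa.THUMB
    return value, Isa.ARM


# A disassembler would queue each target together with its ISA, so one
# address space can contain both ARM and Thumb code.
for target in (0x8000, 0x8101):
    addr, isa = split_arm_target(target)
    print(hex(addr), isa.value)
```

Keeping the ISA next to the address (rather than in a global "current mode" flag) is what lets ARM and Thumb functions coexist and call each other.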

Still, enabling more archs shouldn't be rocket science, and this ticket is submitted in the hope of finding people who'd be interested in giving it a try and sharing feedback.

References:

maximumspatium commented 6 years ago

Adding a Capstone-based PowerPC 32 plugin shouldn't be a big deal. I'll give it a try...

maximumspatium commented 6 years ago

Enabling Capstone-based PowerPC disassembly was indeed a matter of a simple hook. The problem is that it isn't of much use: recursive disassembly doesn't work due to missing instruction semantics. In the case of PowerPC, it's even worse than elsewhere. The header file include/ppc.h defines only one semantic group, PPC_GRP_JUMP:

typedef enum ppc_insn_group {
    PPC_GRP_INVALID = 0, // = CS_GRP_INVALID

    //> Generic groups
    // all jump instructions (conditional+direct+indirect jumps)
    PPC_GRP_JUMP,   // = CS_GRP_JUMP

    //> Architecture-specific groups
    // ... (rest of the enum not quoted)
} ppc_insn_group;

There is neither PPC_GRP_CALL nor PPC_GRP_RET. Just annoying and ridiculous!
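Why missing CALL/RET groups break recursive disassembly: a recursive-descent disassembler decides from each instruction's groups whether to keep following the fall-through path. A toy sketch over a hand-made instruction table (no Capstone involved); the `Insn` tuple, its `conditional` field and the addresses are hypothetical, and the group ids are assumed to mirror Capstone's generic `CS_GRP_*` values:

```python
from collections import namedtuple

# Assumed to mirror Capstone's generic group ids (CS_GRP_JUMP etc.).
GRP_JUMP, GRP_CALL, GRP_RET = 1, 2, 3

# Hypothetical decoded instruction: size in bytes, semantic groups,
# optional direct branch target, and whether the branch is conditional.
Insn = namedtuple("Insn", "size groups target conditional")


def recursive_descent(insns, entry):
    """Return the set of addresses reachable from entry."""
    seen, todo = set(), [entry]
    while todo:
        addr = todo.pop()
        if addr in seen or addr not in insns:
            continue
        seen.add(addr)
        insn = insns[addr]
        if insn.target is not None:
            todo.append(insn.target)          # follow branch/call target
        groups = set(insn.groups)
        ends_flow = GRP_RET in groups or (
            GRP_JUMP in groups and not insn.conditional)
        if not ends_flow:
            todo.append(addr + insn.size)     # fall through to next insn
    return seen


good = {
    0x0: Insn(4, [GRP_CALL], 0x10, False),   # bl 0x10
    0x4: Insn(4, [GRP_RET], None, False),    # blr
    0x10: Insn(4, [GRP_RET], None, False),   # blr
}
# Same code, but every branch tagged only as JUMP (the PowerPC situation).
jump_only = {a: i._replace(groups=[GRP_JUMP]) for a, i in good.items()}

print(sorted(recursive_descent(good, 0)))       # [0, 4, 16]
print(sorted(recursive_descent(jump_only, 0)))  # [0, 16]
```

With JUMP-only tagging, the call at 0x0 looks like an unconditional jump, so the code after the call (0x4) is never reached. Treating every JUMP as "may fall through" instead would run past the `blr` into whatever follows it, so either way the analysis goes wrong.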

Moreover, Capstone's design puts all jump instructions (conditional, direct, and indirect) into the same category, JUMP, effectively making it useless for static program analysis. I therefore support your observation about the inconsistent design of Capstone...
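Since Capstone only tags PowerPC branches as PPC_GRP_JUMP, a plugin can recover CALL/RET from the mnemonic. A minimal sketch, assuming Capstone's simplified PowerPC mnemonics (`bl`, `blr`, `bctrl`, ...) and group ids mirroring `CS_GRP_*`. The inputs correspond to `CsInsn.mnemonic` and `CsInsn.groups` (the latter needs detail mode enabled). `patch_ppc_groups()` is a hypothetical name, and the mnemonic tables are illustrative and deliberately incomplete (conditional forms such as `beqlr` are not handled):

```python
# Assumed to mirror Capstone's generic group ids (CS_GRP_*).
GRP_JUMP, GRP_CALL, GRP_RET = 1, 2, 3

# Illustrative, incomplete mnemonic tables for 32-bit PowerPC.
PPC_CALLS = {"bl", "bla", "bctrl", "blrl"}   # branches that set LR
PPC_RETURNS = {"blr"}                         # branch to LR


def patch_ppc_groups(mnemonic, groups):
    """Refine Capstone's catch-all JUMP group for a PowerPC instruction."""
    groups = set(groups)
    if GRP_JUMP not in groups:
        return groups
    if mnemonic in PPC_RETURNS:
        groups = (groups - {GRP_JUMP}) | {GRP_RET}
    elif mnemonic in PPC_CALLS:
        groups = (groups - {GRP_JUMP}) | {GRP_CALL}
    return groups


print(patch_ppc_groups("bl", [GRP_JUMP]))    # {2}
print(patch_ppc_groups("blr", [GRP_JUMP]))   # {3}
print(patch_ppc_groups("b", [GRP_JUMP]))     # {1}
```

This sketch replaces JUMP rather than adding to it, so that a call is not also mistaken for an unconditional jump by code that checks JUMP first.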

maximumspatium commented 6 years ago

Could you explain to me how the following code is intended to work?

@staticmethod
def patch_capstone_groups(inst):
    groups = set(inst.groups)
    if 1: #ARM
        ...
    if 2: # x86
        ...
    return groups
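As quoted, `if 1:` and `if 2:` are both always true, so the ARM and the x86 branches would both run for every instruction. A minimal sketch of the presumably intended per-arch dispatch, written as a plain function rather than a staticmethod and assuming the caller passes the architecture explicitly. The `arch` parameter, the `ARCH_*` tags and the ARM fixups are hypothetical illustrations, not ScratchABit's actual code:

```python
from collections import namedtuple

# Assumed to mirror Capstone's generic group ids (CS_GRP_*).
GRP_JUMP, GRP_CALL, GRP_RET = 1, 2, 3

# Hypothetical arch tags; ScratchABit may identify archs differently.
ARCH_ARM, ARCH_X86 = "arm", "x86"


def patch_capstone_groups(inst, arch):
    groups = set(inst.groups)
    if arch == ARCH_ARM:
        # Hypothetical ARM fixups, for illustration only.
        if inst.mnemonic in ("bl", "blx"):
            groups.discard(GRP_JUMP)
            groups.add(GRP_CALL)
        elif inst.mnemonic == "bx" and inst.op_str == "lr":
            groups.discard(GRP_JUMP)
            groups.add(GRP_RET)
    elif arch == ARCH_X86:
        pass  # x86 groups are taken as-is in this sketch
    return groups


Insn = namedtuple("Insn", "mnemonic op_str groups")  # stand-in for CsInsn
print(patch_capstone_groups(Insn("bx", "lr", [GRP_JUMP]), ARCH_ARM))  # {3}
print(patch_capstone_groups(Insn("bx", "lr", [GRP_JUMP]), ARCH_X86))  # {1}
```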

pfalcon commented 6 years ago

I therefore support your observation about the inconsistent design of Capstone...

Yeah. I don't know how to explain it. Capstone seems to be used in many projects, but I guess mostly as a "flat" disassembler, not for semantic analysis, and/or not in a cross-arch way. Nor do I have any idea what to do about it. I submitted a few tickets to the project, but so far there's no specific feedback from the maintainer/other users.

Fortunately, that's all relatively easily fixable in Python ;-). (Sad that other projects are apparently doing the same, or will need to do the same.)

pfalcon commented 6 years ago

if 1: #ARM

Sure, that's just hacked-up/unfinished code ;-). Should be fixed in https://github.com/pfalcon/ScratchABit/commit/2eec80e5ded5c46ad89bccff2f6b7f084e5cbca1

maximumspatium commented 6 years ago

Sad that other projects are apparently doing the same, or will need to do the same

That's indeed true. I did that in my own tools, and I know other people doing it, too.

I submitted a few tickets to the project, but so far there's no specific feedback from the maintainer/other users.

I just sent them a ping, see https://github.com/aquynh/capstone/issues/1072

maximumspatium commented 6 years ago

Sure, that's just hacked-up/unfinished code ;-). Should be fixed in 2eec80e

Good, thanks. I'd personally prefer to keep processor-dependent code in processor-dedicated modules instead of putting it all into a single patch_capstone_groups(). The _any_capstone module could provide a base processor class that would be extended with a processor-specific classification method...
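The design suggested here, a base processor class in _any_capstone that arch-specific modules extend, might look roughly like the sketch below. All class, method and attribute names (`CapstoneProcessor`, `classify_groups`, `PpcProcessor`, `REFINE`) are hypothetical, and the group ids are assumed to mirror Capstone's `CS_GRP_*` values:

```python
from collections import namedtuple

# Assumed to mirror Capstone's generic group ids (CS_GRP_*).
GRP_JUMP, GRP_CALL, GRP_RET = 1, 2, 3


class CapstoneProcessor:
    """Hypothetical shared base, as it might live in _any_capstone."""

    def classify_groups(self, inst):
        # Default: trust Capstone's groups unchanged.
        return set(inst.groups)

    def is_call(self, inst):
        return GRP_CALL in self.classify_groups(inst)

    def is_ret(self, inst):
        return GRP_RET in self.classify_groups(inst)


class PpcProcessor(CapstoneProcessor):
    """Hypothetical PowerPC module overriding only the classification."""

    # Illustrative, incomplete mapping of simplified mnemonics to groups.
    REFINE = {"bl": GRP_CALL, "bctrl": GRP_CALL, "blr": GRP_RET}

    def classify_groups(self, inst):
        groups = super().classify_groups(inst)
        new = self.REFINE.get(inst.mnemonic)
        if new is not None and GRP_JUMP in groups:
            groups = (groups - {GRP_JUMP}) | {new}
        return groups


Insn = namedtuple("Insn", "mnemonic groups")  # stand-in for CsInsn
cpu = PpcProcessor()
print(cpu.is_call(Insn("bl", [GRP_JUMP])))                  # True
print(cpu.is_ret(Insn("blr", [GRP_JUMP])))                  # True
print(CapstoneProcessor().is_call(Insn("bl", [GRP_JUMP])))  # False
```

Each arch module then owns its fixups, while generic helpers such as `is_call()` stay in one place in the base class.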

pfalcon commented 6 years ago

Yeah, I guess that can be done, and apparently will need to be, eventually. The current task, however, is to avoid code duplication and diverging implementations for different archs; that's why I put everything into a single file. Once support for enough archs has been collected, it can be refactored to be more "beautiful". For now, IMHO, that would be a case of premature perfectalization ;-).