Added Capstone to use as ref for Disassembler impl

emoon commented 8 years ago

As talked about in https://github.com/marhel/r68k/issues/77 this adds Capstone and the m68k part to use as as ref for disassembler. That being said there may likely be some bugs in it as well but I'm happy to fix those we encounter.

There is an example on how to use it in capstone/src/lib.rs and in the test section at the bottom.

emoon commented 8 years ago

I tried to reduce the size as much as I could. I had to leave a bunch of extra headers otherwise some internal structs wouldn't compile and may have ended up being wrong but I was able to remove most of the code.

Also made the code such it's a local crate that r68k depends on. Later on when we don't need it we can just remove it fully and the code also gets compiled which makes it easier to do and test changes in the existing capstone code.

marhel commented 8 years ago

Thanks, that was quick! I hope to have time to look at this a bit this weekend.

I'm also wondering if it wouldn't be rather simple to use the same data structures we need for disassembly to provide assembly as well.

Then we could have a test like:

opcode == Assemble(Disassemble(opcode))

Which at least should be a valid test for all implemented opcodes. What do capstone output in case of invalid opcodes, btw?

emoon commented 8 years ago

The way it works today

        // add.l d1,d2
        // move.b d4,(a0,d0.w)
        // rts
        Capstone::init();  // <- called once for the whole program
        let data = [0xD4, 0x81, 0x11, 0x84, 0x00, 0x00, 0x4E, 0x75];
        let capstone = Capstone::new().unwrap();
        let insn = capstone.disasm(&data, 0x1000, 0).unwrap();
        assert_eq!(insn[0].mnemonic, "add.l");
        assert_eq!(insn[0].op_str, "d1,d2");

        assert_eq!(insn[1].mnemonic, "move.b");
        assert_eq!(insn[1].op_str, "d4,(a0,d0.w)");

        assert_eq!(insn[2].mnemonic, "rts");

If I remember correct Capstone will return dc.w #$<data> for illegal instructions (or instructions being 020+ when in 000 mode)

Capstone do have support for providing more information about each instruction (number of ops, registers, used addressing mode etc) but I'm not sure if it's enough to make it easy to implement an assemble function. Right now only provide the strings back.

marhel commented 8 years ago

Had a quick look just now, and had no issues getting it to compile or run the test! Have you thought about the design of the disassembler? I've just been thinking a bit about adding more data to the op_entry, incrementally (meaning that we don't need to provide the additional disassembly params for all ops, just a few, because setting it up for all ops would take ages to set up just to play with the API and stuff)

As an example, if we look at something like add_8_er_di, we have today;

op_entry!(MASK_OUT_X_Y, OP_ADD_8_ER_DI,   add_8_er_di),

which corresponds to 2^6 (6 zeroes in the opmask) = 64 opcodes, where the varying part is Dx (bits 9-11) and Ay (bits 0-2) and we've hardcoded a few things into the opconstant already;

pub const OP_ADD_8_ER_DI   : u32 = OP_ADD | BYTE_SIZED | DEST_DX | OPER_DI;

byte size (standard size-bits 6-7)
a "direction" in bit 8, in this case "ea to dx"
the addressing mode is displacement.

All this (and the mnemonic) must be known by the disassembler, in order to produce something like "ADD (8, A0), D0" - the mode could be "rediscovered" by the disassembler, if we say this op is a two param op with SRC = Ay+Mode and DST = Dx (where mode is the standard effective address mode (bits 3-5) even though these are actually hard coded here).

So perhaps we end up with something like an enum type "OpKind" or something, with several variabts, and one member is "SizedTwoParamOp(mnemonic, src, dst)";

op_entry!(MASK_OUT_X_Y, OP_ADD_8_ER_DI,   add_8_er_di, SizedTwoParamOp("ADD", AyMode, Dx)),

Perhaps most MASK_OUT_X_Y-ops are very much alike, but I there are some exceptions (such as ADDQ where X is a 1-8 literal), and in any case it's not in general clear if X is SRC and Y is DST, or the other way around. So in the other direction it's similar;

op_entry!(MASK_OUT_X_Y, OP_ADD_8_RE_DI,   add_8_re_di, SizedTwoParamOp("ADD", Dx, AyMode)),

Would something like this suffice as a starting point?

emoon commented 8 years ago

I haven't thought that much about it really.

My idea initially was to "just" port this table https://github.com/marhel/r68k/blob/disassembler/capstone/native/arch/M68K/M68KDisassembler.c#L3406 and pretty much do something very similar to the Capstone implementation.

Also the capstone implementation (compared to the old Musashi one) does more than just output text but also builds info on each instruction with mnemonic, number of ops, addressing mode, registers, etc which is nice info to have in various circumstances. So I would have tried to keep something similar for Rust impl as well.

marhel commented 8 years ago

I agree that disassembly should result in a datatype, not just a string! Regarding the table, it's perhaps less verbose than adding information to the op-entry, as they have a single line for add_re_8, whereas we now have one per addressing mode.

emoon commented 8 years ago

yeah, it's likely possible to compress the table a bit.

marhel / r68k

Added Capstone to use as ref for Disassembler impl #81