marhel / r68k

A m68k emulator in rust - let r68k = musashi.clone();
MIT License

Disassembler #77

Open emoon opened 8 years ago

emoon commented 8 years ago

Something that is very useful to have in an emulator core is the ability to disassemble instructions, for various reasons.

Currently r68k doesn't have one. I have implemented one (in C here) https://github.com/aquynh/capstone/blob/next/arch/M68K/M68KDisassembler.c

This was also based on Musashi, but with a fair number of bugs fixed. Also, this version doesn't just print instructions; it also lets you see which registers, addressing modes, etc. are used by an instruction.

Rewriting this code in Rust is certainly possible, but it's a bunch of work. An alternative would be to rework this C code a bit and put a Rust wrapper around it, so the user of r68k would only 'see' the Rust part.

Just wanted to hear your thoughts about it.

marhel commented 8 years ago

I think an integrated disassembler/debugger would absolutely be useful!

I haven't thought much about it in the design of r68k, though, and I'm going to focus on getting the cpu part usable first. But I am led to believe that you know a little something about both disassemblers and debuggers, so you are welcome to come up with some designs for how that might work in/with r68k!

emoon commented 8 years ago

Sounds good :) I will try to think of something.

marhel commented 8 years ago

If we're going to implement a disassembler in rust at some point, it would be a requirement, in my opinion, to be able to QC it against a known-good implementation, much like we did with the CPU. Can't imagine trying without it, in fact.

emoon commented 8 years ago

Yeah, that would be good. Not exactly sure how to do it, though.

marhel commented 8 years ago

Would it be possible to create something like libdisassembler.a based on a working program, set up a memory buffer with some bytes corresponding to some instruction, ask both implementations for a disassembly of that buffer, and check that they generate the same output?
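The comparison proposed here can be sketched roughly as follows. This is only an illustration: the `Disassembler` trait, the `compare` function and the two toy stand-ins are hypothetical names, and neither r68k nor Capstone is claimed to expose this interface. A real setup would wrap the reference library and the Rust implementation behind something like it.

```rust
/// Hypothetical common interface for the two implementations being compared.
pub trait Disassembler {
    /// Disassemble the first instruction in `code`, or return `None`
    /// if the bytes are not recognized as an instruction.
    fn disassemble(&self, code: &[u8]) -> Option<String>;
}

/// Outcome of running both implementations on the same buffer.
#[derive(Debug, PartialEq)]
pub enum Comparison {
    Agree,
    Mismatch {
        reference: Option<String>,
        candidate: Option<String>,
    },
}

/// Feed the same bytes to both disassemblers and compare the text they produce.
pub fn compare<R: Disassembler, C: Disassembler>(
    reference: &R,
    candidate: &C,
    code: &[u8],
) -> Comparison {
    let r = reference.disassemble(code);
    let c = candidate.disassemble(code);
    if r == c {
        Comparison::Agree
    } else {
        Comparison::Mismatch { reference: r, candidate: c }
    }
}

/// Toy stand-in that only knows NOP (opcode 0x4E71).
pub struct NopOnly;

impl Disassembler for NopOnly {
    fn disassemble(&self, code: &[u8]) -> Option<String> {
        match code {
            [0x4E, 0x71, ..] => Some("nop".to_string()),
            _ => None,
        }
    }
}

/// Toy stand-in that knows NOP and RTS (opcode 0x4E75).
pub struct NopAndRts;

impl Disassembler for NopAndRts {
    fn disassemble(&self, code: &[u8]) -> Option<String> {
        match code {
            [0x4E, 0x71, ..] => Some("nop".to_string()),
            [0x4E, 0x75, ..] => Some("rts".to_string()),
            _ => None,
        }
    }
}
```

Comparing the returned text directly assumes both sides format operands identically; in practice some normalization (case, whitespace) would probably be needed before the comparison.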

emoon commented 8 years ago

Sure. Or actually generate a huge program from the QC tests that we already have here for valid instructions.

emoon commented 8 years ago

I can likely add capstone (slimmed down to only use the m68k backend) and add a basic Rust interface for it so it can be called from QC tests. Also, Capstone supports several instances which can run in parallel, so that can be used to compare with.

marhel commented 8 years ago

Yes, the optable contains useful data for the disassembler! It would be able to find the matching entry for the instruction it was looking at, but there's not enough information about how to interpret the "holes" in the mask, such as X and Y (whether they represent data or address registers, or something else), and it also doesn't know the addressing mode, apart from the hints usually present in the function name. So more information would be needed.
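To make the point about the "holes" concrete, here is a minimal sketch of a mask/match table extended with the extra information a disassembler would need. The `OpEntry` layout, the `RegKind` enum, the field names and the output syntax are assumptions for illustration and are not taken from r68k's actual optable. The two entries use the standard ADD.W encodings with a data or address register as the source.

```rust
/// What a register "hole" in the opcode word refers to (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RegKind {
    Data,
    Addr,
}

impl RegKind {
    fn prefix(self) -> char {
        match self {
            RegKind::Data => 'D',
            RegKind::Addr => 'A',
        }
    }
}

/// A mask/match entry plus the metadata a plain lookup table lacks:
/// how to interpret X (bits 11..9) and Y (bits 2..0).
pub struct OpEntry {
    pub mask: u16,
    pub matching: u16,
    pub name: &'static str,
    pub x: RegKind,
    pub y: RegKind,
}

pub const OPTABLE: &[OpEntry] = &[
    // ADD.W Dy,Dx: 1101 xxx 001 000 yyy
    OpEntry { mask: 0xF1F8, matching: 0xD040, name: "ADD.W", x: RegKind::Data, y: RegKind::Data },
    // ADD.W Ay,Dx: 1101 xxx 001 001 yyy
    OpEntry { mask: 0xF1F8, matching: 0xD048, name: "ADD.W", x: RegKind::Data, y: RegKind::Addr },
];

/// Find the entry whose fixed bits match, then print the holes
/// according to the entry's metadata (source Y first, destination X second).
pub fn decode(op: u16) -> Option<String> {
    let entry = OPTABLE.iter().find(|e| op & e.mask == e.matching)?;
    let x = (op >> 9) & 7;
    let y = op & 7;
    Some(format!(
        "{} {}{},{}{}",
        entry.name,
        entry.y.prefix(),
        y,
        entry.x.prefix(),
        x
    ))
}
```

With these two entries, 0xD041 decodes as a data-register source and 0xD649 as an address-register source, even though both share the same mask. Memory addressing modes would need further metadata (extension words, displacement sizes) beyond this sketch.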

Not having to use semaphores to enforce single threaded access would also be great!

emoon commented 8 years ago

True. I guess it may actually be possible to just try all 65536 opcode combos (0x0000 - 0xFFFF). There will be a bunch of illegals in there, but that would be good for validating that it all works anyway (there might be bugs on both sides).
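A rough sketch of such an exhaustive sweep, assuming both disassemblers can be called as functions from a byte buffer to optional text. The `sweep` function and the two toy stand-ins are hypothetical. Opcode words are written big-endian, as the 68k stores them; instructions that need extension words would require more bytes than this sketch supplies.

```rust
/// Run both disassemblers over every 16-bit opcode word and collect
/// the opcodes on which their output differs (illegal opcodes included).
pub fn sweep<A, B>(reference: A, candidate: B) -> Vec<u16>
where
    A: Fn(&[u8]) -> Option<String>,
    B: Fn(&[u8]) -> Option<String>,
{
    (0..=u16::MAX)
        .filter(|op| {
            // The 68k is big-endian: high byte of the opcode word comes first.
            let bytes = op.to_be_bytes();
            reference(&bytes) != candidate(&bytes)
        })
        .collect()
}

/// Toy stand-in that only knows NOP (0x4E71).
pub fn toy_nop(bytes: &[u8]) -> Option<String> {
    match bytes {
        [0x4E, 0x71, ..] => Some("nop".to_string()),
        _ => None,
    }
}

/// Toy stand-in that knows NOP and RTS (0x4E75).
pub fn toy_nop_rts(bytes: &[u8]) -> Option<String> {
    match bytes {
        [0x4E, 0x75, ..] => Some("rts".to_string()),
        _ => toy_nop(bytes),
    }
}
```

Returning the list of disagreeing opcodes, rather than panicking on the first one, makes it easier to see whether a mismatch is a single bug or a whole instruction family.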

emoon commented 8 years ago

I can try to get a basic version of Capstone (68k disassembler part) in over the weekend and send a PR.

marhel commented 8 years ago

Ok, I'm thinking we should do that work in a dev-branch for now, I just created the "disassembler"-branch for this.

emoon commented 8 years ago

Sure!

marhel commented 8 years ago

I took a shot at an initial implementation yesterday, and got something I was not entirely unhappy with, by adding a disassembly module alongside the cpu module, but was really bugged by the fact that any trivial change there resulted in a minute's wait to recompile 12K lines of unrelated stuff (which after macro expansion seems to be more like 50K lines). I guess this is non-incremental compilation rearing its ugly head.

It made me want to rip out a few constants and other stuff to depend on, and work in an unrelated project, but I hope there's some better way.

You seem to have a much better grasp of cargo and crates than I have, so I wondered if there was some smarter way to divide stuff into crates or submodules in a way that would allow us to work on the disassembler, and let it use constants/enums/structs/traits that we've already defined, without needing to recompile everything every time.

marhel commented 8 years ago

Also, I could push my WIP to the disassembler branch if you want to have a peek.

emoon commented 8 years ago

Sure!

marhel commented 8 years ago

Pushed now. I made a few constants and other stuff public in the old stuff, in order to be able to reuse it here. Also, I reused the LoggingMem to read ops out of "memory", but I guess that interface is not really useful if you are not disassembling a current r68k session with in-memory code.

Feel free to change any and all things as well, this was just to get this part going somewhere :)

emoon commented 8 years ago

What you could do is split it up into three separate crates:

  1. Shared (constants/enums/etc)
  2. Emulator
  3. Disassembler

Now it would be possible to work "inside" the Disassembler crate only, running cargo test. This crate would still be a lib, so there wouldn't be a main function to run stuff in.

In that case it's possible to add things under the examples directory inside the Disassembler crate and run them with cargo run --example some_example. If you would like a real main, you can make a crate outside called disassembler_test or something that only depends on the Disassembler crate.

marhel commented 8 years ago

Useful command to run/visualize the test I wrote: cargo test -- disassembler --nocapture

marhel commented 8 years ago

Yeah, I saw the --example param somewhere a while ago and immediately thought it would be a good match for r68k! It really should be a library crate, I guess.

emoon commented 8 years ago

Before release it should be a library for sure (that is the way people would use it anyway).

emoon commented 8 years ago

Also, I'm not sure if you have pushed the disassembler.rs file.

marhel commented 8 years ago

Oops, you are right. I'll be pushed shortly!

marhel commented 8 years ago

Now I pushed myself to push the missing file... ;)

emoon commented 8 years ago

👍

marhel commented 8 years ago

Also got some time to update the disassembler/assembler to a state where I'm happy with the design. If you're interested, take a look at either the disassembler branch, or the new library branch. I've yet to actually use capstone, but it was quite fun to get the disassembler/assembler working in concert (anything that can be disassembled should also assemble back to the starting opcode).

The disassembler/assembler just knows a subset of the ADD opcodes at this point. Adding more of the same kind of instructions (formats) with already implemented encodings should be trivial. Other instruction formats will need new decode/encode fn support.
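The round-trip property mentioned above (anything that can be disassembled should assemble back to the starting opcode) can be sketched for a single instruction format. This is not the code on the disassembler branch: the function names and the exact text syntax are assumptions, and only the register-to-register form ADD.W Dy,Dx is covered.

```rust
/// Disassemble the one format this sketch knows: ADD.W Dy,Dx
/// (bit pattern 1101 xxx 001 000 yyy).
pub fn disassemble(op: u16) -> Option<String> {
    if op & 0xF1F8 == 0xD040 {
        Some(format!("ADD.W D{},D{}", op & 7, (op >> 9) & 7))
    } else {
        None
    }
}

/// Parse a data register name "D0".."D7" into its number.
fn data_reg(s: &str) -> Option<u16> {
    match s.as_bytes() {
        [b'D', d @ b'0'..=b'7'] => Some(u16::from(d - b'0')),
        _ => None,
    }
}

/// Assemble exactly the syntax that `disassemble` produces, and nothing else.
pub fn assemble(text: &str) -> Option<u16> {
    let operands = text.strip_prefix("ADD.W ")?;
    let (src, dst) = operands.split_once(',')?;
    let y = data_reg(src)?;
    let x = data_reg(dst)?;
    Some(0xD040 | (x << 9) | y)
}

/// True if `op` is either unknown to this sketch or survives
/// disassembly followed by assembly unchanged.
pub fn round_trips(op: u16) -> bool {
    match disassemble(op) {
        Some(text) => assemble(&text) == Some(op),
        None => true,
    }
}
```

Since there are only 65536 opcode words, `round_trips` can simply be checked for every one of them. Adding another format with the same operand layout would mean another mask/match pair and mnemonic, which matches the observation that more instructions of the same kind should be trivial.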

The assembler is quite primitive, and extremely picky about syntax at the moment: it will basically only accept exactly the syntax that the disassembler generates. The parser is also completely regex based, which is probably not that efficient (I saw extreme speedups when I started compiling the complex regexes once, instead of once per opcode).

emoon commented 8 years ago

Cool. I would suggest looking at https://github.com/Geal/nom for parsing in the assembler.

marhel commented 7 years ago

Updated the assembler parser based on pest and it now parses the 10K lines of this basic interpreter in 68k assembly successfully(*), which is a big step forward, as the old regex-based parser was very limited, but the new parser accepts actual code. Note that while the parser is now good, the assembler itself still just supports a handful of opcodes.

I looked at nom, but found pest to be much more approachable.

*) Well, almost anyway. I decided to only support semicolon comments at the moment, so I edited the file slightly first, and it doesn't recognize the register lists of the movem instruction, nor the IF-statements (conditional assembly) yet. Register lists will need to be supported when I get around to actually implementing movem in the disassembler/assembler, but conditional assembly is not a big priority at this point.