maurer / tiamat

Modules for doing binary analysis with holmes
MIT License
1 stars 0 forks source link

don't call the recursive descent disassembler to get just one instruction #6

Open ivg opened 7 years ago

ivg commented 7 years ago

It looks like that you're calling a disasm function with one root, and taking only the first instruction. However, depending on the place of the root, you may actually get the whole binary disassembled, and you will take only first instruction). Algorithmically, it makes your disassembler to be O(N^2). With a huge constant factor.

What you really need is a function of type

val disasm_insn : arch -> mem -> (mem * insn) Or_error.t

There is no such function in Bap.Std however it can be relatively easy implemented using Disasm_expert interface (that is currently not exposed in bindings, and most likely will not be exposed - as the interface is a thin wrapper on top of the C interface, that only adds strict type discipline). Here is a prototype (untested):


module Dis = Disasm.Disasm_expert.Basic

let disasm arch mem = 
  let module Target = (val target_of_arch arch) in
  let lift mem insn = match Target.lift mem insn with
    | Ok bil -> Ok (mem, Insn.of_basic ~bil insn)
    | err -> err in
  let target = Arch.to_string arch in
  Dis.with_disasm ~backend:"llvm"  target ~f:(fun dis -> 
     match Dis.insn_of_mem mem with
     | Ok ((mem,Some insn),_) -> Ok (mem,insn)
     | Ok ((_,None),_) -> Error (Error.createf "no instruction...")
     | Error err -> Error err)

This approach has a slight overhead, as we need to create a disassembler every time, and lookup for a target, etc. Given that insn disassembly is a very tight loop, then the following interface would be nicer:

module Disasm : sig
  type t
  val create : arch -> t Or_error.t
  val insn : mem -> (mem * insn) Or_error.t
  val close : t -> unit
end

that will map to

typedef struct bap_basic_disasm_t *bap_basic_disasm_t
bap_basic_disasm_t *bap_basic_disasm_create(bap_arch_t arch);
bap_code_t *bap_basic_disasm_insn(bap_basic_disasm_t *dis, bap_memory_t *mem);
void bap_basic_disasm_close(bap_basic_disasm_t *dis);

However, this interface would require some changes from your side (you need to pass the disassembler handler.

So it's your call which interface you prefer, I can push it quite fast to the bindings.

maurer commented 7 years ago

The impact of the current strategy is limited by the 16-byte size of the memory buffer being passed in. That being said, using the second interface seems nice assuming that a single bap_basic_disasm_t object can be used arbitrarily (my intent would be to hide this inside bap-rust and have it just create it once and keep reusing it).