lifting-bits / mcsema

Framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode
https://www.trailofbits.com/expertise/mcsema
GNU Affero General Public License v3.0

mcsema-dyninst-disass cannot handle some avx-512 instructions #672

Open adahsuzixin opened 4 years ago

adahsuzixin commented 4 years ago

Hi team,

I have a standalone binary (tested_api.zip) generated from the assembly source shown below:

    movl    $15, %eax
    kmovb  %eax,%k1
    vpshufb %xmm2, %xmm3, %xmm0{%k1}{z}
    pmovmskb %xmm0, %eax
    ret

But when I ran mcsema-dyninst-disass for control flow recovery:

    /home/suzixin/code/remill/scripts/remill-build/tools/mcsema/tools/mcsema_disass/dyninst/mcsema-dyninst-disass --binary tested_api.o --output tested_api.cfg --pie_mode --dump_cfg

The CFG I got does not match the binary: the recovered function contains only two instructions, and the second one is truncated.

    name: "tested_api.o"
    funcs {
      ea: 0
      blocks {
        ea: 0
        instructions {
          ea: 0
          bytes: "\270\017\000\000\000"
        }
        instructions {
          ea: 5
          bytes: "\305\371\222"
        }
      }
      is_entrypoint: true
      name: "mystrchr"
    }
    segments {
      ea: 0
      data: "\024\000\000\000\000\000\000\000\001zR\000\001x\020\001\033\014\007\010\220\001\000\000\024\000\000\000\034\000\000\000\000\000\000\000\024\000\000\000\000\000\000\000\000\000\000\000"
      read_only: true
      is_external: false
      name: ".eh_frame"
      is_exported: false
      is_thread_local: false
    }
    segments {
      ea: 0
      data: "\270\017\000\000\000\305\371\222\310b\362e\211\000\302f\017\327\300\303"
      read_only: false
      is_external: false
      name: ".text"
      is_exported: false
      is_thread_local: false
    }
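
For reference, the gap can be read off the dump itself. The snippet below is only a quick Python check, not part of McSema: it takes the `.text` bytes and the two recovered instructions from the output above and reports where coverage stops. The `EXPECTED` instruction lengths come from decoding the bytes by hand, so treat them as an assumption; `coverage_end` and `first_mismatch` are hypothetical helper names.

```python
# .text contents copied from the segment dump; the protobuf octal escapes
# are also valid inside a Python bytes literal.
TEXT = b"\270\017\000\000\000\305\371\222\310b\362e\211\000\302f\017\327\300\303"

# (ea, bytes) pairs copied from the single recovered block.
RECOVERED = [
    (0, b"\270\017\000\000\000"),
    (5, b"\305\371\222"),
]

# Assumption: instruction lengths obtained by decoding the bytes by hand.
EXPECTED = [("movl", 5), ("kmovb", 4), ("vpshufb", 6), ("pmovmskb", 4), ("ret", 1)]


def coverage_end(recovered):
    """Return the first offset not covered by contiguous recovered instructions."""
    end = 0
    for ea, raw in recovered:
        if ea != end:
            break
        end = ea + len(raw)
    return end


def first_mismatch(expected, recovered):
    """Return (mnemonic, offset) of the first instruction whose length differs."""
    offset = 0
    for (name, length), (ea, raw) in zip(expected, recovered):
        if ea != offset or len(raw) != length:
            return name, offset
        offset += length
    return None


end = coverage_end(RECOVERED)
print(f"recovered {end} of {len(TEXT)} bytes")
print("unrecovered bytes:", TEXT[end:].hex(" "))
print("first mismatch:", first_mismatch(EXPECTED, RECOVERED))
```

On this dump, 8 of the 20 bytes are recovered, and the first mismatch is the kmovb at offset 5, where only three of its four bytes were kept.
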

pgoodman commented 4 years ago

Most likely DynInst doesn't support those AVX-512 instructions or mask registers. Neither does Remill, though.

adahsuzixin commented 4 years ago

If we used Capstone instead, as PR #638 does (though it seems to be blocked on aquynh/capstone#1604), we might solve the disassembler problem.
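
As a rough way to see how far Capstone gets on these bytes, something like the following could be tried. It is only a sketch using the Capstone Python bindings (assumed to be installed as the `capstone` package), not the approach taken in PR #638. The `.text` bytes are copied from the dump above, and `disassemble` is a hypothetical helper name.

```python
# .text contents copied from the segment dump in the CFG output.
TEXT = b"\270\017\000\000\000\305\371\222\310b\362e\211\000\302f\017\327\300\303"


def disassemble(code, base=0):
    """Linearly disassemble `code` as 64-bit x86.

    Returns a list of (address, size, mnemonic, operands) tuples, or None
    if the Capstone bindings are not available.
    """
    try:
        from capstone import Cs, CS_ARCH_X86, CS_MODE_64
    except ImportError:
        return None
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    # Cs.disasm stops at the first byte sequence it cannot decode.
    return [(i.address, i.size, i.mnemonic, i.op_str) for i in md.disasm(code, base)]


insns = disassemble(TEXT)
if insns is None:
    print("capstone is not installed")
else:
    for address, size, mnemonic, operands in insns:
        print(f"{address:#04x} ({size} bytes): {mnemonic} {operands}")
    decoded = sum(size for _, size, _, _ in insns)
    print(f"decoded {decoded} of {len(TEXT)} bytes")
```

If the decoded byte count is lower than the size of `.text`, the Capstone build in use also stops on one of these instructions.
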

BTW, if we solve the disassembler problem, would it be difficult to add support for those AVX-512 instructions and mask registers to Remill?

Aiethel commented 4 years ago

Yup, it looks like the problem is indeed with those AVX instructions, since the function is "cut in the middle". It is not a hard obstacle and can be solved, but it would take an unknown amount of time (I already have some ideas). (Also, I have never really tested Dyninst on a .o file instead of a fully linked ELF, but that should not matter much here.)

As for how much work it would be to add this to Remill, I will leave that answer to @pgoodman.

pgoodman commented 4 years ago

Yeah, adding AVX-512 is likely to be a bunch of work. We could possibly make use of one of those tools that converts the Intel manual to text documents and extracts the pseudocode, then programmatically translate that to our semantics. There's also some work by Sandeep Dasgupta (@sdasgup3) related to x86-64 semantics, and they might have AVX semantics that we can use for generating things more compatible with Remill.
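
To give a sense of what such semantics have to capture, here is a minimal Python reference model of the masked 128-bit vpshufb from the issue. It is only an illustration of the per-byte behaviour (shuffle, write mask, zeroing vs. merging), not Remill's semantics format; the function name `vpshufb_xmm_masked` and its parameters are made up for this sketch.

```python
def vpshufb_xmm_masked(src1, ctrl, k, zeroing=True, old_dest=bytes(16)):
    """Model `vpshufb ctrl, src1, dest{k}{z}` (AT&T operand order) on 16 bytes.

    src1     -- 16 source bytes (xmm3 in the issue)
    ctrl     -- 16 shuffle-control bytes (xmm2 in the issue)
    k        -- write mask; bit i controls destination byte i
    zeroing  -- True for {z}; False keeps old_dest in masked-off bytes
    """
    assert len(src1) == len(ctrl) == len(old_dest) == 16
    out = bytearray(16)
    for i in range(16):
        if (k >> i) & 1:
            # High bit of the control byte forces zero; otherwise the low
            # four bits select a byte from src1.
            out[i] = 0 if ctrl[i] & 0x80 else src1[ctrl[i] & 0x0F]
        else:
            out[i] = 0 if zeroing else old_dest[i]
    return bytes(out)


# With the mask from the issue (movl $15, %eax; kmovb %eax, %k1), only the
# low four destination bytes can be non-zero.
src1 = bytes(range(16, 32))
ctrl = bytes([3, 2, 1, 0] + [0] * 12)
print(vpshufb_xmm_masked(src1, ctrl, k=0x0F).hex(" "))
```

Even this one instruction needs the mask register state, the zeroing/merging distinction, and the per-element selection, which is roughly the extra machinery every masked AVX-512 instruction would bring along.
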

LingjieLi commented 3 years ago

@adahsuzixin Hello, which Dyninst version did you use?