lifting-bits / fcd

An optimizing decompiler (modified to use remill semantics)
http://zneak.github.io/fcd

Milestone 4: Remill Lifting #7

Closed by surovic 6 years ago

surovic commented 6 years ago

This PR replaces the Capstone-backed lifting with remill and does some small refactoring of main.cpp. A lot of the code is copied from McSema, specifically the code for lowering memory intrinsics and the code for cleaning up lifted IR. Memory read and write intrinsics are lowered into regular LLVM IR load and store instructions to an address space separate from the default one, which is used for "runtime" memory. This should ensure that alias analysis gives proper results.
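To make the lowering step concrete, here is a minimal, self-contained sketch of the decision it has to make. It is a toy model rather than the code in this PR: it does not use the LLVM API, the `LoweredAccess` type and `LowerMemoryIntrinsic` helper are hypothetical, and the address space number `1` is an assumption picked for illustration. It only recognizes the integer-width `__remill_read_memory_N` / `__remill_write_memory_N` names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical address space number chosen for this sketch; 0 is LLVM's default.
constexpr unsigned kLoweredAddrSpace = 1;

enum class MemOp { Load, Store };

// Describes the plain memory access that would replace an intrinsic call.
struct LoweredAccess {
  MemOp op;
  unsigned bits;
  unsigned addr_space;
};

// Returns true and fills `out` if `name` is an integer read/write memory
// intrinsic; returns false for anything else (including atomics).
bool LowerMemoryIntrinsic(const std::string &name, LoweredAccess &out) {
  static const std::string kRead = "__remill_read_memory_";
  static const std::string kWrite = "__remill_write_memory_";
  static const unsigned kWidths[] = {8, 16, 32, 64};

  MemOp op;
  std::string suffix;
  if (name.compare(0, kRead.size(), kRead) == 0) {
    op = MemOp::Load;
    suffix = name.substr(kRead.size());
  } else if (name.compare(0, kWrite.size(), kWrite) == 0) {
    op = MemOp::Store;
    suffix = name.substr(kWrite.size());
  } else {
    return false;
  }

  // Accept only the plain integer widths in this sketch.
  for (unsigned bits : kWidths) {
    if (suffix == std::to_string(bits)) {
      out = LoweredAccess{op, bits, kLoweredAddrSpace};
      return true;
    }
  }
  return false;
}
```

In a real pass, a `true` result would correspond to replacing the call with a load or store through a pointer in the chosen address space; names the sketch rejects would be left in place for other handling.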

This approach differs from the one fcd uses, which tags LLVM IR load and store instructions with !fcd.prgmem. I chose it to use as much built-in LLVM functionality as possible, rather than needing hooks into analyses, as the original fcd approach does.

Other than that, the lifted IR passes the LLVM IR verifier pass, and the optimization phase following the lifting seems to work (or at least does not crash). Surprisingly, the same is true of the final C pseudocode generation phase. It does not produce very good output yet, but it does not crash, which I think is a rather pleasant surprise.

The following zip contains an example of the original fcd IR output, the fcd+remill IR output, and the fcd+remill C pseudocode output. Again, the C pseudocode is in no way representative and is added just for fun.

remill_liftingt.zip

pgoodman commented 6 years ago

You'll also want to handle the atomic read-modify-write memory intrinsics like fetch_and_add.
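
For reference, a rough sketch of the behaviour a lowered fetch_and_add would need to preserve: atomically add to the memory cell and hand back the old value, which is what LLVM's `atomicrmw add` instruction does. The helper name `FetchAndAdd32` is hypothetical, and the in/out `value` parameter (addend in, previous contents out) is an assumption about the intrinsic's convention that should be checked against remill's headers.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a 32-bit fetch-and-add on one memory cell.
// Assumption: `value` holds the addend on entry and the previous cell
// contents on return.
void FetchAndAdd32(std::atomic<std::uint32_t> &cell, std::uint32_t &value) {
  // fetch_add performs the read-modify-write atomically and returns the
  // value the cell held before the addition.
  value = cell.fetch_add(value, std::memory_order_seq_cst);
}
```

Unlike the plain read and write intrinsics, this cannot be split into a separate load and store without losing atomicity, so it needs its own lowering.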