This PR replaces the Capstone-backed lifting with remill and does some small refactoring of main.cpp. A lot of the code is copied from McSema, specifically the code for lowering memory intrinsics and the code for cleaning up lifted IR. Memory read and write intrinsics are lowered into regular LLVM IR `load` and `store` instructions that target an address space separate from the default one, which is used for "runtime" memory. This should ensure that alias analysis gives proper results.
This approach is different from the one fcd uses, which is tagging LLVM IR `load` and `store` instructions with `!fcd.prgmem`. I chose this approach to use as much built-in LLVM functionality as possible, rather than needing hooks into analyses, which the original fcd approach needs.
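To make the lowering idea concrete, here is a minimal sketch of the rewrite on a toy instruction model. It does not use the LLVM API and is not the code in this PR: `ToyInst`, `WidthAfterPrefix`, `LowerMemoryIntrinsic` and `kProgramAddrSpace` are hypothetical names, and the address space number is an assumption. Only the `__remill_read_memory_N` / `__remill_write_memory_N` naming is taken from remill.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Toy stand-in for an LLVM instruction. This is NOT the llvm::Instruction API.
struct ToyInst {
  std::string opcode;       // "call", "load" or "store"
  std::string callee;       // callee name, only meaningful for calls
  unsigned bits = 0;        // access width in bits, set for loads and stores
  unsigned addr_space = 0;  // 0 is the default address space ("runtime" memory)
};

// Assumption: any non-zero address space would do; 1 is picked for illustration.
constexpr unsigned kProgramAddrSpace = 1;

// Returns the width encoded after `prefix` (e.g. 32 for "..._memory_32"),
// or nothing if `name` is not an integer-width access with that prefix.
std::optional<unsigned> WidthAfterPrefix(const std::string &name,
                                         const std::string &prefix) {
  if (name.size() <= prefix.size() ||
      name.compare(0, prefix.size(), prefix) != 0) {
    return std::nullopt;
  }
  const std::string suffix = name.substr(prefix.size());
  const bool all_digits =
      std::all_of(suffix.begin(), suffix.end(),
                  [](unsigned char c) { return std::isdigit(c) != 0; });
  if (!all_digits) {
    return std::nullopt;  // e.g. floating point variants are left alone here
  }
  return static_cast<unsigned>(std::stoul(suffix));
}

// Rewrites a memory intrinsic call into a load or store in the program
// address space. Every other instruction is returned unchanged.
ToyInst LowerMemoryIntrinsic(const ToyInst &inst) {
  if (inst.opcode != "call") {
    return inst;
  }
  if (auto bits = WidthAfterPrefix(inst.callee, "__remill_read_memory_")) {
    return ToyInst{"load", "", *bits, kProgramAddrSpace};
  }
  if (auto bits = WidthAfterPrefix(inst.callee, "__remill_write_memory_")) {
    return ToyInst{"store", "", *bits, kProgramAddrSpace};
  }
  return inst;
}
```

Because the lowered accesses live in a non-default address space, they are ordinary loads and stores as far as LLVM is concerned, which is what avoids custom hooks in the analyses.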
Other than that, the lifted IR passes the LLVM IR verifier pass, and the optimization phase following the lifting also seems to work (or at least does not crash). Surprisingly, the same is true of the final C pseudocode generation phase. That phase does not produce very good output yet, but it does not crash, which I think is a rather pleasant surprise.
The following zip contains an example of the original fcd IR output, the fcd+remill IR output and the fcd+remill C pseudocode output. Again, the C pseudocode is in no way representative and is added just for fun.
remill_liftingt.zip