lifting-bits / fcd

An optimizing decompiler (modified to use remill semantics)
http://zneak.github.io/fcd
Other
30 stars 3 forks source link

Misc fixes #42

Open nickolas-pohilets opened 5 years ago

nickolas-pohilets commented 5 years ago

This MR contains assorted bug fixes, bringing fcd+remill to a point where it is able to successfully decompile a simple function using debug build of LLVM-7.0 and latest version of Remill.

LLVM versions pre 7.0 have a bug where removal of the dereferencable attributes triggers an assertion in debug build and reads memory out of bounds in release build, fixed in 9bc0b1080f195636fed019bce979aa72892d6c69.

CLAassistant commented 5 years ago

CLA assistant check
All committers have signed the CLA.

surovic commented 5 years ago

First of all, thanks a lot @nickolas-pohilets for the PR! I added some comments, let me know if you have any questions or comments to my comments :)

Also I'd like to let you know that there's a lot of work done on the AST part of fcd in the dev-clang-ast branch of the repo. It's basically a complete rewrite of the AST backend so I personally would not spend too much time on the old AST backend in fcd. The new backend is not at parity with the old one yet though.

nickolas-pohilets commented 5 years ago

Is there a plan to sync back with zneak/fcd? Should project use coding style of the original project or conform with one of trailofbits?

surovic commented 5 years ago

Should project use coding style of the original project or conform with one of trailofbits?

I personally use the trailofbits coding standards in all files with a trailofbits license header. In original fcd code I try to adhere to the original coding style. Generally as a rule of thumb -- make your code blend in with the surrounding code.

Is there a plan to sync back with zneak/fcd?

@pgoodman what do you think? I'm inclined to say no, since the divergence is pretty significant and I think that overall priorities have shifted away from fcd as a tool. The functionality will be covered and expanded upon by other tools (McSema with the DynInst frontend) and a tool based on the new AST backend from fcd.

@nickolas-pohilets what are your use-cases for fcd? We could take them into consideration when developing the other tools.

pgoodman commented 5 years ago

@nickolas-pohilets There is no plan to do that. The original scope of the project was to port fcd to use Remill because the lifting approaches shared many similarities. Since then, the scope and direction of the project has changed somewhat.

pgoodman commented 5 years ago

@surovic Returning a container type is usually acceptable because of RVO.

nickolas-pohilets commented 5 years ago

I want to make a decompiler for Objective-C based on fcd+remill. Compiled Objective-C binaries contain lots of meta-information. As a reference, class-dump can generate headers based on that. Decompiling Objective-C into relatively high-quality source code seems to be a pretty low-hanging fruit.

Few things from the top of my head what will be need:

nickolas-pohilets commented 5 years ago

What is the current status of support of LLVM 7.0 in Remill and McSema? I've quickly checked travis scripts in Remill, McSema and cxx-common, and 7.0 version is not mentioned anywhere.

pgoodman commented 5 years ago

@nickolas-pohilets I think it's a matter of building cxx-common for LLVM 7.0. The latest issues we've been having is actually more to do with RTTI, so that we can support DynInst as a frontend.

nickolas-pohilets commented 5 years ago

So, what is then plan to get these changes merged?

nickolas-pohilets commented 5 years ago

Sorry, closing by mistake.

surovic commented 5 years ago

Just about to check out the PR, see if it builds on my machine and merge if they do. Stay tuned!

surovic commented 5 years ago

Well, unfortunately there's a problem with building against LLVM-4.0 namely due to a7d6c63 using llvm::Value::deleteValue() which does not exist prior to LLVM-5.0.

@nickolas-pohilets do you think you can fix this? @pgoodman are we going to keep full LLVM-3.5 to LLVM-7.0 compatibility?

Edit: Another incompatibility is in 8e04ec1, since llvm/Transforms/Utils.h does not exist prior to LLVM-7.0. Take a look fcd/compat/Scalar.h for a hint at how we resolve header compatibility issues like these.

nickolas-pohilets commented 5 years ago

I propose to drop pre-LLVM 7.0, if possible. As I mentioned in the PR description, there is a bug that got fixed only in LLVM 7.0

surovic commented 5 years ago

I'm going to put this on hold until we hear from @pgoodman

nickolas-pohilets commented 5 years ago

I think I can re-write the code to avoid buggy function to make it work with older versions, but I’m not sure if it is worth the effort. In contrast to Remill, FCD currently does not have any clients to keep compatibility with.

surovic commented 5 years ago

To be honest, I think the fork is currently maintained only because some code might be useful in other projects, like McSema. FCD doing it's own CFG recovery is more of a hindrance than a feature, since lots of other tools provide this functionality and there's only so much developer time available for FCD. IMO the real value of FCD, considering McSema covers LLVM IR generation, is in it's C AST backend.

nickolas-pohilets commented 5 years ago

Agree. Shall be we then go one step further and turn FCD+Remill into FCD+McSema+Some CFG Recovery?

pgoodman commented 5 years ago

Ideally I want compat with older versions of LLVM. The way I tend to handle that is to add code into https://github.com/trailofbits/remill/tree/master/remill/BC/Compat

pgoodman commented 5 years ago
      auto inst = expression->getAsInstruction();
      auto res = ctx.uncachedExpressionFor(*inst);
      inst->deleteValue();
      return res;

@nickolas-pohilets Can you deal with the leak by attaching inst to a basic block somewhere, then invoke inst->eraseFromParent()?

surovic commented 5 years ago

The step has already been more or less made :slightly_smiling_face:. Rellic is the clang-based C AST backend that lived in FCD's dev-clang-ast branch. It was private for a while, but we figured there's no reason not to make it public at this point.

So the toolchain we are currently looking at is something like McSema+Rellic, where CFG recovery is done by one of McSema's frontends. The main McSema frontend is IDA Pro currently, but there is development being done on a Binary Ninja and the DynInst frontend.

nickolas-pohilets commented 5 years ago

Then, probably, I should already abandon FCD altogether and start hacking on Rellic. Shall we organise a (video) call some time next week to discuss collaboration? Pls pm me.

pgoodman commented 5 years ago

Sure. What is your username on EH slack?

nickolas-pohilets commented 5 years ago

@Mykola Pokhylets