Open nickolas-pohilets opened 5 years ago
First of all, thanks a lot @nickolas-pohilets for the PR! I added some comments, let me know if you have any questions or comments to my comments :)
Also, I'd like to let you know that a lot of work has been done on the AST part of fcd in the dev-clang-ast branch of the repo. It's basically a complete rewrite of the AST backend, so I personally would not spend too much time on the old AST backend in fcd. The new backend is not at parity with the old one yet, though.
Is there a plan to sync back with zneak/fcd? Should the project use the coding style of the original project or conform to that of trailofbits?
Should the project use the coding style of the original project or conform to that of trailofbits?
I personally use the trailofbits coding standards in all files with a trailofbits license header. In original fcd code I try to adhere to the original coding style. Generally as a rule of thumb -- make your code blend in with the surrounding code.
Is there a plan to sync back with zneak/fcd?
@pgoodman what do you think? I'm inclined to say no, since the divergence is pretty significant and I think that overall priorities have shifted away from fcd as a tool. The functionality will be covered and expanded upon by other tools (McSema with the DynInst frontend) and a tool based on the new AST backend from fcd.
@nickolas-pohilets what are your use-cases for fcd? We could take them into consideration when developing the other tools.
@nickolas-pohilets There is no plan to do that. The original scope of the project was to port fcd to use Remill because the lifting approaches shared many similarities. Since then, the scope and direction of the project have changed somewhat.
@surovic Returning a container type is usually acceptable because of RVO.
I want to make a decompiler for Objective-C based on fcd+remill. Compiled Objective-C binaries contain lots of meta-information. As a reference, class-dump can generate headers based on that. Decompiling Objective-C into relatively high-quality source code seems to be a pretty low-hanging fruit.
A few things off the top of my head that will be needed:
objc_msgSend
What is the current status of LLVM 7.0 support in Remill and McSema? I've quickly checked the Travis scripts in Remill, McSema and cxx-common, and version 7.0 is not mentioned anywhere.
@nickolas-pohilets I think it's a matter of building cxx-common for LLVM 7.0. The latest issues we've been having actually have more to do with RTTI, so that we can support DynInst as a frontend.
So, what is the plan to get these changes merged?
Sorry, closed by mistake.
Just about to check out the PR, see if it builds on my machine, and merge if it does. Stay tuned!
Well, unfortunately there's a problem with building against LLVM-4.0, namely due to a7d6c63 using llvm::Value::deleteValue(), which does not exist prior to LLVM-5.0.
@nickolas-pohilets do you think you can fix this?
@pgoodman are we going to keep full LLVM-3.5 to LLVM-7.0 compatibility?
Edit: Another incompatibility is in 8e04ec1, since llvm/Transforms/Utils.h does not exist prior to LLVM-7.0. Take a look at fcd/compat/Scalar.h for a hint at how we resolve header compatibility issues like these.
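For illustration, a header-compatibility shim of this kind could look roughly like the sketch below. This is not the contents of fcd/compat/Scalar.h; the include guard and the `SKETCH_TRANSFORM_UTILS_HEADER` macro are made-up names, and the fallback to `llvm/Transforms/Scalar.h` assumes the declarations you need lived there before LLVM-7.0. The `__has_include` guard is only there so the sketch still compiles on a machine without LLVM headers.

```cpp
// Hypothetical compat header (names are illustrative, not from fcd).
#ifndef SKETCH_COMPAT_TRANSFORM_UTILS_H
#define SKETCH_COMPAT_TRANSFORM_UTILS_H

// Pick up LLVM_VERSION_MAJOR when LLVM headers are on the include path.
#if defined(__has_include)
#if __has_include(<llvm/Config/llvm-config.h>)
#include <llvm/Config/llvm-config.h>
#endif
#endif

#if !defined(LLVM_VERSION_MAJOR)
// No LLVM found: nothing to include, the sketch just records that fact.
#define SKETCH_TRANSFORM_UTILS_HEADER "(LLVM headers not found)"
#elif LLVM_VERSION_MAJOR >= 7
// LLVM-7.0 and later ship llvm/Transforms/Utils.h.
#include <llvm/Transforms/Utils.h>
#define SKETCH_TRANSFORM_UTILS_HEADER "llvm/Transforms/Utils.h"
#else
// Assumption: older releases declare the needed passes in Scalar.h.
#include <llvm/Transforms/Scalar.h>
#define SKETCH_TRANSFORM_UTILS_HEADER "llvm/Transforms/Scalar.h"
#endif

#endif  // SKETCH_COMPAT_TRANSFORM_UTILS_H
```

Code that needs these declarations would then include the compat header instead of naming either LLVM header directly, so the version check lives in one place.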
I propose dropping support for LLVM versions before 7.0, if possible. As I mentioned in the PR description, there is a bug that got fixed only in LLVM 7.0.
I'm going to put this on hold until we hear from @pgoodman
I think I can rewrite the code to avoid the buggy function and make it work with older versions, but I'm not sure if it is worth the effort. In contrast to Remill, FCD currently does not have any clients to keep compatibility with.
To be honest, I think the fork is currently maintained only because some code might be useful in other projects, like McSema. FCD doing its own CFG recovery is more of a hindrance than a feature, since lots of other tools provide this functionality and there's only so much developer time available for FCD. IMO the real value of FCD, considering McSema covers LLVM IR generation, is in its C AST backend.
Agreed. Shall we then go one step further and turn FCD+Remill into FCD+McSema+Some CFG Recovery?
Ideally I want compat with older versions of LLVM. The way I tend to handle that is to add code into https://github.com/trailofbits/remill/tree/master/remill/BC/Compat
auto inst = expression->getAsInstruction();
auto res = ctx.uncachedExpressionFor(*inst);
inst->deleteValue();
return res;
@nickolas-pohilets Can you deal with the leak by attaching inst to a basic block somewhere, then invoke inst->eraseFromParent()?
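Another option for keeping a call like `inst->deleteValue()` building on older LLVM is a small version-gated wrapper. The sketch below shows the pattern only: `sketch::Value` is a stand-in type, not `llvm::Value`, and `SKETCH_LLVM_VERSION_MAJOR` and `DeleteValue` are hypothetical names (with real LLVM the version would come from `LLVM_VERSION_MAJOR`). It also assumes that plain `delete` is the right way to free a detached value before LLVM-5.0.

```cpp
// Stand-in for the real LLVM version macro; hypothetical value for the sketch.
#ifndef SKETCH_LLVM_VERSION_MAJOR
#define SKETCH_LLVM_VERSION_MAJOR 7
#endif

namespace sketch {

// Stand-in for llvm::Value, NOT the real class. It mimics only the one
// difference discussed here: newer versions expose deleteValue(), older
// ones are assumed to be destroyed with a plain delete.
class Value {
 public:
  static int live_count;  // number of live objects, used to spot leaks
  Value() { ++live_count; }
#if SKETCH_LLVM_VERSION_MAJOR >= 5
  void deleteValue() { delete this; }

 protected:
  ~Value() { --live_count; }
#else
  virtual ~Value() { --live_count; }
#endif
};

int Value::live_count = 0;

// Compat wrapper: callers use this instead of picking an API themselves.
inline void DeleteValue(Value *value) {
#if SKETCH_LLVM_VERSION_MAJOR >= 5
  value->deleteValue();
#else
  delete value;
#endif
}

}  // namespace sketch
```

With a wrapper like this, the snippet above would call the wrapper on `inst` rather than `inst->deleteValue()` directly, and the version check stays in the compat layer.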
The step has already been more or less made :slightly_smiling_face:. Rellic is the clang-based C AST backend that lived in FCD's dev-clang-ast branch. It was private for a while, but we figured there's no reason not to make it public at this point.
So the toolchain we are currently looking at is something like McSema+Rellic, where CFG recovery is done by one of McSema's frontends. The main McSema frontend is currently IDA Pro, but Binary Ninja and DynInst frontends are under development.
Then I should probably abandon FCD altogether and start hacking on Rellic. Shall we organise a (video) call some time next week to discuss collaboration? Please PM me.
Sure. What is your username on EH slack?
@Mykola Pokhylets
This MR contains assorted bug fixes, bringing fcd+remill to a point where it is able to successfully decompile a simple function using a debug build of LLVM-7.0 and the latest version of Remill.
LLVM versions prior to 7.0 have a bug where removal of the dereferenceable attributes triggers an assertion in debug builds and reads memory out of bounds in release builds. This was fixed in 9bc0b1080f195636fed019bce979aa72892d6c69.