lifting-bits / remill

Library for lifting machine code to LLVM bitcode
Apache License 2.0

Provide initial definition for `__remill_sync_hyper_call` #611

Closed tetsuo-cpp closed 1 year ago

tetsuo-cpp commented 1 year ago

This PR provides an initial implementation of __remill_sync_hyper_call. I'm aiming to handle what I can and, for anything that isn't feasible, call into a more specific intrinsic like __remill_x86_some_intrinsic_func.

This is to reduce the number of state escapes that happen when lifting challenges.
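A minimal sketch of that dispatch shape, for illustration only. Every name here (ExampleState, ExampleMemory, ExampleHyperCall, example_x86_set_segment_es) is a hypothetical stand-in, not remill's actual definition. The fallback intrinsic is given a body only so the sketch links on its own. In a real runtime it would presumably be left as a declaration for the consumer of the bitcode to define.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only. remill's real State, Memory,
// and hyper call enum live in its runtime headers and differ from these.
struct ExampleMemory {};
struct ExampleState {
  uint64_t counter = 0;
  uint16_t es = 0;
};

enum class ExampleHyperCall { kReadCounter, kSetSegmentES, kInvalid };

// Hypothetical narrower intrinsic. It is defined here, with a call counter,
// only so that the sketch links and its behaviour can be checked.
static int gSegmentIntrinsicCalls = 0;
ExampleMemory *example_x86_set_segment_es(ExampleMemory *memory) {
  ++gSegmentIntrinsicCalls;
  return memory;
}

// Handle what can be expressed as plain updates to State; defer the rest.
ExampleMemory *example_sync_hyper_call(ExampleState &state,
                                       ExampleMemory *memory,
                                       ExampleHyperCall call) {
  switch (call) {
    case ExampleHyperCall::kReadCounter:
      // Feasible here: the effect is just a write into State.
      state.counter = 42;  // Placeholder value, not a real counter read.
      return memory;
    case ExampleHyperCall::kSetSegmentES:
      // Not feasible here: defer to the more specific intrinsic.
      return example_x86_set_segment_es(memory);
    default:
      // Unknown calls leave State and memory untouched.
      return memory;
  }
}
```

The point of the shape is that only the second kind of case leaves an opaque call behind after optimization; the first kind becomes ordinary loads and stores on State.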

tetsuo-cpp commented 1 year ago

This is incomplete; I just wanted to ask some questions and check that this approach looks OK.

With this change, though, the references to State can be removed from the example we looked at that had a cpuid instruction! 👍

tetsuo-cpp commented 1 year ago

I'm trying to figure out what this is doing: https://github.com/lifting-bits/remill/blob/master/cmake/BCCompiler.cmake#L168

It looks like I'll have to add something here to compile each runtime for the architecture it targets. At the moment, the runtimes aren't compiled that way, which is why the inline assembly for non-x86 architectures doesn't work.

pgoodman commented 1 year ago

@tetsuo-cpp that looks like a tricky issue to solve, especially with Apple Clang. That used to be a viable target. Over time, I've tried to add a few things to Runtime/, e.g. Float.h, that would try to remove or factor out dependence on system libraries, due to issues in trying to do cross-compilation. I think I took it all a bit further even in my unfinished msp430 branch in this repo. At the end of the day, you might have to fix the target triple (what that string is) to something Linux-based. I think we have methods with hard-coded platform-specific triples in remill, e.g. Arch::DefaultTriple or something like that. You could try one of those strings here, and then evaluate what breaks. This might be better suited to a different issue / PR, though.

tetsuo-cpp commented 1 year ago

Ok, my approach has changed significantly since last time. In order to implement some of these hyper calls (cpuid for instance), the only realistic option is to invoke it via inline assembly and then get Clang to give us bitcode for it.
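As a rough illustration of the inline assembly idea, here is a minimal sketch of a cpuid wrapper using GNU-style extended asm, which Clang and GCC both accept. The names ExampleCpuidResult and ExampleCpuid are hypothetical and are not taken from the PR. The non-x86 branch exists only so the sketch compiles on other hosts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type; a real hyper call would write into State instead.
struct ExampleCpuidResult {
  uint32_t eax;
  uint32_t ebx;
  uint32_t ecx;
  uint32_t edx;
};

// Executes cpuid for the given leaf and subleaf on x86 targets.
// On other targets it returns zeros so that the sketch still compiles.
inline ExampleCpuidResult ExampleCpuid(uint32_t leaf, uint32_t subleaf) {
  ExampleCpuidResult result = {0, 0, 0, 0};
#if defined(__x86_64__) || defined(__i386__)
  // cpuid reads EAX (leaf) and ECX (subleaf) and writes EAX, EBX, ECX, EDX.
  asm volatile("cpuid"
               : "=a"(result.eax), "=b"(result.ebx), "=c"(result.ecx),
                 "=d"(result.edx)
               : "a"(leaf), "c"(subleaf));
#else
  (void) leaf;
  (void) subleaf;
#endif
  return result;
}
```

When a function like this is compiled to bitcode, the asm statement is carried through as an inline assembly call in the IR, and that form is specific to the target it was compiled for.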

In order to do this, I now have to cross-compile this part of the runtime. Because I'm cross-compiling, I no longer have access to standard library headers, which is why there's a lot of code under lib/Arch/Runtime/ to fill in these gaps. Most of this code is a tweaked version of @pgoodman's code in his https://github.com/lifting-bits/remill/tree/msp430 branch. I have specifically chosen to cross-compile just the hyper calls and not the entire runtime, to reduce the amount of functionality we have to recreate. For example, the runtimes make use of various STL containers like std::bitset and stuff from <algorithm>. Our build already invokes llvm-link to stitch the individual bitcode files together into a final runtime bitcode file, so the separately cross-compiled hyper call bitcode can be linked in at that step.
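To give a sense of what filling in these gaps can look like, here is a minimal sketch of header-free replacements for fixed-width integer types and an unsigned limits helper. It assumes a Clang or GCC compiler, which predefine macros such as `__UINT32_TYPE__` for the selected target. The `example_` and `Example` names are hypothetical and are not the ones used under lib/Arch/Runtime/.

```cpp
// No standard headers are included, on purpose.

// Clang and GCC predefine these type macros for the selected target, so the
// aliases stay correct when cross-compiling.
using example_u8 = __UINT8_TYPE__;
using example_u16 = __UINT16_TYPE__;
using example_u32 = __UINT32_TYPE__;
using example_u64 = __UINT64_TYPE__;
using example_size = __SIZE_TYPE__;

static_assert(sizeof(example_u8) == 1, "example_u8 must be 1 byte");
static_assert(sizeof(example_u16) == 2, "example_u16 must be 2 bytes");
static_assert(sizeof(example_u32) == 4, "example_u32 must be 4 bytes");
static_assert(sizeof(example_u64) == 8, "example_u64 must be 8 bytes");

// Minimal stand-in for the part of <limits> that unsigned types need.
// Only valid for unsigned integer types.
template <typename T>
struct ExampleUnsignedLimits {
  static constexpr T kMin = static_cast<T>(0);
  // Inverting zero sets every bit; the cast truncates back to T's width.
  static constexpr T kMax = static_cast<T>(~static_cast<T>(0));
};
```

Keeping the cross-compiled portion small means only helpers like these need recreating, rather than containers such as std::bitset.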

If we agree, I'd prefer to not be exhaustive with the hyper calls right now. If there are more that I could realistically handle, I'd like to follow up with extra PRs for those, since this PR already contains a lot of machinery to even get us to the starting blocks.

tetsuo-cpp commented 1 year ago

Seems like I'll need to bring Limits.h across too, judging from that CI failure. Not sure how it's building fine on my machine. Will look into it more tomorrow.

But the general approach will stay the same.

tetsuo-cpp commented 1 year ago

build_linux is failing in master due to #616. I think this is good to go.

tetsuo-cpp commented 1 year ago

Something is wrong with the llvm-link step. The __remill_sync_hyper_call implementation isn't getting picked up for some reason. I'm in the middle of debugging this.

pgoodman commented 1 year ago

I agree about moving some stuff to a future PR. I think that you've rightly found that the pre-existing hyper call mechanism isn't flexible enough, and that where added flexibility is needed, new hyper call intrinsics should be introduced. You've done this, e.g. with setting segment registers, but then they are being called inside __remill_sync_hyper_call, and a future PR should instead just call those new intrinsics in the semantics rather than indirecting through __remill_sync_hyper_call. Realistically, maybe all of the enums should go, and then we'd just have a bunch of different intrinsic calls. Again, separate PR for that.
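A minimal sketch of what calling a dedicated intrinsic directly from a semantic could look like. All names (SketchState, SketchMemory, example_set_segment_es_direct, ExampleMovToES) are hypothetical, and the intrinsic's signature is an assumption made for illustration. The intrinsic has a body only so the sketch links on its own.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; remill's real State and Memory differ from these.
struct SketchMemory {};
struct SketchState {
  uint16_t es = 0;
};

// Hypothetical dedicated intrinsic with its own typed argument. It records
// the selector it was given so that the sketch can be checked.
static uint16_t gLastSelector = 0;
SketchMemory *example_set_segment_es_direct(SketchMemory *memory,
                                            uint16_t selector) {
  gLastSelector = selector;
  return memory;
}

// Hypothetical semantic for an instruction that loads ES. It updates State
// and calls the intrinsic directly, with no enum tag and no central dispatch.
SketchMemory *ExampleMovToES(SketchState &state, SketchMemory *memory,
                             uint16_t selector) {
  state.es = selector;
  return example_set_segment_es_direct(memory, selector);
}
```

One possible benefit of this shape is that each intrinsic can take exactly the arguments it needs, which an enum-tagged entry point cannot express.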