A solution here is to implement these intrinsics in a compilation unit and then link it with your lifted bitcode. For example, create a dummy implementation in intrinsics.cc:
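#include <cstdint>
struct State;
struct Memory;  // opaque in Remill; only pointers to it are passed around
// extern "C" keeps the names unmangled so they match the
// intrinsic declarations in the lifted bitcode.
extern "C" {
uint64_t __remill_read_memory_64(Memory *mem, uint64_t addr) {
  return 0;  // dummy: every read yields 0
}
Memory *__remill_write_memory_64(Memory *mem, uint64_t addr, uint64_t value) {
  return mem;  // dummy: the store is discarded
}
uint32_t __remill_read_memory_32(Memory *mem, uint64_t addr) {
  return 0;
}
Memory *__remill_write_memory_32(Memory *mem, uint64_t addr, uint32_t value) {
  return mem;
}
Memory *__remill_missing_block(State &state, uint64_t addr, Memory *mem) {
  return mem;  // called when control reaches code that was not lifted
}
}  // extern "C"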
Then compile it into an intrinsics.bc bitcode file:
clang intrinsics.cc -emit-llvm -c -o intrinsics.bc
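The dummy stubs only make the bitcode link: every store is discarded and every load returns 0. As a minimal sketch, intrinsics.cc could instead back guest memory with a flat host buffer. `kGuestMemSize`, `g_guest_mem` and `GuestPtr` are hypothetical names chosen for this example, guest addresses are assumed to be offsets into the buffer, and the write intrinsics are assumed to take the stored value as a third argument.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Opaque memory token, threaded through every intrinsic call.
struct Memory;

namespace {
// Hypothetical 64 KiB guest address space; guest addresses are offsets into it.
constexpr uint64_t kGuestMemSize = 64 * 1024;
uint8_t g_guest_mem[kGuestMemSize];

// Hypothetical helper: returns the host pointer backing `size` guest bytes at
// `addr`. Out-of-range accesses are treated as bugs; the check is active only
// when NDEBUG is not defined.
uint8_t *GuestPtr(uint64_t addr, uint64_t size) {
  assert(addr <= kGuestMemSize && size <= kGuestMemSize - addr);
  return &g_guest_mem[addr];
}
}  // namespace

// extern "C" keeps the symbol names unmangled so llvm-link can resolve the
// lifted bitcode's calls against these definitions.
extern "C" {

uint64_t __remill_read_memory_64(Memory *, uint64_t addr) {
  uint64_t value;
  std::memcpy(&value, GuestPtr(addr, sizeof(value)), sizeof(value));
  return value;
}

Memory *__remill_write_memory_64(Memory *mem, uint64_t addr, uint64_t value) {
  std::memcpy(GuestPtr(addr, sizeof(value)), &value, sizeof(value));
  return mem;
}

uint32_t __remill_read_memory_32(Memory *, uint64_t addr) {
  uint32_t value;
  std::memcpy(&value, GuestPtr(addr, sizeof(value)), sizeof(value));
  return value;
}

Memory *__remill_write_memory_32(Memory *mem, uint64_t addr, uint32_t value) {
  std::memcpy(GuestPtr(addr, sizeof(value)), &value, sizeof(value));
  return mem;
}

}  // extern "C"
```

It compiles to bitcode with the same clang command. A private buffer keeps the lifted code away from arbitrary host memory, at the cost of having to copy any data the guest code expects into `g_guest_mem` first.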
Now you can test your lifted bitcode, e.g.
remill-lift --arch amd64 --bc-out a.bc --bytes 8BFF558BEC8B550C8B4D0853FF7510
llvm-link a.bc intrinsics.bc -o a_intrinsics.bc
clang a_intrinsics.bc -c -o a_intrinsics.o
The result is the object file a_intrinsics.o:
file a_intrinsics.o
a_intrinsics.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
This is what the Remill test runners do, for instance. Another option is to write an LLVM pass that replaces the Remill intrinsics with something else, e.g. load and store instructions.
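For the load/store option, the effect on the lifted code is roughly that of the following C++ sketch, in which the guest address is used directly as a host address. It only illustrates the semantics such a pass would produce (in IR, an `inttoptr` followed by a plain `load` or `store`); it is not an LLVM pass, and only the 64-bit pair is shown.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Opaque memory token; unused here because host memory is accessed directly.
struct Memory;

extern "C" {

uint64_t __remill_read_memory_64(Memory *, uint64_t addr) {
  assert(addr != 0 && "null guest address");
  uint64_t value;
  // memcpy avoids C++ undefined behavior for unaligned guest addresses.
  std::memcpy(&value,
              reinterpret_cast<const void *>(static_cast<uintptr_t>(addr)),
              sizeof(value));
  return value;
}

Memory *__remill_write_memory_64(Memory *mem, uint64_t addr, uint64_t value) {
  assert(addr != 0 && "null guest address");
  std::memcpy(reinterpret_cast<void *>(static_cast<uintptr_t>(addr)), &value,
              sizeof(value));
  return mem;
}

}  // extern "C"
```

This mapping is only safe when the addresses used by the lifted code are valid in the host process, which is one reason the choice is left to the user rather than baked into the lifted bitcode.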
Just to close the loop: Remill is a library for lifting machine bytes to semantics, but the semantics are intentionally "incomplete." For example, there's no feasible way for us to model everything that an instruction like cpuid does. On a more fundamental level, we didn't want to commit Remill to any particular implementation of memory. If we had modeled memory as loads and stores from the start, it would have interfered with LLVM's ability to optimize, and it would also have made it challenging to use Remill-produced bitcode in specific setups (e.g. VMs).
remill-lift --arch amd64 --bc-out a.bc --bytes 8BFF558BEC8B550C8B4D0853FF7510
llvm-link a.bc intrinsics.bc -o a_intrinsics.bc
clang a_intrinsics.bc -c -o a_intrinsics.o
First, intrinsics.cc doesn't compile with clang: I get a "cstdint not found" error. And even if we change the code so that it compiles, the intrinsics don't get inlined into the resulting object file!
Can we get a complete working example?
Hi, I have an issue recompiling LLVM IR back to a binary.
What I do:
remill-lift-9.0 --arch amd64 --ir_out test.ll --bytes 8BFF558BEC8B550C8B4D0853FF7510
remill-clang-9.0 test.ll