lifting-bits / remill

Library for lifting machine code to LLVM bitcode
Apache License 2.0

error while recompiling llvm to binary #430

Closed archercreat closed 2 years ago

archercreat commented 4 years ago

Hi, I have an issue recompiling LLVM IR back to a binary.

lifted_code:(.text+0x46): undefined reference to `__remill_write_memory_64'
lifted_code:(.text+0x7d): undefined reference to `__remill_read_memory_32'
lifted_code:(.text+0xa0): undefined reference to `__remill_read_memory_32'
lifted_code:(.text+0xca): undefined reference to `__remill_write_memory_64'
lifted_code:(.text+0xf8): undefined reference to `__remill_read_memory_64'
lifted_code:(.text+0x113): undefined reference to `__remill_write_memory_64'
lifted_code:(.text+0x136): undefined reference to `__remill_missing_block'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

What I do:

remill-lift-9.0 --arch amd64 --ir_out test.ll --bytes 8BFF558BEC8B550C8B4D0853FF7510
remill-clang-9.0 test.ll

tathanhdinh commented 2 years ago

A solution here is to implement these intrinsics in a compilation unit, then link with your lifted bitcode. E.g. create a dummy implementation in intrinsics.cc:

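#include <cstdint>

struct State;
using Memory = uint64_t;

// The lifted code references these symbols by their unmangled names
// (see the linker errors above), so the stubs need C linkage.
extern "C" {

// Dummy read: ignores the address and always returns 0.
uint64_t __remill_read_memory_64(Memory *mem, uint64_t addr) {
    return 0;
}

// Dummy write: drops the value and hands the memory pointer back.
Memory *__remill_write_memory_64(Memory *mem, uint64_t addr, uint64_t val) {
    return mem;
}

// Returns the 32-bit value itself, not a pointer to it.
uint32_t __remill_read_memory_32(Memory *mem, uint64_t addr) {
    return 0;
}

Memory *__remill_write_memory_32(Memory *mem, uint64_t addr, uint32_t val) {
    return mem;
}

// Called when control flow reaches code that was not lifted.
Memory *__remill_missing_block(State &state, uint64_t addr, Memory *mem) {
    return mem;
}

}  // extern "C"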

Then compile it into the intrinsics.bc bitcode file:

clang intrinsics.cc -emit-llvm -c -o intrinsics.bc

Now you can test your lifted bitcode, e.g.

remill-lift --arch amd64 --bc-out a.bc --bytes 8BFF558BEC8B550C8B4D0853FF7510
llvm-link a.bc intrinsics.bc -o a_intrinsics.bc
clang a_intrinsics.bc -c -o a_intrinsics.o

the binary is now a_intrinsics.o:

file a_intrinsics.o
a_intrinsics.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

pgoodman commented 2 years ago

This is what the Remill test runners do, for instance. Another option is to write an LLVM pass that replaces the Remill intrinsics with other things, e.g. load and store instructions.
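A lighter-weight way to get a similar effect without writing a pass is to define the intrinsics themselves as plain loads and stores and let the optimizer inline them after llvm-link. The following is only a sketch under assumptions: it treats every lifted address as a valid host pointer (true only if the lifted code runs in the same address space as the data it touches), and the three-argument write signature is taken to match Remill's intrinsics header rather than verified against a specific release.

```cpp
#include <cstdint>
#include <cstring>

struct Memory;  // opaque: this sketch never looks inside it

// C linkage so the names match the unmangled symbols in the lifted bitcode.
extern "C" {

// Load 32 bits from the host address `addr` (assumption: addr is a host pointer).
uint32_t __remill_read_memory_32(Memory *, uint64_t addr) {
    uint32_t val;
    std::memcpy(&val, reinterpret_cast<const void *>(static_cast<uintptr_t>(addr)),
                sizeof(val));
    return val;
}

// Load 64 bits from the host address `addr`.
uint64_t __remill_read_memory_64(Memory *, uint64_t addr) {
    uint64_t val;
    std::memcpy(&val, reinterpret_cast<const void *>(static_cast<uintptr_t>(addr)),
                sizeof(val));
    return val;
}

// Store 32 bits to the host address `addr`; the memory token is passed through.
Memory *__remill_write_memory_32(Memory *mem, uint64_t addr, uint32_t val) {
    std::memcpy(reinterpret_cast<void *>(static_cast<uintptr_t>(addr)), &val,
                sizeof(val));
    return mem;
}

// Store 64 bits to the host address `addr`; the memory token is passed through.
Memory *__remill_write_memory_64(Memory *mem, uint64_t addr, uint64_t val) {
    std::memcpy(reinterpret_cast<void *>(static_cast<uintptr_t>(addr)), &val,
                sizeof(val));
    return mem;
}

}  // extern "C"
```

memcpy is used instead of a pointer cast so that unaligned guest accesses stay well defined; at -O1 and above clang typically lowers each call to a single load or store.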

pgoodman commented 2 years ago

Just to close the loop: remill is a library for lifting machine code bytes to semantics, but the semantics are intentionally "incomplete." For example, there's no feasible way for us to model everything that an instruction like cpuid does. And on a more fundamental level, we didn't want to commit remill to any particular implementation of memory. If we had modelled memory as loads and stores from the start, it would have interfered with LLVM's ability to optimize, and it would also have made it challenging to use Remill-produced bitcode in specific setups (e.g. VMs).
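To make the point about not committing to a memory implementation concrete, here is a sketch of the same intrinsics backed by a sparse, emulator-style byte store instead of host memory. Everything inside the Memory struct, and the ReadLE/WriteLE helpers, are hypothetical names chosen for illustration; Remill itself only ever sees an opaque Memory pointer. The sketch also assumes a little-endian guest (as amd64 is) and that unmapped bytes read as zero.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical concrete definition of the opaque Memory type:
// a sparse map from guest address to byte value.
struct Memory {
    std::unordered_map<uint64_t, uint8_t> bytes;
};

namespace {

// Assemble `size` bytes starting at `addr` in little-endian order.
// Bytes that were never written read as 0 (an assumption of this sketch).
uint64_t ReadLE(const Memory *mem, uint64_t addr, unsigned size) {
    uint64_t val = 0;
    for (unsigned i = 0; i < size; ++i) {
        auto it = mem->bytes.find(addr + i);
        uint64_t byte = (it == mem->bytes.end()) ? 0 : it->second;
        val |= byte << (8 * i);
    }
    return val;
}

// Split `val` into `size` little-endian bytes and store them at `addr`.
void WriteLE(Memory *mem, uint64_t addr, uint64_t val, unsigned size) {
    for (unsigned i = 0; i < size; ++i) {
        mem->bytes[addr + i] = static_cast<uint8_t>(val >> (8 * i));
    }
}

}  // namespace

// C linkage so the names match the unmangled symbols in the lifted bitcode.
extern "C" {

uint32_t __remill_read_memory_32(Memory *mem, uint64_t addr) {
    return static_cast<uint32_t>(ReadLE(mem, addr, 4));
}

uint64_t __remill_read_memory_64(Memory *mem, uint64_t addr) {
    return ReadLE(mem, addr, 8);
}

Memory *__remill_write_memory_32(Memory *mem, uint64_t addr, uint32_t val) {
    WriteLE(mem, addr, val, 4);
    return mem;
}

Memory *__remill_write_memory_64(Memory *mem, uint64_t addr, uint64_t val) {
    WriteLE(mem, addr, val, 8);
    return mem;
}

}  // extern "C"
```

The lifted bitcode is identical in both cases; only the compilation unit linked against it changes, which is the flexibility the opaque intrinsics are meant to preserve.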

0x410c commented 1 year ago

remill-lift --arch amd64 --bc-out a.bc --bytes 8BFF558BEC8B550C8B4D0853FF7510
llvm-link a.bc intrinsics.bc -o a_intrinsics.bc
clang a_intrinsics.bc -c -o a_intrinsics.o

First, it doesn't compile with clang: I get a "cstdint not found" error. If we change the code so that it does compile, the intrinsics still don't get inlined in the resulting object file!

Can we get a complete working example?