lifting-bits / remill

Library for lifting machine code to LLVM bitcode
Apache License 2.0

Segmented memory access #333

Open tathanhdinh opened 5 years ago

tathanhdinh commented 5 years ago

Given instructions:

36 8b 18    mov ebx, ss:[eax]

and

2e 8b 18    mov ebx, cs:[eax]

remill lifts to the same function:

define dso_local %struct.Memory* @sub_0(%struct.State* noalias dereferenceable(3376), i32, %struct.Memory* noalias) {
  %4 = getelementptr inbounds %struct.State, %struct.State* %0, i32 0, i32 6, i32 33, i32 0, i32 0
  %5 = getelementptr inbounds %struct.State, %struct.State* %0, i32 0, i32 6, i32 1, i32 0, i32 0
  %6 = getelementptr inbounds %struct.State, %struct.State* %0, i32 0, i32 6, i32 3, i32 0, i32 0
  %7 = load i32, i32* %5, align 4
  %8 = add i32 %1, 3
  store i32 %8, i32* %4, align 4
  %9 = tail call i32 @__remill_read_memory_32(%struct.Memory* %2, i32 %7)
  store i32 %9, i32* %6, align 4
  %10 = tail call %struct.Memory* @__remill_missing_block(%struct.State* nonnull %0, i32 %8, %struct.Memory* %2)
  ret %struct.Memory* %10
}

so segmentation (the memory model) is ignored. That is only valid in Intel's 64-bit mode; in other modes (e.g. compatibility mode) the memory model is segmented.
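To make the difference concrete, here is a minimal sketch of how the two instructions above would resolve to different addresses outside 64-bit mode. The `Mode`, `Seg`, `SegBases`, and `LinearAddress` names are hypothetical and are not part of remill. The sketch assumes the segment bases are already known and ignores limit and permission checks.

```cpp
#include <cstdint>

// Hypothetical types for illustration only; none of these exist in remill.
enum class Mode { k32Bit, k64Bit };
enum class Seg { CS, SS, DS, ES, FS, GS };

struct SegBases {
  uint64_t cs, ss, ds, es, fs, gs;
};

inline uint64_t BaseOf(const SegBases &b, Seg seg) {
  switch (seg) {
    case Seg::CS: return b.cs;
    case Seg::SS: return b.ss;
    case Seg::DS: return b.ds;
    case Seg::ES: return b.es;
    case Seg::FS: return b.fs;
    case Seg::GS: return b.gs;
  }
  return 0;
}

// Computes base + offset for a segment-relative access.
inline uint64_t LinearAddress(Mode mode, const SegBases &b, Seg seg,
                              uint64_t offset) {
  if (mode == Mode::k64Bit) {
    // In 64-bit mode the CS, SS, DS and ES bases are treated as zero;
    // only FS and GS still contribute a base.
    const bool has_base = (seg == Seg::FS || seg == Seg::GS);
    return (has_base ? BaseOf(b, seg) : 0) + offset;
  }
  // Outside 64-bit mode every segment base applies, and the result
  // wraps to 32 bits.
  return static_cast<uint32_t>(BaseOf(b, seg) + offset);
}
```

With `ss` and `cs` bases of `0x2000` and `0x1000`, `mov ebx, ss:[eax]` and `mov ebx, cs:[eax]` with `eax = 0x10` would read from `0x2010` and `0x1010` in 32-bit mode, but both from `0x10` in 64-bit mode.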

IMHO, there are several parts that could be modified to support a segmented memory model. For example, the initialization of segment registers: https://github.com/trailofbits/remill/blob/9136eb565e4c3862093959b22da74ebc75e815a8/remill/Arch/X86/Runtime/BasicBlock.cpp#L162-L167

and the calculation of segmented address: https://github.com/trailofbits/remill/blob/9136eb565e4c3862093959b22da74ebc75e815a8/remill/BC/Lifter.cpp#L586-L589

and maybe others (?).

Many thanks for any response.

pgoodman commented 5 years ago

Should probably have an auto &SS_BASE = IF_32BIT_ELSE(state.addr.ss_base.aword, zero1); kind of thing. Would this work for you?

I don't recall if things like PUSH and stuff have been properly implemented to bring in the segment.

tathanhdinh commented 5 years ago

Thanks @pgoodman. auto &SS_BASE = IF_32BIT_ELSE(state.addr.ss_base.aword, zero1) would work, but IMHO it's only part of the story :( I think the more serious problem is addr = ir.CreateAdd(addr, segment): in a segmented memory model, we cannot simply add addr and segment to get the (linear) address; segment should be used to look up the corresponding segment descriptor.
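A rough sketch of what going through the descriptor involves in protected mode: the selector's upper 13 bits index a descriptor table (bit 2 selects GDT or LDT, bits 0-1 are the RPL), and the descriptor supplies the base and limit. All type and function names below are hypothetical, and the descriptor is heavily simplified (no access rights, no expand-down segments, limit assumed already scaled to bytes). Real hardware does this lookup when the segment register is loaded and caches the result; the sketch does it per access only for brevity.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Simplified, hypothetical descriptor: only the fields needed here.
struct SegmentDescriptor {
  uint32_t base;   // linear address where the segment starts
  uint32_t limit;  // highest valid offset, granularity already applied
  bool present;
};

struct Selector {
  uint16_t raw;
  uint16_t Index() const { return raw >> 3; }        // bits 3..15
  bool UsesLdt() const { return (raw & 0x4) != 0; }  // TI bit
  uint8_t Rpl() const { return raw & 0x3; }          // bits 0..1
};

struct DescriptorTables {
  std::vector<SegmentDescriptor> gdt;
  std::vector<SegmentDescriptor> ldt;
};

// Returns the linear address of a `size`-byte access at `offset`, or
// nullopt where real hardware would raise a fault.
inline std::optional<uint32_t> TranslateViaDescriptor(
    const DescriptorTables &tables, Selector sel, uint32_t offset,
    uint32_t size) {
  const auto &table = sel.UsesLdt() ? tables.ldt : tables.gdt;
  if (sel.Index() >= table.size()) return std::nullopt;

  const SegmentDescriptor &desc = table[sel.Index()];
  if (!desc.present) return std::nullopt;

  // The whole access must fit within [0, limit].
  if (size == 0 || offset > desc.limit || size - 1 > desc.limit - offset) {
    return std::nullopt;
  }
  return desc.base + offset;
}
```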

pgoodman commented 5 years ago

Is the issue something to do with segment permissions? What would an ideal or at least "complete" solution look like? For example, what if instead of addr = ir.CreateAdd(addr, segment) there was something like: addr = ir.CreateCall(...) and then call a function like TRANSLATE_ADDRESS_<seg name>(memory, state, addr)?

pgoodman commented 5 years ago

Where this "address translation" function will, among other things, return addr + state.addr.XX_base.aword.

pgoodman commented 5 years ago

And we can then have something like a DEF_ADDR_TRANSLATE macro that mirrors DEF_SEM in some ways.
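A minimal sketch of how such a macro might look, assuming nothing about remill's real definitions: `State`, `AddressSpace`, `Reg`, and `addr_t` below are small stand-ins written for this example, and `DEF_ADDR_TRANSLATE` and `TRANSLATE_ADDRESS_<seg>` are the hypothetical names from this thread, not an existing API. The default body just returns addr plus the segment base; a runtime that wants descriptor lookups or permission checks would supply a different body.

```cpp
#include <cstdint>

// Stand-ins for illustration; remill's real State layout differs.
using addr_t = uint32_t;
struct Memory;  // opaque, as in the lifted code

struct Reg {
  addr_t aword;
};

struct AddressSpace {
  Reg cs_base, ss_base, ds_base, es_base, fs_base, gs_base;
};

struct State {
  AddressSpace addr;
};

// Defines one translation hook per segment. The lifter would emit a call
// to the hook for the instruction's segment instead of emitting an add.
// `memory` is unused in this default body but is part of the signature
// so that richer implementations can consult it.
#define DEF_ADDR_TRANSLATE(seg, field)                               \
  inline addr_t TRANSLATE_ADDRESS_##seg(Memory *memory,              \
                                        const State &state,          \
                                        addr_t addr) {               \
    (void) memory;                                                   \
    return addr + state.addr.field.aword;                            \
  }

DEF_ADDR_TRANSLATE(CS, cs_base)
DEF_ADDR_TRANSLATE(SS, ss_base)
DEF_ADDR_TRANSLATE(DS, ds_base)
DEF_ADDR_TRANSLATE(ES, es_base)
DEF_ADDR_TRANSLATE(FS, fs_base)
DEF_ADDR_TRANSLATE(GS, gs_base)
```

Keeping the hook as a call rather than an inline add leaves the policy (flat bases, descriptor lookup, faulting on limit violations) to whoever implements the runtime, at the cost of one extra call per memory operand until it is inlined.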

tathanhdinh commented 5 years ago

Yes, your proposal (using addr = ir.CreateCall(...)) is a perfect solution for this case.