lifting-bits / remill

Library for lifting machine code to LLVM bitcode
Apache License 2.0
1.22k stars 143 forks source link

Don't mark instruction as "invalid" when we can't determine a control flow category #682

Closed tetsuo-cpp closed 10 months ago

tetsuo-cpp commented 11 months ago

Do not merge! This is just a starting point for discussion.

At the moment, when we can't determine a control flow category (happens with instructions like call, ret, etc which have branching in the middle of their PCode on DECOMPILE_MODE or didrestore), we just skip lifting this instruction. This happens here. I think that check needs to remain since we don't want to attempt to lift for a truly invalid instruction (i.e. we failed to even decode it). Therefore, the change needs to happen here so we categorise this situation differently.

This doesn't solve the problem since we now generate IR with conditional branching right in the middle of a basic block, meaning we start running into this:

E20230809 18:37:37.050169 2359682 Util.cpp:460] Error verifying function: Terminator found in the middle of a basic block!

@2over12 @Ninja3047 Did you have ideas on how to best handle this? This is a bit of an odd situation since the lifted PCode doesn't conform to a valid basic block anymore.

Ninja3047 commented 11 months ago
; Function Attrs: alwaysinline inlinehint nounwind
define internal ptr @sleigh_remill_instruction_function_40048b80(ptr %state, ptr %memory, ptr %btaken, ptr %npc) #10 {
entry_block:
  %G1 = getelementptr inbounds %struct.State, ptr %state, i32 0, i32 0, i32 4, i32 51, i32 0, !remill_register !1352
  %DIDRESTORE = getelementptr inbounds %struct.State, ptr %state, i32 0, i32 0, i32 19, !remill_register !1394
  %O7 = getelementptr inbounds %struct.State, ptr %state, i32 0, i32 0, i32 4, i32 47, i32 0, !remill_register !1350
  %MEMORY = alloca ptr, align 4
  store ptr %memory, ptr %MEMORY, align 4
  %"unique_ff00:1" = alloca i8, align 1
  br label %1

exit_block:                                       ; preds = %11, %1, %1
  %0 = load ptr, ptr %MEMORY, align 4
  ret ptr %0

1:                                                ; preds = %entry_block
  store i32 1074039680, ptr %O7, align 4
  store i8 0, ptr %DIDRESTORE, align 1
  %2 = load i32, ptr %G1, align 4
  %3 = or i32 %2, 0
  store i32 %3, ptr %O7, align 4
  store i32 1074039860, ptr %npc, align 4 // 0x40048c34
  br label %exit_block // terminator 1
  %4 = load i8, ptr %DIDRESTORE, align 1
  %5 = icmp eq i8 %4, 0
  %6 = zext i1 %5 to i8
  store i8 %6, ptr %"unique_ff00:1", align 1
  %7 = load i8, ptr %"unique_ff00:1", align 1
  %8 = trunc i8 %7 to i1
  %9 = load i32, ptr %npc, align 4
  %10 = select i1 %8, i32 1074039688, i32 %9 // 0x40048b88
  store i32 %10, ptr %npc, align 4
  br i1 %8, label %exit_block, label %11 // terminator 2

11:                                               ; preds = %1
  %12 = load i32, ptr %O7, align 4
  store i32 %12, ptr %npc, align 4
  br label %exit_block
}

fails verification on this block

        40048b80 40 00 00 2d     call       rtems_string_to_unsigned_long                    rtems_status_code rtems_string_t
                                                      o7 = COPY 0x40048b80:4
                                                      didrestore = COPY 0:1
                                                      o7 = INT_OR g1, 0:4
                                                      CALL *[ram]0x40048c34:4
                                                      $Uff00:1 = INT_EQUAL didrestore, 0:1
                                                      CBRANCH *[ram]0x40048b88:4, $Uff00:1
                                                      RETURN o7
Ninja3047 commented 11 months ago

we were terminating the instruction after calls/branches but in sparc, it's possible to have pcode ops after calls and branches

tetsuo-cpp commented 10 months ago

In the example I'm looking at, things are still looking broken. I'm investigating this atm but most of the code seems to be optimized out due to a poison value somewhere.

However, that's a separate issue so I think this is good to go. Marking as ready.