lifting-bits / remill

Library for lifting machine code to LLVM bitcode
Apache License 2.0

Lift the declared branch taken store for a given flow #697

Closed 2over12 closed 5 months ago

2over12 commented 5 months ago

Anvill does not use branch taken for intraprocedural flows and instead just switches on PC. Unfortunately, we do rely on the btaken variable for a flow being lifted in the case of a conditional call, i.e. https://github.com/lifting-bits/anvill/blob/70209a8c3311cc97875605a137da210233fe9cd6/lib/Lifters/BasicBlockLifter.cpp#L232

Regardless, since we compute these flow hints, we should keep them consistent in pcode. We should test this against conditional control flow in anvill. Closes #694
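
To illustrate the distinction above, here is a minimal toy model in Python. It is not remill or anvill code: `Flow`, `lift_conditional`, `successor_by_pc`, and `conditional_call` are hypothetical names, and the stale default of `False` for an unstored hint is an assumption made only for the sketch. It shows why a consumer that switches on PC is unaffected by a missing branch-taken store, while a consumer that guards a call on the hint is not.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Result of one lifted conditional instruction (toy model only)."""
    next_pc: int        # program counter after the instruction
    branch_taken: bool  # flow hint the lifted semantics are expected to store


def lift_conditional(pc, size, target, cond, store_hint=True):
    """Hypothetical lifter: always computes the next PC, and stores the
    branch-taken hint only when store_hint is True."""
    next_pc = target if cond else pc + size
    # Assumption: an unstored hint keeps a stale default of False.
    taken = cond if store_hint else False
    return Flow(next_pc, taken)


def successor_by_pc(flow, blocks):
    """Intraprocedural consumer: picks the successor by switching on PC."""
    return blocks[flow.next_pc]


def conditional_call(flow, callee):
    """Conditional-call consumer: calls only if the hint says 'taken'."""
    return [callee] if flow.branch_taken else []


blocks = {0x1004: "fallthrough", 0x2000: "target"}
stale = lift_conditional(0x1000, 4, 0x2000, cond=True, store_hint=False)
fixed = lift_conditional(0x1000, 4, 0x2000, cond=True, store_hint=True)

# Switching on PC still finds the right successor without the hint...
assert successor_by_pc(stale, blocks) == "target"
# ...but the hint-guarded call is silently dropped until the hint is stored.
assert conditional_call(stale, "callee") == []
assert conditional_call(fixed, "callee") == ["callee"]
```

In this sketch the only difference between `stale` and `fixed` is whether the hint is written, which is the consistency the PR description asks for.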

tetsuo-cpp commented 5 months ago

> Regardless, since we compute these flow hints, we should keep them consistent in pcode. We should test this against conditional control flow in anvill. Closes https://github.com/lifting-bits/remill/issues/694

I won't merge this yet since it sounds like it still needs testing. Feel free to merge when you're ready.

m4xw commented 5 months ago

Hi, just saw this PR and gave it a quick spin: [image] Looks good so far, and the double conditions I had are gone now too. Codegen is surprisingly quite a bit different in how it merges the tails now. It's a bit weird, but it still looks correct on a quick look.

Also, I have been meaning to take a look at anvill. Quick question: does it work with the Sleigh-based backends in remill? What are the benefits of using anvill?

My use case will be emulator AOT recompilation for games, so I am not sure how much I will benefit from it.

2over12 commented 5 months ago

A conditional call in libc seems to work on the anvill side:

define i8 @func949680basic_block949684_22(ptr %stack, i32 %program_counter, ptr noalias nocapture %memory, ptr noalias nocapture %D12, ptr noalias nocapture %D8, ptr noalias nocapture %R1, ptr noalias nocapture %D13, ptr noalias nocapture %R9, ptr noalias nocapture %D14, ptr noalias nocapture %D15, ptr noalias nocapture %D11, ptr noalias nocapture %LR, ptr noalias nocapture %D9, ptr noalias nocapture %R10, ptr noalias nocapture %R0, ptr noalias nocapture %R8, ptr noalias nocapture %R6, ptr noalias nocapture %R5, ptr noalias nocapture %R3, ptr noalias nocapture %R4, ptr noalias nocapture %R2, ptr noalias nocapture %D10, ptr noalias nocapture %R11, ptr noalias nocapture %R7) local_unnamed_addr #7 !__anvill_basic_block_uid_md !2 {
sleigh_remill_instruction_function_e7db8.exit:
  %0 = load i32, ptr %LR, align 4
  %1 = load i32, ptr %R11, align 4
  %2 = icmp ne i32 %1, 0
  %3 = icmp ne i32 %0, 0
  %narrow.not = select i1 %2, i1 %3, i1 false
  br i1 %narrow.not, label %func949680basic_block949684_22lowlift.exit, label %.critedge

.critedge:                                        ; preds = %sleigh_remill_instruction_function_e7db8.exit
  %4 = call i8 @sub_e732c__AvB_B_0()
  br label %func949680basic_block949684_22lowlift.exit

func949680basic_block949684_22lowlift.exit:       ; preds = %sleigh_remill_instruction_function_e7db8.exit, %.critedge
  %5 = tail call i8 @func949680basic_block949700_23(ptr %stack, i32 %program_counter, ptr nonnull %memory, ptr nonnull %D12, ptr nonnull %D8, ptr %R1, ptr nonnull %D13, ptr nonnull %R9, ptr nonnull %D14, ptr nonnull %D15, ptr nonnull %D11, ptr nonnull %LR, ptr nonnull %D9, ptr nonnull %R10, ptr %R0, ptr nonnull %R8, ptr nonnull %R6, ptr nonnull %R5, ptr %R3, ptr nonnull %R4, ptr %R2, ptr nonnull %D10, ptr nonnull %R11, ptr nonnull %R7)
  ret i8 %5
}
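
Read as plain control flow, the bitcode above can be paraphrased with the following Python sketch. It is only a reading aid, not anvill output: the `regs` dictionary and the two callables are hypothetical stand-ins for the register pointers, `@sub_e732c__AvB_B_0`, and `@func949680basic_block949700_23`.

```python
def basic_block_949684(regs, call_sub_e732c, next_block):
    """Paraphrase of func949680basic_block949684_22 (illustrative only)."""
    lr = regs["LR"]    # %0 = load i32, ptr %LR
    r11 = regs["R11"]  # %1 = load i32, ptr %R11

    # %narrow.not = select i1 (R11 != 0), i1 (LR != 0), i1 false
    skip_call = (r11 != 0) and (lr != 0)

    if not skip_call:
        # .critedge: the conditional call is performed on this path only
        call_sub_e732c()

    # Both paths merge and tail-call the next basic block function.
    return next_block(regs)


calls = []
result = basic_block_949684(
    {"LR": 0, "R11": 5},
    call_sub_e732c=lambda: calls.append("sub_e732c"),
    next_block=lambda regs: 0,
)
assert calls == ["sub_e732c"]  # LR == 0, so the call is not skipped
assert result == 0
```

The call is guarded by a single branch and both paths reach the same tail call, which is the shape expected of a lifted conditional call.
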

2over12 commented 5 months ago

> Hi, just saw this PR and gave it a quick spin: [image] Looks good so far, and the double conditions I had are gone now too. Codegen is surprisingly quite a bit different in how it merges the tails now. It's a bit weird, but it still looks correct on a quick look.
>
> Also, I have been meaning to take a look at anvill. Quick question: does it work with the Sleigh-based backends in remill? What are the benefits of using anvill?
>
> My use case will be emulator AOT recompilation for games, so I am not sure how much I will benefit from it.

Anvill does use the Sleigh remill backends. I don't think anvill is likely to be a good fit for your use case, because we cannot guarantee consistently correct recompilation inside anvill. Anvill tries to use brightening to produce simplified bitcode, similar to what you would see coming out of clang or C written by a human. Since we are doing decompilation in anvill, there are fundamental limitations that mean it cannot always succeed or recompile.

m4xw commented 5 months ago

> Anvill does use the Sleigh remill backends. I don't think anvill is likely to be a good fit for your use case, because we cannot guarantee consistently correct recompilation inside anvill. Anvill tries to use brightening to produce simplified bitcode, similar to what you would see coming out of clang or C written by a human. Since we are doing decompilation in anvill, there are fundamental limitations that mean it cannot always succeed or recompile.

I see, thanks. Further down the line I plan to use this in some decompilation projects too, but so far it's on the back burner. It might be interesting if it manages to yeet the state structs.

Btw, a minor thing: for PPC I use custom Sleigh definitions, and the current ghidra-fork and src handling (with the generated patches) chokes if new definitions (files) are introduced that don't exist in the original Ghidra source tree.

Do you have any recommendation for working around this, or would you like me to create a separate issue?

I'd like to avoid having to keep syncing changes across two Ghidra codebases.