pgoodman commented 4 years ago

McSema's ELF thunk recognition code should be copied and adapted for Anvill. When a function references an ELF thunk, we should follow the thunk through to the referenced external and use that external's name in the prototype, rather than the name of the function itself, which may be prefixed with junk.

That is, instead of a prototype of this function having the name _signal or .signal:

We should instead follow through to the .plt segment...

And take the info from here:

The relevant code to adapt from McSema is:

https://github.com/lifting-bits/mcsema/blob/master/tools/mcsema_disass/ida7/get_cfg.py#L334-L466

pgoodman commented 3 years ago

Here is a rough example of the patterns we want to recognize, and how we want to "re-interpret" them:

Redirection-based patterns

The following patterns describe "code redirection": cases where we want the lifter to send control flow somewhere other than where the binary actually sends it. In practice, the idea is to redirect to the intended target, rather than the actual mechanical target.

Pattern 1

call [__libc_start_main@plt]

Replacement 1

call __libc_start_main

From an LLVM standpoint, this means the following:

We have a remill::Instruction with kCategoryIndirectCall or something like that. Normally this triggers calling the CALL semantics function, then making a function call to __remill_function_call.
Consult a "redirection" table that tells us that this call instruction redirects execution to the external __libc_start_main, and call its lifted function instead of calling __remill_function_call.
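The two steps above can be sketched roughly as follows. This is only an illustration in Python, not Anvill code: the table layout (keyed by the address of the call/jump instruction), the addresses, the category strings, and the name choose_callee are all assumptions made up for the example. Only __remill_function_call and __remill_jump come from the discussion above.

```python
from typing import Dict

# Hypothetical redirection table: address of a call/jump instruction mapped
# to the name of the external it is meant to reach. Addresses are made up.
REDIRECTIONS: Dict[int, str] = {
    0x401020: "__libc_start_main",  # call [__libc_start_main@plt]
    0x401030: "printf",             # jmp  [printf@plt]
}


def choose_callee(inst_addr: int, category: str,
                  redirections: Dict[int, str]) -> str:
    """Pick what the lifted code should call after the semantics function.

    If the instruction has a redirection entry, the lifted external is
    called; otherwise we fall back to the generic Remill intrinsic.
    """
    if category == "indirect_call":
        default = "__remill_function_call"
    elif category == "indirect_jump":
        default = "__remill_jump"
    else:
        raise ValueError(f"unsupported category: {category}")
    return redirections.get(inst_addr, default)


print(choose_callee(0x401020, "indirect_call", REDIRECTIONS))
print(choose_callee(0x500000, "indirect_jump", REDIRECTIONS))
```

Keying the table by instruction address keeps the lookup independent of how the operand is encoded. For the jump case, the chosen callee would be emitted as a terminating tail call, as described for pattern 2.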

Pattern 2

jmp [printf@plt]

Replacement 2

jmp printf

From an LLVM standpoint, this is similar to pattern and replacement (1). We want to have the same kind of redirection entry. Here, instead of lifting this as a call to the semantics, followed by a __remill_jump, we want to lift it as a call to the semantics, followed by a terminating tail call to the lifted external printf.

Pattern 3

_printf:
  jmp [printf@plt]

foo:
  ...
  call _printf

Replacement 3

foo:
  ...
  call printf

This is similar to (1), but instead of calling the lifted version of the internal _printf, we want to redirect execution to the lifted external printf.

Pattern 4

_printf:
  jmp [printf@plt]

foo:
  ...
  jmp _printf

Replacement 4

foo:
  ...
  jmp printf

Similar to pattern 3, but using a terminating tail call redirection.
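For patterns 3 and 4, the redirection entry has to be computed by following the internal thunk through to the external it forwards to. A minimal Python sketch of that follow-through is below. The two dictionaries and the function name resolve_external are hypothetical, and it assumes an earlier pass has already identified which internal functions consist only of a jmp through a slot.

```python
from typing import Dict

# Hypothetical inputs for illustration:
#   THUNKS: internal function that is just `jmp [slot]` -> the slot it uses
#   SLOTS:  slot -> name of the external bound to that slot
THUNKS: Dict[str, str] = {"_printf": "printf@plt"}
SLOTS: Dict[str, str] = {"printf@plt": "printf"}


def resolve_external(name: str, thunks: Dict[str, str],
                     slots: Dict[str, str]) -> str:
    """Follow thunk -> slot -> external until a non-thunk name is reached.

    Returns the original name unchanged if it is not a thunk, if a slot is
    unknown, or if the thunks form a cycle.
    """
    original = name
    seen = set()
    while name in thunks:
        if name in seen:
            return original  # cycle: leave the target alone
        seen.add(name)
        resolved = slots.get(thunks[name])
        if resolved is None:
            return original  # unknown slot: leave the target alone
        name = resolved
    return name


print(resolve_external("_printf", THUNKS, SLOTS))
print(resolve_external("foo", THUNKS, SLOTS))
```

With this, call _printf and jmp _printf in foo would both get a redirection entry naming printf, and the only difference between patterns 3 and 4 is whether the redirected call is a terminating tail call.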

Relocation-based patterns

The following patterns describe data relocation-based patterns. This means operating on the actual operands of a lifted instruction, and substituting them with something else. Here are some examples of what we want to deal with.

Pattern 1

mov rax, [__libc_start_main@plt]
call rax

Replacement 1

tmp = alloca
store __libc_start_main, tmp
state->rax = load tmp

This one is tricky. We want a relocation entry that says that a memory load from the address __libc_start_main@plt will load the address of the external __libc_start_main. By extending the instruction lifter class, in a nearly identical way to McSema, we can interpose on the operands, check whether they are used for memory reads or address generation, try to figure out the effective loaded address, and identify whether a relocation applies. If one does, then we want to invent a new address to be loaded, based on an alloca that we pre-fill with the address of the external __libc_start_main.
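As a rough illustration of the relocation lookup, here is a Python sketch that emits the same pseudo-IR as Replacement 1 when the effective address of a memory read has a relocation entry. The table contents, the address, and the function name lift_memory_load are made up for the example, and the non-relocated case is reduced to a plain load for brevity.

```python
from typing import Dict, List

# Hypothetical relocation table: effective address of a memory read mapped
# to the external whose address that read should produce.
RELOCATIONS: Dict[int, str] = {
    0x404018: "__libc_start_main",  # stands in for __libc_start_main@plt
}


def lift_memory_load(dest_reg: str, effective_addr: int,
                     relocations: Dict[int, str]) -> List[str]:
    """Emit pseudo-IR for `mov dest_reg, [effective_addr]`."""
    symbol = relocations.get(effective_addr)
    if symbol is None:
        # No relocation applies: read from the original address.
        return [f"state->{dest_reg} = load {effective_addr:#x}"]
    # A relocation applies: load from an alloca pre-filled with the
    # address of the external instead of from program memory.
    return [
        "tmp = alloca",
        f"store {symbol}, tmp",
        f"state->{dest_reg} = load tmp",
    ]


for line in lift_memory_load("rax", 0x404018, RELOCATIONS):
    print(line)
```

Going through an alloca keeps the instruction's semantics function unchanged: it still performs a load, just from an address we control.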

artemdinaburg commented 3 years ago

I think we can finally close this.

pgoodman commented 3 years ago

We can't close just yet. Parts of this issue are done, but not all parts. What remains to be done:

Indirect calls of the form call [__libc_start_main]. This means adding control-flow target support to FunctionLifter::VisitIndirectFunctionCall.
Doing all of the above for the conditional variants.

lifting-bits / anvill

ELF external thunk recognition #5
