Open ameily opened 2 years ago
I'm seeing an example of this in the cat coreutils program, in the fdadvise function:
C Code
void fdadvise (int fd, off_t offset, off_t len, fadvice_t advice)
{
#if HAVE_POSIX_FADVISE
// ignore_value is a macro that expands to: (void)(X)
ignore_value (posix_fadvise (fd, offset, len, advice));
#endif
}
Disassembly
0x080495eb <main+859>: call 0x804a3d0 <fdadvise>
; .....
0x804a3d0 <fdadvise>: jmp 0x8049200 <posix_fadvise64@plt>
Recovered Bitcode
; linked.ll
; fdadvise function
define i64 @Func_804A3D0(%struct.CPUX86State* %0) #3 {
entry:
store i32 134521808, i32* @PC, align 4, !inststart !11
store i32 134517248, i32* @PC, align 4, !inststart !11 ; address of posix_fadvise64
ret i64 0, !lastpc !77
}
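The bitcode records guest addresses as decimal constants stored to @PC, which makes them hard to match against the disassembly. A minimal sketch of a decoder follows; the helper name pc_stores_as_hex is hypothetical and not part of binrec, and it assumes the stores keep the exact textual form shown above.

```python
import re
from typing import List

# Matches constant PC stores as they appear in the recovered bitcode above.
# Stores of a computed value (e.g. "store i32 %tmp0_v, i32* @PC") are skipped.
PC_STORE = re.compile(r"store i32 (\d+), i32\* @PC")


def pc_stores_as_hex(ir_text: str) -> List[str]:
    """Return every constant stored to @PC, formatted as a hex address.

    Hypothetical helper for reading the IR; not part of binrec.
    """
    return [hex(int(match.group(1))) for match in PC_STORE.finditer(ir_text)]


SAMPLE_IR = """\
store i32 134521808, i32* @PC, align 4, !inststart !11
store i32 134517248, i32* @PC, align 4, !inststart !11
"""

for address in pc_stores_as_hex(SAMPLE_IR):
    print(address)
```

For the two stores in Func_804A3D0 this prints 0x804a3d0 (fdadvise) and 0x8049200 (the posix_fadvise64 PLT stub).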
This has been confirmed to be a limitation of binrec because jmp-based indirect function calls are not implemented.
The ln sample is failing to lift when performing a hard link because it does an indirect function call to linkat from rpl_linkat. These functions appear to be provided by gnulib.
Dump of assembler code for function rpl_linkat:
# ... instructions ....
0x0804f12f <+479>: pop ebx
0x0804f130 <+480>: pop esi
0x0804f131 <+481>: pop edi
0x0804f132 <+482>: pop ebp
0x0804f133 <+483>: jmp 0x8049160 <linkat@plt>
I'm seeing this on dd as well, with the rpl_fclose function:
Dump of assembler code for function rpl_fclose:
# ... instructions ...
0x08050b34 <+132>: pop ebx
0x08050b35 <+133>: pop esi
0x08050b36 <+134>: pop edi
0x08050b37 <+135>: jmp 0x80490c0 <fclose@plt>
This is also occurring on expand:
Dump of assembler code for function rpl_fflush:
# ... instructions ...
0x0804bc93 <+35>: pop ebx
0x0804bc94 <+36>: jmp 0x8048c80 <fflush@plt>
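The dumps above share one pattern: a jmp whose target is a PLT stub. A rough sketch of how that pattern could be spotted in gdb output is shown here. The function name find_plt_jumps is hypothetical, and the regular expression assumes the `<+offset>:` line format used in these dumps.

```python
import re
from typing import List, Tuple

# Assumed gdb line format: "0x0804f133 <+483>: jmp 0x8049160 <linkat@plt>"
PLT_JMP = re.compile(
    r"^\s*(0x[0-9a-fA-F]+)\s+<\+\d+>:\s+jmp\s+(0x[0-9a-fA-F]+)\s+<([^>@]+)@plt>"
)


def find_plt_jumps(disassembly: str) -> List[Tuple[int, int, str]]:
    """Return (jmp address, PLT stub address, symbol) for each jmp into the PLT.

    Hypothetical helper for triaging samples; not part of binrec.
    """
    found = []
    for line in disassembly.splitlines():
        match = PLT_JMP.match(line)
        if match:
            jmp_pc = int(match.group(1), 16)
            stub_pc = int(match.group(2), 16)
            found.append((jmp_pc, stub_pc, match.group(3)))
    return found


RPL_LINKAT = """\
0x0804f12f <+479>: pop ebx
0x0804f130 <+480>: pop esi
0x0804f131 <+481>: pop edi
0x0804f132 <+482>: pop ebp
0x0804f133 <+483>: jmp 0x8049160 <linkat@plt>
"""

for jmp_pc, stub_pc, symbol in find_plt_jumps(RPL_LINKAT):
    print(f"{jmp_pc:#x}: tail jmp to {symbol} via PLT stub {stub_pc:#x}")
```

Running it over the rpl_linkat dump reports the single jmp to linkat@plt; the same check would flag rpl_fclose and rpl_fflush.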
Interestingly, with all the updates to S2E and binrec, the insert_calls pass is now correctly identifying indirect function calls and bailing early.
binrec.errors.BinRecLiftingError: failed to perform initial lifting of LLVM bitcode:
cat: [insert_calls] false && "This is not implemented yet.
Also, is metadata set correctly with indirect calls?"
Previously, this wasn't being detected and, instead, the recovered binary would crash.
Probably because we replaced all the assert statements with honest to goodness errors.
It looks like the function being indirectly called is removed in the recover_functions pass. My initial thought was to update insert_calls, which runs after recover_functions, to unconditionally branch to the target function / basic block. However, the target function/BB does not exist anymore.
Do you know why it's being removed? It may be because there is no trace info linking the calling block to the indirectly called function.
If so, we would need to figure out why the S2E plugins are not catching the indirect call (or, if they are, why they are not adding it to the TraceInfo).
Do you know why it's being removed?
It looks like the function that is being called indirectly is merged into the body of the function performing the indirect call.
; 0x804a3d0 is the wrapper function which performs an indirect call
define internal void @Func_804A3D0() {
BB_804A3D0:
store i32 134521808, i32* @PC, align 4, !inststart !26
store i32 134517248, i32* @PC, align 4, !inststart !26
ret void, !lastpc !278, !succs !279
; 0x8049200 is the target function that is indirectly called
; in this scenario, this function is posix_fadvise
BB_8049200: ; No predecessors!
store i32 134517248, i32* @PC, align 4, !inststart !26
store i32 134517248, i32* @PC, align 4, !inststart !26
%tmp0_v.i = call i32 @helper_ldl_mmu(%struct.CPUX86State* null, i32 134566148, i32 33, i8* null)
store i32 %tmp0_v.i, i32* @PC, align 4
ret void, !lastpc !280, !succs !72, !extern_symbol !281
BB_8049200_join: ; No predecessors!
}
I think the cat sample is very specific because the Func_8049200 function is actually a library function (posix_fadvise). So, the recover_functions pass is incorrectly merging the stub function, which performs an indirect function call, and the library function being indirectly called.
I've confirmed that recover_functions appears to be treating jmp-based indirect function calls as part of the function body, which is an issue when the function being indirectly called is in an external library. The cat sample has 3 instances of this:
rpl_fclose @ 0x0804fbc7 <+135>: jmp 0x8048ec0 <fclose@plt>
rpl_fflush @ 0x0804fc1f <+79>: jmp 0x8048e50 <fflush@plt>
fdadvise @ 0x0804a3d0 <+0>: jmp 0x8049200 <posix_fadvise64@plt>
The jmp target is listed in the function info entry_pc_to_bb_pcs list. For example, the rpl_fclose function has a BB list of:
Func_804FB40 # entry point
Func_804FB40
Func_804FB50
Func_804FB57
Func_804FB60
Func_804FB67
Func_804FB70
Func_804FBC0 # ends with "jmp fclose"
Func_8048EC0 # fclose in libc.so
The result is the entire rpl_fclose function being recovered, including the body of the library fclose function. So, instead of a call to fclose, it's essentially statically compiled into the binary (albeit in a broken way because the function will fail to lift).
My hunch is that I need to first update recover_functions to not merge functions when one is external and one is not, which I believe would indicate a jmp-based function "call".
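To make the hunch concrete, here is a rough sketch of the split in Python. It is not the recover_functions pass and does not reflect binrec's real data structures: the function name, the dictionary shape, and the external_pcs set (the PLT stub addresses) are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def split_external_tail_targets(
    entry_pc_to_bb_pcs: Dict[int, List[int]],
    external_pcs: Set[int],
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Separate external blocks from the BB lists of internal functions.

    Hypothetical sketch. Returns (trimmed, tail_targets): trimmed maps each
    entry PC to its BB list without external blocks, and tail_targets maps an
    internal entry PC to the external PCs that were removed from its list.
    """
    trimmed: Dict[int, List[int]] = {}
    tail_targets: Dict[int, List[int]] = {}
    for entry_pc, bb_pcs in entry_pc_to_bb_pcs.items():
        if entry_pc in external_pcs:
            # An external function keeps its own blocks untouched.
            trimmed[entry_pc] = list(bb_pcs)
            continue
        trimmed[entry_pc] = [pc for pc in bb_pcs if pc not in external_pcs]
        removed = [pc for pc in bb_pcs if pc in external_pcs]
        if removed:
            tail_targets[entry_pc] = removed
    return trimmed, tail_targets


# BB list of rpl_fclose from the cat sample; 0x8048EC0 is the PLT stub it jmps to.
functions = {
    0x804FB40: [0x804FB40, 0x804FB50, 0x804FB57, 0x804FB60,
                0x804FB67, 0x804FB70, 0x804FBC0, 0x8048EC0],
}
trimmed, tail_targets = split_external_tail_targets(functions, {0x8048EC0})
print([hex(pc) for pc in trimmed[0x804FB40]])
print({hex(k): [hex(pc) for pc in v] for k, v in tail_targets.items()})
```

With this split, rpl_fclose keeps only its own seven blocks, and the jmp target is reported separately so a later pass could decide how to handle it.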
Binrec does not support lifting tail calls to external functions and, for the time being, this issue will not be addressed.
The coreutils cat sample has a tail call to the libc function posix_fadvise.
C Code
void fdadvise (int fd, off_t offset, off_t len, fadvice_t advice) {
posix_fadvise (fd, offset, len, advice);
}
Disassembly
Dump of assembler code for function fdadvise:
0x0804a3d0 <+0>: jmp 0x8049200 <posix_fadvise64@plt>
End of assembler dump.
Captured Bitcode
; linked.ll
; Function Attrs: alwaysinline
define i64 @Func_804A3D0(%struct.CPUX86State* %0) #3 {
entry:
; internal function: fdadvise()
store i32 134521808, i32* @PC, align 4, !inststart !12
; 134517248 == 0x8049200
; jmp 0x8049200 (posix_fadvise)
store i32 134517248, i32* @PC, align 4, !inststart !12
ret i64 0, !lastpc !78
}
; Function Attrs: alwaysinline
define i64 @Func_8049200(%struct.CPUX86State* %0) #3 {
entry:
; libc posix_fadvise function
store i32 134517248, i32* @PC, align 4, !inststart !12
store i32 134517248, i32* @PC, align 4, !inststart !12
%tmp0_v = call i32 @helper_ldl_mmu(%struct.CPUX86State* %0, i32 134566148, i32 33, i8* null)
store i32 %tmp0_v, i32* @PC, align 4
ret i64 0, !lastpc !79
}
After merging, the function looks like:
define i64 @Func_804A3D0(%struct.CPUX86State* %0) #3 {
entry:
; internal function: fdadvise()
store i32 134521808, i32* @PC, align 4, !inststart !12
; 134517248 == 0x8049200
; jmp 0x8049200 (posix_fadvise)
store i32 134517248, i32* @PC, align 4, !inststart !12
ret i64 0, !lastpc !78
BB_8049200:
; libc posix_fadvise function
store i32 134517248, i32* @PC, align 4, !inststart !12
store i32 134517248, i32* @PC, align 4, !inststart !12
%tmp0_v = call i32 @helper_ldl_mmu(%struct.CPUX86State* %0, i32 134566148, i32 33, i8* null)
store i32 %tmp0_v, i32* @PC, align 4
ret i64 0, !lastpc !79
}
The merged function fails to lift for multiple reasons.
There are multiple limitations and gaps within Binrec that would need to be addressed to support tail calls into libraries:
The tail call is a jmp to a library function, so the target function body is being traced in this scenario.
The target function is extern and LLVM does not support branching to an extern function, so the function must be called. However, the original arguments would've been inlined and potentially lost.
binrec does not support indirect function calls and jumps. Specifically, two samples are failing to lift: longjmp and siglongjmp, and it appears to be related to the insert_calls.cpp pass.