Open TcM1911 opened 2 years ago
thanks for the thorough report here. likewise, i have a suspicion that it's a bug in the underlying analysis engine (vivisect). we're awaiting some substantial improvements to be released in the next few days, after which we'll release v3.0.1.
i will see if capa behaves better using a snapshot of vivisect from master. if that doesn't work, i'll dig into why it doesn't identify the API calls.
incidentally, to confirm the features that capa extracts, you can use the script show-features.py.
using viv/master (https://github.com/vivisect/vivisect/commit/d5b895eeeb19a0304d965cbf4fc86a5f3a5183c5) doesn't help.
using show-features we do see the import of popen:
but, actually, it doesn't look like capa identifies the function:
looking in IDA, the function is referenced by pointer and then maybe added to a list:
i suspect the viv analysis doesn't recognize that the pointer is to a function (rather than data) so capa doesn't have a chance to analyze this function.
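A rough sketch of the kind of pass this suggests: scan a data section for pointer-sized values that land inside the text section and treat them as candidate function entries. The function name, the section bounds, and the sample data below are all hypothetical; this is not how vivisect or capa currently do it.

```python
import struct

def find_code_pointers(data: bytes, text_start: int, text_end: int, ptr_size: int = 8):
    """Yield (offset, target) for each aligned little-endian pointer in
    `data` whose value falls inside [text_start, text_end)."""
    fmt = "<Q" if ptr_size == 8 else "<I"
    for off in range(0, len(data) - ptr_size + 1, ptr_size):
        (value,) = struct.unpack_from(fmt, data, off)
        if text_start <= value < text_end:
            yield off, value

# Hypothetical data section: two pointers into .text and one unrelated value.
data = struct.pack("<QQQ", 0x404E80, 0xDEADBEEF, 0x40ECC0)
# Hypothetical .text bounds, chosen only for this example.
candidates = list(find_code_pointers(data, text_start=0x401000, text_end=0x410000))
```

Each candidate would still need to be validated (for example, by checking that it decodes as code) before being handed to the analysis as a function.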
the function in question has a pretty clear prolog:
so i wonder if we can extend the viv analysis a bit further to recognize the function pointers.
in vivisect funcentries we see the brute forcing of function entry points using the i386 signatures from here. notably, these signatures do not match the prolog we see here.
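To illustrate why prefix signatures can miss these functions, here is a minimal sketch. The three signatures are the classic frame-pointer prologs and are only an assumption for illustration; they are not copied from vivisect's actual signature list. The sample bytes are taken from the prologs quoted later in this thread.

```python
# Illustrative frame-pointer prolog signatures; NOT vivisect's actual list.
FRAME_POINTER_PROLOGS = [
    bytes.fromhex("5589e5"),    # push ebp; mov ebp, esp
    bytes.fromhex("558bec"),    # push ebp; mov ebp, esp (alternate encoding)
    bytes.fromhex("554889e5"),  # push rbp; mov rbp, rsp
]

def matches_prolog_signature(code: bytes) -> bool:
    """Return True if `code` starts with one of the signatures above."""
    return any(code.startswith(sig) for sig in FRAME_POINTER_PROLOGS)

# Leading bytes of three functions from the sample (see the listings in this thread).
samples = {
    0x404E80: bytes.fromhex("415541544989fc"),  # push r13; push r12; mov r12, rdi
    0x4038F0: bytes.fromhex("55534889fb"),      # push rbp; push rbx; mov rbx, rdi
    0x40ECC0: bytes.fromhex("89f0c1e818"),      # mov eax, esi; shr eax, 0x18
}
results = {va: matches_prolog_signature(code) for va, code in samples.items()}
```

None of the three entries match, even though two of them start by pushing callee-saved registers, because no frame pointer is set up.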
some options:
notably, we can implement these analysis passes outside of viv, within capa, while we evaluate their effectiveness.
Looking at the sample, it appears that it's been compiled with the -fomit-frame-pointer flag.
-fomit-frame-pointer Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.
On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag. See Register Usage.
Starting with GCC version 4.6, the default setting (when not optimizing for size) for 32-bit GNU/Linux x86 and 32-bit Darwin x86 targets has been changed to -fomit-frame-pointer. The default can be reverted to -fno-omit-frame-pointer by configuring GCC with the --enable-frame-pointer configure option.
Enabled at levels -O, -O2, -O3, -Os.
This can make it hard to detect functions via a signature because the beginning can be different. Here are some function prologs from the same file:
│ 0x00404e80 4155 push r13
│ 0x00404e82 4154 push r12
│ 0x00404e84 4989fc mov r12, rdi ; arg1
│ 0x00404e87 55 push rbp
│ 0x00404e88 53 push rbx
│ 0x00404e89 4881ec580400. sub rsp, 0x458

│ 0x004038f0 55 push rbp
│ 0x004038f1 53 push rbx
│ 0x004038f2 4889fb mov rbx, rdi ; arg1
│ 0x004038f5 4883ec18 sub rsp, 0x18

│ 0x0040f060 53 push rbx
│ 0x0040f061 89fb mov ebx, edi ; arg1

No prolog:

│ 0x0040ecc0 89f0 mov eax, esi ; arg2
│ 0x0040ecc2 c1e818 shr eax, 0x18
│ 0x0040ecc5 85d2 test edx, edx ; arg3
│ 0x0040ecc7 7417 je 0x40ece0
It looks like a prolog can be: saving callee-saved registers, subtracting from the stack pointer, or none of these. It looks like only short functions lack the push or sub instructions. Maybe a pointer can be assumed to be a function if it points into the text section and a first set of instructions decodes cleanly. If decoding reaches a subtraction from the stack pointer, we could lock it in as a function. If we don't hit a sub instruction, we can decode up to a maximum number of bytes until a ret, in case it's a short function that doesn't use the stack.
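A minimal sketch of that heuristic, to make the idea concrete. The decoder below is a hand-rolled toy that only understands the register-to-register encodings seen in this thread (push, mov/test, shr, sub with an immediate, jcc rel8, ret); a real pass would use a proper disassembler. The names decode_one and classify_entry and the 32-byte limit are hypothetical.

```python
from typing import Optional, Tuple

def decode_one(code: bytes, off: int) -> Optional[Tuple[int, str]]:
    """Toy decoder: return (length, kind) or None for anything unrecognized."""
    start, rex = off, 0
    if off < len(code) and 0x40 <= code[off] <= 0x4F:  # optional REX prefix
        rex = code[off]
        off += 1
    if off >= len(code):
        return None
    op = code[off]
    if 0x50 <= op <= 0x57:          # push reg
        length, kind = 1, "push"
    elif op == 0xC3:                # ret
        length, kind = 1, "ret"
    elif 0x70 <= op <= 0x7F:        # jcc rel8
        length, kind = 2, "jcc"
    elif op in (0x81, 0x83, 0x85, 0x89, 0x8B, 0xC1):
        if off + 1 >= len(code) or code[off + 1] >> 6 != 0b11:
            return None             # memory operands are out of scope here
        imm = {0x81: 4, 0x83: 1, 0xC1: 1}.get(op, 0)
        length = 2 + imm
        # sub rsp, imm: opcode 81/83, ModRM 0xEC, REX.W set and REX.B clear
        is_sub_rsp = op in (0x81, 0x83) and code[off + 1] == 0xEC and (rex & 0x09) == 0x08
        kind = "sub_rsp" if is_sub_rsp else "other"
    else:
        return None
    end = off + length
    if end > len(code):
        return None                 # instruction runs past the buffer
    return end - start, kind

def classify_entry(code: bytes, max_bytes: int = 32) -> Optional[str]:
    """Return "sub_rsp" or "ret" if the bytes look like a function entry,
    or None if decoding fails or stays inconclusive within max_bytes."""
    off, limit = 0, min(len(code), max_bytes)
    while off < limit:
        decoded = decode_one(code, off)
        if decoded is None:
            return None
        length, kind = decoded
        if kind in ("sub_rsp", "ret"):
            return kind
        off += length
    return None

# Bytes from the listings above; the last immediate byte of the first sub is
# reconstructed from the operand 0x458 because the listing truncates it.
PROLOG_404E80 = bytes.fromhex("415541544989fc55534881ec58040000")
PROLOG_4038F0 = bytes.fromhex("55534889fb4883ec18")
START_40ECC0 = bytes.fromhex("89f0c1e81885d27417")
SHORT_LEAF = bytes.fromhex("89f8c3")  # hypothetical: mov eax, edi; ret
```

With only the nine bytes quoted for 0x40ecc0 the result stays inconclusive (None), since neither a sub nor a ret is reached; that case would need more bytes, or a fallback such as following the branch.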
I'm just spitballing ideas for detecting these functions without a run-off decoding.
@williballenthin we just discovered a bug in Viv's ELF parsing that may help with some of these undiscovered functions as well. stay tuned! i have to run it through the PR process and clean up the unittests that are bound to break.
There are still some issues with the "api" feature detection. I think this issue is also due to a bug in vivisect. See this file for example. A few rules are detected, but judging by the imports, a few more should have triggered. We can see that the sample imports both "uname" and "popen".
This is a snippet where the function calls popen on the last line.
The rule "create process on Linux" is very simple and should have fired. I can't see any reason other than the symbols not being discovered correctly. It's not caused by the "os" requirement. If I modify the rule to:
It still doesn't fire, while working fine on other samples: