Open mr-tz opened 3 years ago
Concerns (from last meeting):
or
CreateFile
when referring to an argument, we should be able to refer to its specific index. we should also try to associate the argument with its declared name. so like:
api: CreateFileA
arg[0]: "foo.exe"
and
api: CreateFileA
lpName: "foo.exe"
how do we maintain these mappings? we'd need a database of APIs and their canonical argument names (ideally should match MSDN (windows) and man pages (posix)).
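a minimal sketch of what such a mapping could look like, assuming a plain in-memory table keyed by API name. `API_ARGS`, `arg_index`, and `arg_name` are hypothetical names for illustration, not existing capa code; the argument names follow the MSDN prototype for CreateFileA, where the first parameter is lpFileName.

```python
from typing import Dict, List, Union

# Hypothetical in-memory API database: API name -> ordered argument names.
# Names follow the MSDN prototype for CreateFileA.
API_ARGS: Dict[str, List[str]] = {
    "CreateFileA": [
        "lpFileName",
        "dwDesiredAccess",
        "dwShareMode",
        "lpSecurityAttributes",
        "dwCreationDisposition",
        "dwFlagsAndAttributes",
        "hTemplateFile",
    ],
}


def arg_index(api: str, arg: Union[int, str]) -> int:
    """Resolve an argument reference (index or declared name) to an index."""
    names = API_ARGS[api]
    if isinstance(arg, int):
        if not 0 <= arg < len(names):
            raise IndexError(f"{api} has no argument {arg}")
        return arg
    # list.index raises ValueError when the name is unknown for this API
    return names.index(arg)


def arg_name(api: str, index: int) -> str:
    """Resolve an argument index back to its declared name."""
    return API_ARGS[api][index]
```

with this shape, a rule could refer to either `arg[0]` or the declared name and both would resolve to the same index before matching.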
for MSDN, we should consider extracting the info we need from M$ provided winmd files: https://github.com/microsoft/win32metadata. alternatives might include using viv's API database or extracting one from some sandbox, etc., but the winmd approach is "blessed" and supported.
we should push to have https://github.com/vivisect/vivisect/pull/213 updated and merged.
we'll need to figure out how to handle a subset of types commonly used for arguments, like pointers to strings.
does specifying a value as a string, like lpName: "foo.exe", imply the argument is a string (either ASCII or utf-16) and instruct the matching engine to resolve the data? and/or does the engine use an API database to determine the types of arguments ahead of time?
we should probably not go too far down this rabbit hole; handling structures is likely out of scope.
do we support regex against strings?
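to make the string questions concrete, here is a rough sketch of how an engine might resolve a pointed-to buffer as ASCII or UTF-16LE and then compare it against a literal or a regex. the helper names and the "second byte is zero" heuristic are assumptions for illustration only, not how capa actually does it.

```python
import re
from typing import Optional, Union


def decode_string_argument(data: bytes) -> Optional[str]:
    """Best-effort decode of a NUL-terminated ASCII or UTF-16LE buffer."""
    # Heuristic (an assumption): a zero second byte suggests UTF-16LE text.
    if len(data) >= 2 and data[1] == 0:
        # Cut at the first aligned two-byte NUL terminator, if there is one.
        for i in range(0, len(data) - 1, 2):
            if data[i] == 0 and data[i + 1] == 0:
                data = data[:i]
                break
        # Drop a trailing odd byte so the buffer is a whole number of code units.
        data = data[: len(data) - len(data) % 2]
        try:
            return data.decode("utf-16-le")
        except UnicodeDecodeError:
            return None
    # Otherwise treat the buffer as a NUL-terminated ASCII string.
    raw = data.split(b"\x00", 1)[0]
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        return None


def matches(value: str, wanted: Union[str, re.Pattern]) -> bool:
    """Compare a decoded argument against a literal or a compiled regex."""
    if isinstance(wanted, str):
        return value == wanted
    return wanted.search(value) is not None
```

under this sketch, a rule value written as a plain string would be an exact match, and a regex would be opt-in, e.g. `matches(decoded, re.compile(r"\.exe$"))`.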
thought: if we migrate most of our rules to use this feature, then we could probably natively support decompiler backends, like ghidra and hex-rays.
we should consider the fragmentation of our analysis backends though. how do we handle the scenario when some backends do/n't support various features? we already almost see this with SMDA versus viv wrt FLIRT support.
we could add this as part of capa 4.0 (probably introduces insn scope) or defer for 5.0+ as this will be a breaking change to rule syntax.
via https://github.com/mandiant/capa/pull/930#issuecomment-1083795849 and above
probably want to support at least the following "types":
- operand[{0,1,n}].number: ...
- operand[{0,1,n}].string: ...
- operand[{0,1,n}].substring: ...
- operand[{0,1,n}].bytes: ...
- operand[{0,1,n}].flag: ...
master's thesis https://www.ru.nl/publish/pages/769526/joren_vrancken.pdf by @joren485 describes an IDA/Hex-Rays plugin that uses call-scope features to identify capabilities. they have good success, demonstrating that this is probably a useful addition to capa.
notably they use Hex-Rays decompilation as the source of their features.
one suggestion for this feature's syntax would be to use a format similar to the strace and ltrace utilities on Linux. Example:
- api: CreateThread(lpThreadAttributes=0x0, dwStackSize=, lpStartAddress=, lpParameter=, dwCreationFlags=0x4, lpThreadID=)
or maybe:
- api: CreateThread(lpThreadAttributes=0x0, dwCreationFlags=0x4) # match just these two arguments
we can also specify return values in this syntax similar to strace/ltrace:
- api: IsDebuggerPresent() == 0
the downsides to this approach are:
upsides of this approach:
api: CreateThread(lpThreadAttributes=0x0, dwCreationFlags=0x4)
i do like some aspects of this syntax, particularly that it's very human readable. human readability has always been a big goal for capa rule syntax. if we ultimately pick another solution, perhaps we can still support a shorthand like this, since it's probably sufficient for many rules.
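for reference, the strace/ltrace-style shorthand is cheap to parse. the following is only a sketch under some assumptions: values are numeric only, an empty `name=` means "don't care", and `CallPattern` / `parse_call` are made-up names rather than proposed capa APIs. string values containing commas or parentheses would need a real tokenizer.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# api name, parenthesized argument list, optional "== <return value>"
CALL_RE = re.compile(
    r"^\s*(?P<api>\w+)\s*\((?P<args>[^)]*)\)\s*(?:==\s*(?P<ret>\S+))?\s*$"
)


@dataclass
class CallPattern:
    api: str
    args: Dict[str, int] = field(default_factory=dict)
    ret: Optional[int] = None


def parse_call(text: str) -> CallPattern:
    """Parse e.g. 'CreateThread(dwCreationFlags=0x4)' or 'IsDebuggerPresent() == 0'."""
    m = CALL_RE.match(text)
    if m is None:
        raise ValueError(f"not a call pattern: {text!r}")
    args: Dict[str, int] = {}
    for part in m.group("args").split(","):
        name, _, value = part.strip().partition("=")
        value = value.strip()
        if value:  # "name=" with no value is treated as unconstrained
            args[name.strip()] = int(value, 0)  # base 0 accepts 0x.. and decimal
    ret = m.group("ret")
    return CallPattern(m.group("api"), args, int(ret, 0) if ret is not None else None)
```

both the fully spelled-out form and the "match just these two arguments" form parse to the same constraints here, since unconstrained arguments are simply dropped.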
some additional considerations:
this OR that
. but i think it's on us to demonstrate if this would be used often. i think maybe it might for bitfield/enum arguments.
0x4 = CREATE_SUSPENDED
? maybe like dwCreationFlags=0x4 (CREATE_SUSPENDED)
or something?
how do we maintain these mappings? we'd need a database of APIs and their canonical argument names (ideally should match MSDN (windows) and man pages (posix)).
If you are interested and if this is still relevant, I can provide an SQLite database containing API call definitions for Windows, including their argument names. I scraped this information from the MSDN Offline Library 2009 back in 2019. So the data is not the newest, but it should include the most relevant API calls.
However, this is an important point and should not be underestimated. The API traces differ greatly in how closely they conform to MSDN. Based on my experience so far, CAPE has its own naming for arguments and its conformance is not the best. VMRay does a better job, but I can fully understand that you chose CAPE since it is open source and there is a large data set of API traces available. The example shown below illustrates the differences in conformance. Please note that these traces do not originate from the same sample.
CAPE (Sample 17beca96e3a7474622f5b23ff015c8783c0868a070cc5331db622de9b78dd45e from the avast repo):
{
"timestamp": "2021-06-03 21:57:55,843",
"thread_id": "1688",
"caller": "0x743c1321",
"parentcaller": "0x743c13c9",
"category": "registry",
"api": "RegOpenKeyExW",
"status": true,
"return": "0x00000000",
"arguments": [
{
"name": "Registry",
"value": "0x80000002",
"pretty_value": "HKEY_LOCAL_MACHINE"
},
{
"name": "SubKey",
"value": "system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder"
},
{
"name": "Handle",
"value": "0x000000e8"
},
{
"name": "FullName",
"value": "HKEY_LOCAL_MACHINE\\system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder"
}
],
"repeated": 0,
"id": 39
}
VMRay (Sample c0832b1008aa0fc828654f9762e37bda019080cbdd92bd2453a05cfb3b79abb3):
[0076.435] RegOpenKeyExW (in: hKey=0x80000001, lpSubKey="Software\\Microsoft\\Windows\\CurrentVersion\\Run", ulOptions=0x0, samDesired=0xf003f, phkResult=0x18ea40 | out: phkResult=0x18ea40*=0x4f0) returned 0x0
Ouh, that seems like a very important point.
As a rule author I'd like to specify the name instead of a number (which name though? likely the one the sandbox uses which could be different as shown above OR the name from the MSDN documentation).
To match features (using multiple sandboxes) we'd want to focus on the arguments by number (mapped from the name).
So, for now it may be easiest to just use numbered arguments? And then add our own mapping later, potentially based on @0x534a's data.
note that in the example above from @0x534a, the two sandboxes don't even recover the same number of arguments 🤦🏼
i guess each sandbox needs a database to map argument names back to argument indices. then capa can work with raw indices. capa can optionally also provide its own database of argument index <-> argument name to make rules more readable, such as the one that @0x534a offers.
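a rough sketch of what a per-sandbox database could look like, using the RegOpenKeyExW traces quoted above. the table layout and the `normalize` helper are hypothetical, and the CAPE name-to-index assignments are inferred from that single trace, so treat them as assumptions rather than verified facts about CAPE.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical table: (sandbox, api) -> {sandbox argument name: MSDN index}.
# None marks a field that is sandbox enrichment rather than a real argument.
SANDBOX_ARG_INDEX: Dict[Tuple[str, str], Dict[str, Optional[int]]] = {
    ("cape", "RegOpenKeyExW"): {
        "Registry": 0,     # assumed to correspond to hKey
        "SubKey": 1,       # assumed to correspond to lpSubKey
        "Handle": 4,       # assumed to correspond to phkResult
        "FullName": None,  # resolved by the hook, not an API argument
    },
    ("vmray", "RegOpenKeyExW"): {
        "hKey": 0,
        "lpSubKey": 1,
        "ulOptions": 2,
        "samDesired": 3,
        "phkResult": 4,
    },
}


def normalize(
    sandbox: str, api: str, arguments: Iterable[Tuple[str, str]]
) -> Dict[int, str]:
    """Map sandbox-named arguments to {index: value}, dropping enrichments."""
    table = SANDBOX_ARG_INDEX[(sandbox, api)]
    out: Dict[int, str] = {}
    for name, value in arguments:
        index = table.get(name)
        if index is not None:
            out[index] = value
    return out
```

with something like this, the matching engine only ever sees indices, and a sandbox that recovers fewer arguments just produces a sparser dict.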
maintaining these databases will be a bit tedious, but i'm not sure how we can get around it. i suppose once they're built and tested, updates shouldn't often be needed unless the sandboxes change.
we'll have to inspect the types of data emitted by the sandboxes for the arguments as well. i suspect there'll be some cases where one sandbox resolves a handle into some string (e.g., path) and another sandbox just gives the handle value. fun.
regarding the different number of arguments for RegOpenKeyExW, it seems like that's how CAPE was programmed to handle that:
If we're going to create and maintain a mapping from CAPE argument names into msdn naming, then I propose we reach out to the CAPE team and see if we could work on updating the CAPE argument names into the msdn format there.
alternatively, perhaps we could add a modifier to the arguments feature to specify which naming convention the rule author has in mind? so something like this:
- call:
  - api: RegOpenKeyExW
  - arguments/cape:
      Registry: HKEY_LOCAL_MACHINE
      SubKey: system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder
and maybe consequently this?
- call:
  - api: RegOpenKeyExW
  - or:
    - arguments/cape:
        Registry: HKEY_LOCAL_MACHINE
        SubKey: system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder
    - arguments/msdn:
        hKey: 0x80000001
        lpSubKey: Software\\Microsoft\\Windows\\CurrentVersion\\Run
we reach out to the CAPE team and see if we could work on updating the CAPE argument names into the msdn format there
+1 on that idea
I'm not a fan of the sandbox specific arguments. I think it would make rule writing and our code more complex and complicated than desired.
I am all for updating the argument names to MSDN format within CAPE 👍
It might be worth noting that CAPE sometimes enriches the output by adding fields that are technically not API arguments.
For example, the output from the NtReadFile hook includes the file path, but this is not one of the arguments; rather, it is obtained by the hook from the handle argument.
@0x534a, would you mind sharing your database? This could help to get the names updated in CAPE.
I am all for updating the argument names to MSDN format within CAPE 👍
Yeah, that's pretty awesome and very appreciated! 🎉
@0x534a, would you mind sharing your database? This could help to get the names updated in CAPE.
The SQLite database can be downloaded from my OneDrive using the link https://1drv.ms/u/s!AqNdbwsLZ9qwgw7Z5izJe0OZg9t_?e=badlPF. The structure of the database is not too complex and should mostly be self-explanatory. For example, to search for all arguments of a given API call (in this case RegOpenKeyEx) you can use the following SQL statement:
SELECT a.name AS api_function,
       p.name AS argument_name,
       t.name AS argument_type,
       p.is_in,
       p.is_out,
       p.description
FROM api_calls a,
     api_call_params p,
     types t
WHERE p.api_call_id = a.id
  AND p.type_id = t.id
  AND a.name = 'RegOpenKeyEx'
  AND a.target_os = 'windows'
ORDER BY p.id ASC;
Some constraints:
api_call_params
).windows
or the calling convention WINAPI.
If there are any questions, I'm happy to help.
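As a sketch of how a tool might consume such a database, the snippet below runs a parameterized version of the query with Python's `sqlite3` module. The table and column layout is only inferred from the SQL statement above, and the in-memory demo rows are made up from the MSDN prototype of RegOpenKeyEx; the real database may well differ.

```python
import sqlite3
from typing import List

QUERY = """
SELECT p.name
FROM api_calls a
JOIN api_call_params p ON p.api_call_id = a.id
JOIN types t ON p.type_id = t.id
WHERE a.name = ? AND a.target_os = ?
ORDER BY p.id ASC
"""


def argument_names(conn: sqlite3.Connection, api: str, target_os: str = "windows") -> List[str]:
    """Return the ordered argument names for an API call, or [] if unknown."""
    return [row[0] for row in conn.execute(QUERY, (api, target_os))]


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in (assumed schema) for the real database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE api_calls (id INTEGER PRIMARY KEY, name TEXT, target_os TEXT);
        CREATE TABLE types (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE api_call_params (
            id INTEGER PRIMARY KEY, api_call_id INTEGER, type_id INTEGER,
            name TEXT, is_in INTEGER, is_out INTEGER, description TEXT
        );
        INSERT INTO api_calls VALUES (1, 'RegOpenKeyEx', 'windows');
        INSERT INTO types VALUES (1, 'HKEY'), (2, 'LPCTSTR'), (3, 'DWORD'),
                                 (4, 'REGSAM'), (5, 'PHKEY');
        INSERT INTO api_call_params VALUES
            (1, 1, 1, 'hKey', 1, 0, ''),
            (2, 1, 2, 'lpSubKey', 1, 0, ''),
            (3, 1, 3, 'ulOptions', 1, 0, ''),
            (4, 1, 4, 'samDesired', 1, 0, ''),
            (5, 1, 5, 'phkResult', 0, 1, '');
        """
    )
    return conn
```

Because the rows are ordered by `p.id`, the position of each name in the returned list can serve as the argument index.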
Great, thank you very much!!
Summary
Can we create a way to associate function arguments (mostly for numbers and strings) with calls to known functions?
Possible syntax:
See discussion in #921 around syntax.
This is easier to understand by humans and we can be a little smarter in the analysis phase.
We should restrict this feature to analysis engines/formats/runtimes for which we can reliably extract the arguments (like .NET). Then, when it's working well, we can try to backport to other engines/formats/runtimes (like x86). TBD if this sort of analysis is expected by all backends, e.g. SMDA.
Motivation
Looking for examples for #767 reminded me of the other most common use case for basic block subscopes...
Grouping function calls and their arguments, like
or