Closed steven-hh-ding closed 3 years ago
That sounds very exciting!
The recursive call
feature covers direct calls from within a function to itself.
In this case it looks like a recursive data reference, see IDA disassembly below.
.text:00455936 push offset sub_4556E5 ; int <--- data reference
.text:0045593B push dword ptr [edi+14h] ; int
.text:0045593E lea eax, [ebp+var_48]
.text:00455941 push dword ptr [edi+0Ch] ; uBytes
.text:00455944 mov dword_4B9EC4, 1
.text:0045594E push eax ; int
.text:0045594F push dword ptr [edi+4] ; int
.text:00455952 lea eax, [ebp+var_50]
.text:00455955 push eax ; int
.text:00455956 lea eax, [ebp+var_120]
.text:0045595C call sub_407FAB
Is it feasible for you to distinguish between code and data references?
Yah we saw that! In Ghidra they just count this as function call (since sub_407FAB called the pushed address) .. Let me check what can we do. If this is the case, also the calls to
characteristic does not apply to sub_4556E5
here as well, right? (sub_407FAB called sub_4556E5 through static data reference).
Ouh, that's a cool analysis feature. For consistency I'd lean towards not counting such "indirect" recursions. However, if the backend provides it, we should leverage that.
My cheap solution would be to modify the test case to a function with no ambiguity. Unless Willi picked this function on purpose for this case?! 😄
Same thing for the calls to
feature. Let's see what @williballenthin thinks.
Yah the analysis done by Ghidra is pretty cool. Not just this. It already did the nested data ref part so we don't need to loop.
So far this is the only issues we had. We just found a way to differentiate, but just were wondering if it is necessary.
(jv) D:\jvd>pytest
================================================================================================================================================= test session starts ==================================================================================================================================================
platform win32 -- Python 3.7.7, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: D:\jvd, configfile: pytest.ini
collected 232 items
...
test\test_ghidra_capa_features.py ........................................................................................................F.....F...F. [ 50%]
test\test_ida_capa_features.py ....................................................................................................................
====================================================================================================================================================== short test summary info =======================================================================================================================================================
FAILED test/test_ghidra_capa_features.py::test_jvd_ghidra_features[mimikatz-function=0x4556E5-characteristic(recursive call)-False] - AssertionError: characteristic(recursive call) should not be found in function=0x4556E5
FAILED test/test_ghidra_capa_features.py::test_jvd_ghidra_features[mimikatz-function=0x4556E5-characteristic(calls to)-False] - AssertionError: characteristic(calls to) should not be found in function=0x4556E5
FAILED test/test_ghidra_capa_features.py::test_jvd_ghidra_feature_counts[mimikatz-function=0x4556E5-characteristic(calls to)-0] - AssertionError: characteristic(calls to) should be found 0 times in function=0x4556E5, found: 1
============================================================================================================================================= 3 failed, 229 passed in 1049.97s (0:17:29) =============================================================================================================================================
Additionally, Ghidra doesn't work very well to find the right function start.. It can't find 0x1800202B0
..
In order to fix the disagreement on function boundaries from different disassemblers, we have to patch the get_function
method in fixtures to pass that test cases:
def get_function_jvd(extractor, fva):
for f in extractor.get_functions():
if f.addr_start == fva:
return f
for f in extractor.get_functions():
for b in f.blocks:
for i in b.ins:
if i.ea == fva:
return f
raise ValueError("function not found")
For different function boundary recognition we've tried to find functions in the past where all backends agree. However, since this can get very cumbersome I've added a fixup for a vivisect test case before.
Can you adopt something like this?
Regarding the Ghidra test cases, I'd be ok with you suggesting different functions with no ambiguity.
@steven-hh-ding if still relevant, do the changes in #545 address the issue here?
please reopen if this is still an issue
Description
Thanks for the great work! We are making a feature extractor on our open-source unified json interface for several backend disassemblers including Ghidra. There are now only two test cases failing. One of them is
recursive call
:Our question is, for Characteristic("recursive call"), does it mean directly or in-directly should be okay?
Ghidra actually reports a recursive call:
since sub_4556E5 -> sub_00407fab -> sub_4556E5:
In sub_4556E5:
In sub_00407fab, param6 is sub_4556E5:
Ghidra directly treats this as ref/xref so we have a tough time separating these two cases.
Steps to Reproduce
mimikatz
in Ghidra0x4556E5
Maybe we miss something here. Thanks a lot for your time!