radareorg / radare2

UNIX-like reverse engineering framework and command-line toolset
https://www.radare.org/
GNU Lesser General Public License v3.0
20.68k stars 3k forks source link

Cross-comparison functions with different names in Radiff2 #6346

Open DIV-submission opened 7 years ago

DIV-submission commented 7 years ago

Hello,

I am thinking to use radiff2 for some malware sample similarity analysis, and by reading its source code, I figured out that radiff2 only compares functions with the same name (am I understanding it correctly here?) As shown in function r_anal_diff_fcn

That is, for example I have two malware sample M1 and M2, each of which has two functions func1 and func2. Then it seems to me that radiff2 will compare the similarity of two func1 and decide the match/unmatch, then do the same thing to func2.

However, IMHO, don't you think it would be more reasonable to do a cross-comparison, ignoring the function name? That is, first use func1 in M1 to compare with func1 and func2 in M2, decide the similarity rate regarding each one, then compare func2 in M1 towards func1 and func2 in M2.

Isn't it more reasonable? Am I missed anything here? Any advice and suggestion would be appreciated, thank you!

radare commented 7 years ago

yep makes sense to ignore names, in fact that's what it's done when function have no name iirc

kuqadk3 commented 4 years ago

IDK but I just take a look at the code and from what I saw, it just ignores when two functions have the same name and do the comparison later

https://github.com/radareorg/radare2/blob/5e00b6118f01a88cef308e1255ae639e1d4b158f/libr/anal/diff.c#L224-L225

and only do the comparison when two functions have a different name.

https://github.com/radareorg/radare2/blob/5e00b6118f01a88cef308e1255ae639e1d4b158f/libr/anal/diff.c#L194-L195

Why not just ignore the name, though I understand that some tools used function's name as a factor combined with other factors to do partial match (like diaphora)

https://github.com/joxeankoret/diaphora/blob/44dfc7d31401f94764f1f62f2de626527c7fb29c/doc/heuristics.html#L27

kuqadk3 commented 4 years ago

Is this still a problem @DIV-submission , @radare ? If it's still a problem, should we implement a new option where user provides two function name then we do the diff as suggested in https://github.com/radareorg/radare2/issues/6448 or should we fix the old code?

kuqadk3 commented 4 years ago

@XVilka can you take a look?

XVilka commented 4 years ago

@kuqadk3 we can introduce an option to switch between these modes. @radare what do you think?

Another approach would be introduce confidence/probability levels, similar to https://github.com/radareorg/radare2/issues/14594

radare commented 4 years ago

i dont think this is a mode, maybe better to name those different diffing algorithms and allow to pass opptions to them, like weights and such. some kind of -D zign,opt=1,opt2=2,...

On 3 Mar 2020, at 05:21, Anton Kochkov notifications@github.com wrote:

@kuqadk3 https://github.com/kuqadk3 we can introduce an option to switch between these modes. @radare https://github.com/radare what do you think?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/radareorg/radare2/issues/6346?email_source=notifications&email_token=AAG75FTGNU6I6C5DRCI27VTRFSATTA5CNFSM4C2DRZ42YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENSBAAA#issuecomment-593760256, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG75FXIUUHTASZ77FDKC5TRFSATTANCNFSM4C2DRZ4Q.

kuqadk3 commented 4 years ago

Hi, I have checked and figured out that r_anal_diff_fcn() is called from r_core_gdiff_fcn() which was called from cmd_anal and cmd_cmp image

The problem of that function (and this issue) does not have anything to do with radiff2.

But able to diff two functions with different name is a good feature for radifff2, so I checked and find out there is the only option 'g' in radiff2 took in function's name as an argument, we can modify 'g' mode in radiff2 to accept two function's name instead of one and only diff function with the same one name like now...or maybe I missed an option in radiff2 that also take in function's name as an argument to diff

XVilka commented 4 years ago

@kuqadk3 good idea, go for it.

radare commented 4 years ago

What do you mean

On 31 Mar 2020, at 01:20, kuqadk3 notifications@github.com wrote:

 Is there a way to pass two args into one option? @radare @XVilka

For ex :

radiff2 -g main1 main2 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.