joxeankoret / diaphora

Diaphora, the most advanced Free and Open Source program diffing tool.
http://diaphora.re
GNU Affero General Public License v3.0

Research ideas #251

Open joxeankoret opened 1 year ago

joxeankoret commented 1 year ago

Feel free to post your feature requests that require some research here (take a look at the list below to see what I mean):

silveroxides commented 1 year ago

Hope this makes sense as a feature. Something I have noticed in pretty much every tool, plugin or similar meant for comparing binaries and importing names is that, for some reason, they all fail to take into consideration the location of a matched function relative to other functions. This is especially noticeable with ARM ELF binaries, where every single tool I have tried has made a mess of the entire database, and adjusting the diffing settings did not help. I'm unsure whether this fits as a comment here, but I would like a feature to weight the ordering of functions in the database heavily, as well as an option to not break class locations in the database when the naming scheme uses proper C++ mangling, which mine does. Example of the mangling in my databases: "_ZN5Class8FunctionEP8Variablebfilv", which demangles to "Class::Function(Variable*, bool, float, int, long, void)".
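One way a differ could weigh function ordering, as requested above, is to fill the gaps between already trusted matches: if two anchor matches keep the same order in both databases and the number of functions between them is identical on both sides, the functions in between can be paired in order. The Python sketch below illustrates that idea only; `fill_gaps_between_anchors` is a hypothetical helper (not a Diaphora API), and it assumes each database is given as a sorted list of function start addresses.

```python
from typing import Dict, List

def fill_gaps_between_anchors(primary: List[int],
                              secondary: List[int],
                              anchors: Dict[int, int]) -> Dict[int, int]:
    """Propose matches using only the relative order of functions.

    primary, secondary: function start addresses of each database, sorted ascending.
    anchors: trusted matches (primary address -> secondary address); every
    address must appear in the corresponding list.
    Hypothetical helper for illustration, not part of Diaphora.
    """
    p_index = {addr: i for i, addr in enumerate(primary)}
    s_index = {addr: i for i, addr in enumerate(secondary)}
    # Anchor positions, ordered by their position in the primary database.
    pairs = sorted((p_index[p], s_index[s]) for p, s in anchors.items())

    proposals: Dict[int, int] = {}
    for (p1, s1), (p2, s2) in zip(pairs, pairs[1:]):
        if s2 <= s1:
            # The anchors cross each other, so ordering is not trustworthy here.
            continue
        gap_primary = range(p1 + 1, p2)
        gap_secondary = range(s1 + 1, s2)
        # Only pair the functions in between when both gaps have the same size.
        if len(gap_primary) == len(gap_secondary):
            for i, j in zip(gap_primary, gap_secondary):
                proposals[primary[i]] = secondary[j]
    return proposals


primary = [0x100, 0x200, 0x300, 0x400]
secondary = [0x1100, 0x1200, 0x1300, 0x1400]
print(fill_gaps_between_anchors(primary, secondary, {0x100: 0x1100, 0x400: 0x1400}))
# → {512: 4608, 768: 4864}
```

Requiring equal gap sizes is deliberately conservative: a single inlined or removed function stops the pairing for that one gap instead of shifting every match after it.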

joxeankoret commented 1 year ago

About the relative position of functions: in the version of Diaphora currently in development, I'm using the concept of compilation units to try to work around this problem. In case you are curious, compilation unit boundaries are guessed using the old version of CodeCut's LFA (Local Function Affinity) algorithm, and also IDA Magic Strings to extract and use (when available) the source code file names from debugging strings. Basically, if CodeCut says "there is a compilation unit from this address to this address", and IDAMagicStrings says "these functions belong to this source code file", Diaphora takes the minimum address assigned by either IDA Magic Strings or CodeCut, as well as the maximum address found by either method, and creates a single compilation unit from them. Compilation units are then used in Diaphora's heuristics in various ways to favour matching functions in the same compilation unit instead of matching them in random, unrelated areas. However, we don't always have enough meaningful information to do this properly.
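The merging step described above can be sketched in Python. This is an illustration under stated assumptions, not Diaphora's actual code: CodeCut's output is assumed to be a list of `(start, end)` address ranges, IDA Magic Strings' output a mapping from source file name to function addresses, and `CompilationUnit`, `merge_units`, `unit_of`, `same_unit_bonus` and the `0.05` bonus value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class CompilationUnit:
    start: int                    # lowest address in the unit
    end: int                      # highest address in the unit (inclusive)
    source: Optional[str] = None  # source file name, when one is known

def _overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]

def merge_units(codecut_ranges: List[Tuple[int, int]],
                magic_funcs: Dict[str, List[int]]) -> List[CompilationUnit]:
    """Merge CodeCut ranges with per-source-file function addresses."""
    units: List[CompilationUnit] = []
    absorbed = set()
    for source, addrs in magic_funcs.items():
        if not addrs:
            continue
        lo, hi = min(addrs), max(addrs)
        merged = set()
        changed = True
        # Take the minimum start and maximum end over every overlapping CodeCut
        # range; repeat because a widened span may overlap ranges missed before.
        while changed:
            changed = False
            for i, rng in enumerate(codecut_ranges):
                if i not in merged and _overlaps((lo, hi), rng):
                    lo, hi = min(lo, rng[0]), max(hi, rng[1])
                    merged.add(i)
                    changed = True
        absorbed |= merged
        units.append(CompilationUnit(lo, hi, source))
    # CodeCut ranges with no known source file become anonymous units.
    for i, (start, end) in enumerate(codecut_ranges):
        if i not in absorbed:
            units.append(CompilationUnit(start, end))
    return sorted(units, key=lambda u: u.start)

def unit_of(units: List[CompilationUnit], addr: int) -> Optional[CompilationUnit]:
    """Return the compilation unit containing addr, if any."""
    for unit in units:
        if unit.start <= addr <= unit.end:
            return unit
    return None

def same_unit_bonus(units_a: List[CompilationUnit], addr_a: int,
                    units_b: List[CompilationUnit], addr_b: int,
                    bonus: float = 0.05) -> float:
    """Hypothetical scoring tweak: reward a candidate match when both functions
    sit in compilation units attributed to the same source file."""
    ua, ub = unit_of(units_a, addr_a), unit_of(units_b, addr_b)
    if ua is not None and ub is not None and ua.source is not None and ua.source == ub.source:
        return bonus
    return 0.0
```

Anonymous units (no source name) never earn the bonus in this sketch, which mirrors the caveat that meaningful information is not always available.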

About mangled and unmangled names: Diaphora uses both and, if I remember correctly, it handles both cases properly, but I will add this review to my to-do list and verify it at some point. The development of Diaphora 3.0 is still ongoing and it will take me quite some time to finish it, as I work on it in my spare time (which is fine and fun).

silveroxides commented 1 year ago

CodeCut's LFA was new to me, but IDA Magic Strings is something I am familiar with. I've used it quite a lot to get an idea of which type libraries to create, or whether I should create a FLIRT signature using FLAIR.

As for IDA Magic Strings determining which functions belong to which source file, is that separate from how it generates candidate function names?

The candidates it suggests have been quite hit-and-miss, especially since on several occasions it has presented candidates based on a single .rodata reference. For example, in the function I am looking at now, it suggests "sinherit" (from the .rodata string "%sinherit"), giving only a single lowercase word as the candidate, even though the function also calls a function named "CRYPTO_free" and has one other .rodata xref named "CryptoX509v3V3". This led me to assume that the proper name for the class could be CryptoX509v3 and for the function SubjectInheritance. (EDIT: I found the source of the function I was looking at by searching Google for "%sinherit"; the first result was the directly downloadable source file, and the function is ASIdentifierChoice_inherit.)

In the other issue I participated in, about Diaphora freezing during certain heuristics, I mentioned a plugin for creating signatures that I have had good success with. It was not the first one I tried, but it was the only one that produced proper wildcard signatures that stayed consistent not only between databases separated by a recent version change, but also across several years. Over that period the developer went from compiling with the old ndk-build without stripping symbols to the current standard way of compiling native C shared libraries on Android, with symbols stripped, and the patterns from the old version still matched the newer one. The plugin code mentions adjustments for ARM, which seems to be one reason why "SigMaker" works. My favorite feature, though, is its ability to generate patterns for xrefs, which makes keeping track of .bss variables a lot easier. Link to the one I mentioned: IDA-Pro-SigMaker. It also works in IDA Pro 7.7 if built using the IDA SDK for 7.7.
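For readers unfamiliar with wildcard signatures: a signature is a byte pattern in which the bytes that tend to change between builds (for example, encoded branch offsets or addresses) are replaced by wildcards, so the same pattern can still match after recompilation. The Python sketch below shows how such a pattern could be matched against raw bytes. It assumes IDA-style pattern strings where `?` or `??` marks a wildcard byte; `parse_pattern` and `find_pattern` are hypothetical helpers, not SigMaker's code.

```python
from typing import List, Optional

def parse_pattern(pattern: str) -> List[Optional[int]]:
    """Turn '48 8B ? ? 89' into [0x48, 0x8B, None, None, 0x89] (None = wildcard)."""
    return [None if tok in ("?", "??") else int(tok, 16) for tok in pattern.split()]

def find_pattern(data: bytes, pattern: str) -> List[int]:
    """Return every offset in data where the pattern matches."""
    sig = parse_pattern(pattern)
    hits = []
    for off in range(len(data) - len(sig) + 1):
        # A wildcard (None) matches any byte; every other byte must be equal.
        if all(b is None or data[off + i] == b for i, b in enumerate(sig)):
            hits.append(off)
    return hits


code = bytes([0x00, 0x48, 0x8B, 0x12, 0x34, 0x89, 0x48, 0x8B, 0xAA, 0xBB, 0x89])
print(find_pattern(code, "48 8B ? ? 89"))  # → [1, 6]
```

This naive scan is O(n·m) in the data and pattern lengths, which is fine for illustration but slow on large binaries.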

joxeankoret commented 1 year ago

IDA Magic Strings in Diaphora is only used to get potential source file names, not function names. And, yes, it's normal that it fails a lot: it's using debugging strings after all.

silveroxides commented 1 year ago