static-analysis-engineering / CodeHawk-Binary

CodeHawk Binary Analyzer for malware analysis and general reverse engineering
MIT License

More callgraph fixes #127

Closed waskyo closed 7 months ago

waskyo commented 10 months ago

Two fixes:

The core issue is that some nodes use the function name as their identifier and others use the function address. When a node is keyed by address but we also know the function's name, and the user requests the function by that name, lookups get really confusing.

This adds the notion of nodes having multiple IDs and makes the code use it (although I'm not 100% sure I covered all the cases). One big gaping hole in this fix is edges. They're stored using only the ID strings, so checking whether an edge exists is non-trivial: you need to see whether any of the IDs of each node matches the source or destination.
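To make the edge problem concrete, here is a minimal sketch of the idea, not the code in this PR. The names `Node`, `CallGraph`, `find_node`, and `has_edge` are hypothetical, as are the sample addresses and function names. It assumes edges are kept as plain `(source, destination)` string pairs, so an existence check has to expand each endpoint to all of its node's IDs:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    """Hypothetical callgraph node reachable by any of its identifiers."""
    ids: Set[str] = field(default_factory=set)


class CallGraph:
    """Hypothetical graph; edges are stored as plain string pairs."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self.edges: Set[Tuple[str, str]] = set()

    def add_node(self, *ids: str) -> Node:
        node = Node(set(ids))
        self.nodes.append(node)
        return node

    def find_node(self, ident: str) -> Optional[Node]:
        # A node matches if the requested identifier is any one of its IDs.
        for node in self.nodes:
            if ident in node.ids:
                return node
        return None

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.add((src, dst))

    def has_edge(self, src: str, dst: str) -> bool:
        # Expand each endpoint to every ID of its node, then look for a
        # stored edge whose strings fall inside those two ID sets.
        src_node = self.find_node(src)
        dst_node = self.find_node(dst)
        src_ids = src_node.ids if src_node else {src}
        dst_ids = dst_node.ids if dst_node else {dst}
        return any(s in src_ids and d in dst_ids for s, d in self.edges)


graph = CallGraph()
graph.add_node("0x401000", "main")
graph.add_node("0x401200", "parse_args")
# The edge happens to be recorded with a mix of address and name.
graph.add_edge("0x401000", "parse_args")

found_by_name = graph.has_edge("main", "parse_args")
found_by_addr = graph.has_edge("0x401000", "0x401200")
reverse_edge = graph.has_edge("parse_args", "main")
```

The scan is linear in the number of edges. Storing edges against node objects or a canonical ID would avoid that, but it would be a larger change than the one described here.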

This also fixes an issue where user functions are first seen as so stubs and then we see the actual function. Previously, we would store both as nodes, one using the function name (which is all that stubs have) and the other using the function's address as its identifier. Once I switched things (as described above) this bug surfaced. The fix is to check for this particular case and replace the existing stub node with the user function node.
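The stub replacement can be sketched as follows. This only illustrates the check described above and is not the PR's implementation. `Node`, `NodeTable`, `add_stub`, `add_user_function`, and the `is_stub` flag are all hypothetical names, and the address and function name are made up:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical node: a set of identifiers plus a stub marker."""
    ids: Set[str] = field(default_factory=set)
    is_stub: bool = False


class NodeTable:
    """Hypothetical node store that never keeps a stub next to its function."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []

    def find(self, ident: str) -> Optional[Node]:
        for node in self.nodes:
            if ident in node.ids:
                return node
        return None

    def add_stub(self, name: str) -> Node:
        # A stub only knows the function name.
        existing = self.find(name)
        if existing is not None:
            return existing
        node = Node({name}, is_stub=True)
        self.nodes.append(node)
        return node

    def add_user_function(self, addr: str, name: Optional[str] = None) -> Node:
        node = Node({addr} if name is None else {addr, name})
        for i, existing in enumerate(self.nodes):
            if existing.ids & node.ids:
                if existing.is_stub:
                    # The function was first seen as a stub: replace it.
                    self.nodes[i] = node
                    return node
                # Already known as a user function: just merge the IDs.
                existing.ids |= node.ids
                return existing
        self.nodes.append(node)
        return node


table = NodeTable()
table.add_stub("helper")
fn = table.add_user_function("0x402000", "helper")
```

After the second call the table holds a single node that answers to both `"helper"` and `"0x402000"`. Any edges that still reference the stub by name keep resolving, because the name remains one of the node's IDs.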