Open psifertex opened 1 year ago
Yes, I think we are, although with this as a library, would that be OK to include in your plugins manager? Or would you prefer we instead add packaged tools that use this library?
I expect that if you're fine shipping it as a library, it could still be useful to people who build their own tooling on top of it after a plugin install. We just want to make sure the description is clear about what it is and isn't.
CCing @jprokos26 for awareness
I could go either way. I'm mostly focused on raising awareness, so just having it in there helps. A simple UI wouldn't hurt, though, and I don't mind submitting a PR with one if that would help.
I've made a pretty simple plugin here: 79f075f
It requires hashashin to be installed for BinaryNinja's python.interpreter and can either embed the extracted function feature map as a comment using Hashashin Feature Extraction, or compute the full Binary Signature using Hashashin Signature Generation, which computes the features for every function and stores the computed signature object in the session data, where it can be accessed with bs = bv.session_data.BinarySignature.
@psifertex Three main questions to make this a useful plugin:
The feature map currently looks like this and is stored as a comment at the top of the function:
{'cyclomatic_complexity': 1,
'num_instructions': 13,
'num_strings': 1,
'max_string_length': 14,
'vertex_histogram': [1, 2, 0],
'edge_histogram': [2, 0, 0, 0],
'instruction_histogram':
0|3|0|0|0|0|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0,
'dominator_signature': 0x1a,
'constants': [1, 3, 510, 55952, 80308, 80312, 553101, 637184],
'strings': ['stack overflow']
}
Note that dominator_signature can be a very large number (the largest in busybox is on the order of 5.4E+185) and constants can be quite long as well. Internally we wrap these values when pushing to disk, but the object passed back to binja has the full length.
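To give a sense of scale, here is a minimal sketch of how an integer of that magnitude behaves in Python. It does not show how hashashin actually wraps values on disk, which is not described here. The helper names `int_to_bytes` and `int_from_bytes` are hypothetical and only illustrate one lossless way to serialize an arbitrarily large non-negative integer.

```python
def int_to_bytes(value: int) -> bytes:
    """Encode a non-negative int using as many big-endian bytes as it needs."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")


def int_from_bytes(data: bytes) -> int:
    """Inverse of int_to_bytes."""
    return int.from_bytes(data, "big")


# Stand-in for a large dominator_signature (5.4E+185, the order of
# magnitude quoted above; not a real signature value).
sig = 54 * 10**184

# Far more than 64 bits, so it cannot be stored in a fixed-width
# integer column without wrapping or re-encoding.
print(sig.bit_length())

encoded = int_to_bytes(sig)
print(len(encoded), "bytes; round-trips:", int_from_bytes(encoded) == sig)
```

Python integers are arbitrary precision, so the in-memory value is exact. The size only becomes a concern at storage or display boundaries, such as a comment at the top of a function.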
1) Yes, you just need to make a BackgroundTaskThread. Here's an example.
2) In terms of annotations, it depends on the goals. I would look at the BD Viewer plugin as an example of how you can present matched data and offer to, for example, port symbols or type information. In fact, the BSI project that some other folks there have been working on has a pretty robust UI for similar workflows and might be worth a look. I don't believe the current implementation is available under an open source license, but I wouldn't mind lobbying for that if it helps. 😉
3) No, unfortunately you CANNOT rely at all on MLIL being stable across versions. In fact, you can't even rely on it being stable in the same version! Given new type information or other changes such as functions being added or removed, analysis can easily change such that MLIL is not constant. Sometimes even depending on analysis races it's possible for changes to occur even without the above! This usually happens when analysis depends on the order of analysis of other functions and while we try to stamp out sources of non-determinism like this, we cannot guarantee they do not exist.
For this reason we generally recommend either pinning on specific features or dynamically computing specific IL offsets on demand.
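For point 1, a background task might look roughly like the sketch below. `BackgroundTaskThread` is the Binary Ninja base class named above. `FeatureExtractionTask` and the `extract` callable are hypothetical names used only for illustration, and the `ImportError` fallback is a stand-in that mimics just enough of the interface for the sketch to run outside Binary Ninja.

```python
import threading

try:
    # Real base class, available inside Binary Ninja's Python environment.
    from binaryninja import BackgroundTaskThread
except ImportError:
    # Stand-in for running the sketch elsewhere (assumption: only the
    # small part of the interface used below is mimicked).
    class BackgroundTaskThread(threading.Thread):
        def __init__(self, initial_progress_text="", can_cancel=False):
            super().__init__()
            self.progress = initial_progress_text
            self.cancelled = False


class FeatureExtractionTask(BackgroundTaskThread):
    """Hypothetical task that runs `extract` on every function off the UI thread."""

    def __init__(self, bv, extract):
        super().__init__("Extracting features...", can_cancel=True)
        self.bv = bv
        self.extract = extract  # hypothetical callable: function -> feature map
        self.results = {}

    def run(self):
        functions = list(self.bv.functions)
        for i, func in enumerate(functions, start=1):
            if self.cancelled:  # stop early if the user cancels the task
                break
            self.results[func.start] = self.extract(func)
            self.progress = f"Extracting features ({i}/{len(functions)})"
```

Inside a plugin command this would be started with something like `FeatureExtractionTask(bv, extract).start()`, so the UI stays responsive while the features are computed. Keying the results by function start address rather than by IL index also fits the advice above about not depending on MLIL staying stable.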
The feature map itself looks fine, just so long as it has the ability to handle when the ground underneath it shifts somewhat. 😬
Let me know if I missed anything!
Are y'all interested in making this available as a Binary Ninja plugin? As I understand it, you could specify a requirements.txt that would pull and install it as a pip dependency right from the repo. That would make it more discoverable in the plugin manager.