radareorg / radare2

UNIX-like reverse engineering framework and command-line toolset
https://www.radare.org/
GNU Lesser General Public License v3.0

Automated architecture detection for raw binaries #15506

Open XVilka opened 4 years ago

XVilka commented 4 years ago

It is often possible to autodetect the architecture of raw code, assuming the data really is code. One open-source example of doing this through statistical inference is https://github.com/airbus-seclab/cpu_rec

See their documentation about the algorithm itself: https://github.com/airbus-seclab/cpu_rec/blob/master/doc/cpu_rec_sstic_english.md
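To make the statistical idea concrete, here is a minimal sketch of classifying a blob by its byte-bigram statistics. It is not cpu_rec's code: the smoothing constant, the function names, and the tiny training corpus (labelled `arch_a` and `arch_b`) are all assumptions made up for illustration. A real model would be trained on large, known-good binaries for each architecture.

```python
import math
from collections import Counter
from typing import Dict


def bigram_counts(data: bytes) -> Counter:
    """Count adjacent byte pairs in a blob."""
    return Counter(zip(data, data[1:]))


def train(corpus: Dict[str, bytes]) -> Dict[str, Counter]:
    """Build one bigram model per architecture label."""
    return {arch: bigram_counts(blob) for arch, blob in corpus.items()}


def cross_entropy(sample: bytes, model: Counter, alpha: float = 0.01) -> float:
    """Average negative log-probability of the sample's bigrams under a model.

    Additive smoothing (alpha, an arbitrary choice here) keeps unseen
    bigrams from producing log(0).
    """
    pairs = list(zip(sample, sample[1:]))
    if not pairs:
        raise ValueError("sample must contain at least two bytes")
    total = sum(model.values()) + alpha * 65536  # 256 * 256 possible bigrams
    return -sum(math.log((model[p] + alpha) / total) for p in pairs) / len(pairs)


def guess_arch(sample: bytes, models: Dict[str, Counter]) -> str:
    """Pick the label whose model gives the sample the lowest cross-entropy."""
    return min(models, key=lambda arch: cross_entropy(sample, models[arch]))


# Toy corpus: hypothetical stand-ins, far too small for real use.
TOY_CORPUS = {
    "arch_a": bytes([0x55, 0x48, 0x89, 0xE5, 0xC3]) * 50,
    "arch_b": bytes([0x00, 0x00, 0xA0, 0xE1, 0x1E, 0xFF, 0x2F, 0xE1]) * 50,
}

if __name__ == "__main__":
    models = train(TOY_CORPUS)
    print(guess_arch(bytes([0x48, 0x89, 0xE5, 0xC3, 0x55, 0x48]), models))
```

Something this small needs no ML library at all, though a serious implementation would also want a "not code" or "unknown" class so that data sections are not forced into an architecture.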

There are two more or less mature pure C libraries for machine learning:

So it is even possible to make this a part of the core.

hmht commented 4 years ago

Anton Kochkov writes:

See their documentation about the algorithm itself: [...]

TL;DR: "machine learning"

As you might expect from that, the article notes that uC (microcontroller) compilers produce output different enough that it is not actually recognisable. I'm sure it's even worse for hand-written assembly programs.

I bet there's a better way. My first stupid idea would be to just disassemble a few chunks with many architectures and count the number of "invalid"s.
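A rough sketch of that counting idea, assuming each architecture is represented by a decoder callback that returns the instruction length at an offset, or 0 for "invalid". The two decoders below (`toy_fixed32`, `toy_variable`) are hypothetical stand-ins with made-up validity rules, not real ISAs. In practice the callbacks would wrap a real disassembler backend.

```python
from typing import Callable, Dict, List, Tuple

# A decoder returns the instruction length at `off`, or 0 if invalid.
Decoder = Callable[[bytes, int], int]


def toy_fixed32(buf: bytes, off: int) -> int:
    """Hypothetical fixed-width ISA: a 4-byte word is valid only if the
    high nibble of its last byte is 0xE."""
    word = buf[off:off + 4]
    return 4 if len(word) == 4 and word[3] >> 4 == 0xE else 0


# Hypothetical variable-length ISA: first byte selects the length.
TOY_OPCODES = {0x55: 1, 0x90: 1, 0xC3: 1, 0x48: 3}


def toy_variable(buf: bytes, off: int) -> int:
    size = TOY_OPCODES.get(buf[off], 0)
    return size if off + size <= len(buf) else 0


def invalid_ratio(buf: bytes, decode: Decoder, skip: int) -> float:
    """Linear sweep over buf; returns invalid decodes / total decode attempts.

    `skip` is how many bytes to advance after an invalid decode
    (the instruction alignment of the architecture).
    """
    off = invalid = total = 0
    while off < len(buf):
        size = decode(buf, off)
        total += 1
        if size == 0:
            invalid += 1
            off += skip
        else:
            off += size
    return invalid / total if total else 1.0


def rank_archs(buf: bytes,
               decoders: Dict[str, Tuple[Decoder, int]]) -> List[Tuple[float, str]]:
    """Return (invalid_ratio, name) pairs, best candidate first."""
    return sorted((invalid_ratio(buf, dec, skip), name)
                  for name, (dec, skip) in decoders.items())


DECODERS = {"toy_fixed32": (toy_fixed32, 4), "toy_variable": (toy_variable, 1)}

if __name__ == "__main__":
    sample = bytes([0x55, 0x48, 0x89, 0xE5, 0x90, 0x90, 0x90, 0xC3])
    for ratio, name in rank_archs(sample, DECODERS):
        print(f"{name}: {ratio:.2f} invalid")
```

One caveat with the raw count: an architecture with a dense opcode space will decode almost any byte sequence, so it would probably need normalising against what each decoder reports on random data.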

XVilka commented 4 years ago

Well, you can indeed build some kind of heuristics too, without relying on the ML black box. Those are implementation details.

XVilka commented 4 years ago

See also https://github.com/radareorg/radare2/issues/12040