Open XVilka opened 4 years ago
Anton Kochkov writes:
TL;DR: "machine learning"
As you might expect from that fact, the article notes that microcontroller (uC) compilers produce output different enough that it is not actually recognisable. I'm sure it's even worse for hand-written assembly programs.
I bet there's a better way. My first stupid idea would be to just disassemble a few chunks as each of many architectures and count the number of "invalid"s.
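A rough sketch of that counting idea, in Python for brevity. Everything here is an assumption made for illustration: the two "toy" decoders are made-up stand-ins for real per-architecture disassemblers, and `count_invalid` / `rank_architectures` are hypothetical names, not an existing API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A decoder takes (data, offset) and returns the instruction length in bytes,
# or None when the bytes at that offset do not decode as an instruction.
Decoder = Callable[[bytes, int], Optional[int]]


def count_invalid(data: bytes, decode: Decoder, step: int) -> Tuple[int, int]:
    """Sweep linearly over data and return (invalid_count, decode_attempts)."""
    offset = invalid = attempts = 0
    while offset < len(data):
        attempts += 1
        length = decode(data, offset)
        if length is None:
            invalid += 1
            offset += step  # resynchronise on the architecture's alignment
        else:
            offset += length
    return invalid, attempts


def rank_architectures(
    data: bytes, decoders: Dict[str, Tuple[Decoder, int]]
) -> List[Tuple[str, float]]:
    """Return (arch, invalid_ratio) pairs, most plausible (lowest ratio) first."""
    scores = []
    for arch, (decode, step) in decoders.items():
        invalid, attempts = count_invalid(data, decode, step)
        scores.append((arch, invalid / attempts if attempts else 1.0))
    return sorted(scores, key=lambda item: item[1])


def decode_toy_a(data: bytes, offset: int) -> Optional[int]:
    # Hypothetical 1-byte ISA: any byte below 0x80 counts as a valid opcode.
    return 1 if data[offset] < 0x80 else None


def decode_toy_b(data: bytes, offset: int) -> Optional[int]:
    # Hypothetical fixed-width 2-byte ISA: high nibble of the first byte is 0xA.
    if offset + 2 > len(data):
        return None
    return 2 if data[offset] >> 4 == 0xA else None


# Each entry: architecture name -> (decoder, alignment used after an invalid).
TOY_DECODERS = {"toy_a": (decode_toy_a, 1), "toy_b": (decode_toy_b, 2)}

if __name__ == "__main__":
    chunk = bytes([0xA0, 0x01, 0xA1, 0xFF, 0xA2, 0x10])
    for arch, ratio in rank_architectures(chunk, TOY_DECODERS):
        print(f"{arch}: {ratio:.2f} invalid")
```

In practice the decoders would wrap a real disassembler, and dense encodings (where nearly every byte sequence decodes to something) would probably need a smarter score than a plain invalid ratio.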
Well, indeed, you can build some kind of heuristic too, without relying on an ML black box. That's an implementation detail.
It is often possible to autodetect the architecture of raw code, assuming it really is code. One open-source example of doing this through statistical inference is https://github.com/airbus-seclab/cpu_rec
See their documentation about the algorithm itself: https://github.com/airbus-seclab/cpu_rec/blob/master/doc/cpu_rec_sstic_english.md
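To give a feel for what "statistical inference" can mean here, below is a simplified Python sketch: build a byte-bigram model per architecture from known samples, then pick the model under which the unknown bytes have the lowest cross-entropy. This is not cpu_rec's code, and the linked documentation remains the authority on what cpu_rec actually does. The function names, the smoothing constant, and the tiny placeholder "corpora" are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict


def bigram_counts(data: bytes) -> Counter:
    """Count adjacent byte pairs in data."""
    return Counter(zip(data, data[1:]))


def cross_entropy(sample: bytes, model: Counter, alpha: float = 0.01) -> float:
    """Average negative log2-probability of the sample's bigrams under model.

    Additive smoothing (alpha, an arbitrary choice) keeps bigrams that the
    model never saw from producing an infinite score.
    """
    observed = bigram_counts(sample)
    n = sum(observed.values())
    if n == 0:
        raise ValueError("sample needs at least two bytes")
    total = sum(model.values()) + alpha * 256 * 256
    bits = sum(
        count * math.log2((model[pair] + alpha) / total)
        for pair, count in observed.items()
    )
    return -bits / n


def guess_architecture(sample: bytes, corpora: Dict[str, bytes]) -> str:
    """Return the corpus name whose bigram model best explains the sample."""
    models = {arch: bigram_counts(blob) for arch, blob in corpora.items()}
    return min(models, key=lambda arch: cross_entropy(sample, models[arch]))


if __name__ == "__main__":
    # Placeholder byte patterns standing in for real training binaries.
    corpora = {
        "arch_x": bytes([0x55, 0x48, 0x89, 0xE5]) * 50,
        "arch_y": bytes([0x00, 0x00, 0xA0, 0xE1]) * 50,
    }
    unknown = bytes([0x48, 0x89, 0xE5, 0x55, 0x48, 0x89])
    print(guess_architecture(unknown, corpora))
```

A real detector would need large per-architecture corpora and some way to say "none of the above" for data that is not code at all.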
There are two more or less mature pure C libraries for machine learning:
So it is even possible to make this a part of the core.