pfalcon / ScratchABlock

Yet another crippled decompiler project
https://github.com/EiNSTeiN-/decompiler/issues/9#issuecomment-103221200
GNU General Public License v3.0
data-flow-analysis decompiler program-analysis reverse-engineering


Q: Why is there a need for yet another decompiler, especially a crippled one?

A: A sad truth is that most decompilers out there are crippled. Many aren't able to decompile trivial constructs; others can't decompile more advanced ones; and those which seemingly can deal with them are crippled by supporting only the boring architectures and OSes. And almost every one is written in such a way that tweaking it or adding a new architecture is complicated. A decompiler is a tool for reverse engineering, but ironically, if you want to use a typical decompiler productively or make it suit your needs, you will first need to reverse-engineer the decompiler itself, and that can easily take months (or years).

How is ScratchABlock different?

The central part of a decompiler (and of any program transformation framework) is its Intermediate Representation (IR). A decompiler should work on IR and should take it as input, and the conversion of a particular architecture's assembler to this IR should be well decoupled from the decompiler; otherwise it takes extraordinary effort to add support for another architecture (which in turn limits the decompiler's user base).

Decompilation is a complex task, so it should be easy to gain insight into the decompilation process. This means that the IR used by a decompiler should be human-friendly: for example, it should use a syntax familiar to programmers, map as directly as possible to a typical machine assembler, etc.
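As a minimal illustration of both points, assuming a toy three-address IR (the `Assign` class and `defined_regs` function below are hypothetical, not ScratchABlock's own data structures), an IR node can print itself in familiar C-like syntax, and an analysis can be written purely against the IR, without knowing which architecture front end produced it. The sketch uses `dataclasses`, so it needs Python 3.7 or later.

```python
from dataclasses import dataclass
from typing import List, Union

Operand = Union[str, int]  # register name such as "$r1", or an integer constant


@dataclass(frozen=True)
class Assign:
    """Hypothetical toy IR node: dest = lhs op rhs."""
    dest: str
    lhs: Operand
    op: str
    rhs: Operand

    def __str__(self) -> str:
        # Print in C-like syntax so a human can read the IR directly.
        return f"{self.dest} = {self.lhs} {self.op} {self.rhs}"


def defined_regs(prog: List[Assign]) -> List[str]:
    """Architecture-independent analysis: registers written, in first-definition order."""
    seen: List[str] = []
    for insn in prog:
        if insn.dest not in seen:
            seen.append(insn.dest)
    return seen


prog = [Assign("$r1", "$r2", "+", 1), Assign("$r3", "$r1", "*", 4), Assign("$r1", "$r3", "-", 2)]
for insn in prog:
    print(insn)
# $r1 = $r2 + 1
# $r3 = $r1 * 4
# $r1 = $r3 - 2
print(defined_regs(prog))  # ['$r1', '$r3']
```

Nothing in `defined_regs` refers to any machine architecture; whichever lifter produced `prog`, the analysis stays the same.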

The requirements above should be quite obvious on their own. If not, they can be learnt from books on the matter, e.g.:

"The compiler writer also needs mechanisms that let humans examine the IR program easily and directly. Self-interest should ensure that compiler writers pay heed to this last point."

(Keith Cooper, Linda Torczon, "Engineering a Compiler")

However, decompiler projects, including open-source ones, routinely violate these requirements: they are tightly coupled with specific machine architectures, don't allow feeding IR in, and oftentimes don't expose or document it to the user at all.

ScratchABlock is an attempt to say "no" to such practices and to develop a decompilation framework based on the requirements above. Note that ScratchABlock can be considered a learning/research project, and beyond good intentions and criticism of other projects, it may not offer much to a casual user, either currently or possibly at all. It can certainly be criticised in many respects too.

Down to Earth part

ScratchABlock is released under the terms of GNU General Public License v3 (GPLv3).

ScratchABlock is written in Python 3 and tested with version 3.3 and up, though it may work with 3.2 or lower too (it won't work with legacy Python 2 versions). It depends on PyYAML and nose.

On Debian/Ubuntu Linux, these can be installed with sudo apt-get install python3-yaml python3-nose. Alternatively, you can install them via Python's own pip package manager (which should work on any OS): pip3 install -r requirements.txt.

ScratchABlock uses the PseudoC assembler as its IR. It is an assembler language expressed, as much as possible, using familiar C language syntax. The idea is that any C programmer would understand it intuitively, but there is an ongoing effort to document PseudoC more formally.
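Because PseudoC reuses C syntax, even a few lines of Python can read simple statements. The sketch below is not ScratchABlock's parser; it assumes a simplified, hypothetical subset in which registers are written as `$name` and each line is `$dest = operand`, `$dest = operand OP operand`, or `goto label`. The real PseudoC grammar is considerably richer (conditionals, memory accesses, calls, labels).

```python
import re
from typing import Optional, Tuple, Union

Operand = Union[str, int]

# Simplified, assumed PseudoC-like subset (for illustration only):
#   $dest = operand
#   $dest = operand OP operand
#   goto label
# Operands are $registers or decimal/hex integer literals.
_OPERAND = r"(\$\w+|0x[0-9a-fA-F]+|-?\d+)"
_ASSIGN = re.compile(
    rf"^\s*(\$\w+)\s*=\s*{_OPERAND}(?:\s*([-+*&|^]|<<|>>)\s*{_OPERAND})?\s*$"
)
_GOTO = re.compile(r"^\s*goto\s+(\w+)\s*$")


def parse_operand(tok: str) -> Operand:
    """Registers stay as strings; integer literals become ints."""
    if tok.startswith("$"):
        return tok
    return int(tok, 16) if tok.startswith("0x") else int(tok)


def parse_line(line: str) -> Optional[Tuple]:
    """Parse one line of the toy subset; return None for anything else."""
    m = _GOTO.match(line)
    if m:
        return ("goto", m.group(1))
    m = _ASSIGN.match(line)
    if m:
        dest, a, op, b = m.groups()
        if op is None:
            return ("assign", dest, parse_operand(a))
        return ("assign", dest, parse_operand(a), op, parse_operand(b))
    return None


for src in ["$r1 = $r2 + 1", "$a0 = 0x10", "goto l1", "if ($a2 == 0) goto l1"]:
    print(src, "->", parse_line(src))
```

The conditional jump on the last line falls outside the toy subset, so `parse_line` deliberately returns `None` for it rather than guessing.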

Note that, based on the requirements described in the previous section of this document, and following the well-known "Unix philosophy", ScratchABlock does "one thing": analysis and transformation of PseudoC programs. It is explicitly not concerned with converting machine instructions of particular architectures into PseudoC (at least, for now). That means that ScratchABlock doesn't force you to use any particular converter/lifter; you can use any you like. Caveat: you need to have one to use it. See the end of the document for some hints in that regard.
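Since the lifter is an external component, it can be anything that emits PseudoC text. As a minimal sketch, assuming a made-up three-operand RISC assembly syntax (the mnemonics `add`, `sub`, `and`, `or`, `sll`, `li`, `j` are loosely modeled on MIPS/RISC-V style, and the `BINOPS` table, `reg` and `lift` helpers are hypothetical, not taken from any real plugin), a line-by-line lifter could look like this:

```python
# Hypothetical mnemonic table: 3-operand ALU mnemonic -> C operator.
BINOPS = {"add": "+", "sub": "-", "and": "&", "or": "|", "sll": "<<"}


def reg(name: str) -> str:
    # PseudoC-style register reference (the "$" prefix is an assumption here).
    return "$" + name


def lift(line: str) -> str:
    """Translate one instruction of the toy RISC assembly into PseudoC-like text."""
    mnem, _, rest = line.strip().partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest else []
    if mnem in BINOPS:
        dst, src1, src2 = args
        # The third operand may be a register or a decimal immediate.
        rhs = src2 if src2.lstrip("-").isdigit() else reg(src2)
        return f"{reg(dst)} = {reg(src1)} {BINOPS[mnem]} {rhs}"
    if mnem == "li":  # load immediate
        dst, imm = args
        return f"{reg(dst)} = {imm}"
    if mnem == "j":  # unconditional jump
        return f"goto {args[0]}"
    raise ValueError(f"unsupported instruction: {line!r}")


asm = ["li a0, 0", "add a0, a0, a1", "sll a2, a0, 2", "j loop"]
for insn in asm:
    print(lift(insn))
# $a0 = 0
# $a0 = $a0 + $a1
# $a2 = $a0 << 2
# goto loop
```

Everything architecture-specific lives in this small front end; swapping in a different table (or a completely different tool) leaves the PseudoC-consuming side untouched.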

Source code and interfacing scripts are in the root of the repository. The most important scripts are:

Other subdirectories of the repository:

The current approach of ScratchABlock is to grow a collection of relatively loosely-coupled algorithms ("passes") for program analysis and transformation, have them covered with tests, and allow easy user access to them. The magic of decompilation consists of applying these algorithms in the right order and the right number of times. Then, to improve the performance of decompilation, these passes usually require tighter coupling. Exploring those directions is the next priority after implementing the inventory of passes described above.
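The idea of applying loosely coupled passes in the right order, and the right number of times, can be illustrated with a minimal fixpoint driver. This is a sketch on a toy straight-line IR, assuming two passes written here purely for illustration (constant propagation with folding, and dead-code elimination); `const_prop`, `dce` and `run_to_fixpoint` are hypothetical names, not ScratchABlock's actual pass manager or pass set.

```python
from typing import Callable, Dict, List, Set, Tuple, Union

Operand = Union[int, str]                          # constant or register name
Expr = Union[Operand, Tuple[str, Operand, Operand]]
Prog = List[Tuple[str, Expr]]                      # straight-line (dest, expr) list
Pass = Callable[[Prog, Set[str]], Prog]

OPS: Dict[str, Callable[[int, int], int]] = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def const_prop(prog: Prog, live_out: Set[str]) -> Prog:
    """Constant propagation + folding on straight-line code (live_out unused)."""
    known: Dict[str, int] = {}
    out: Prog = []
    for dest, expr in prog:
        if isinstance(expr, tuple):
            op, a, b = expr
            a = known.get(a, a) if isinstance(a, str) else a
            b = known.get(b, b) if isinstance(b, str) else b
            if isinstance(a, int) and isinstance(b, int):
                expr = OPS[op](a, b)
            else:
                expr = (op, a, b)
        elif isinstance(expr, str):
            expr = known.get(expr, expr)
        if isinstance(expr, int):
            known[dest] = expr
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
        out.append((dest, expr))
    return out


def uses(expr: Expr) -> List[str]:
    """Registers read by an expression."""
    if isinstance(expr, tuple):
        return [x for x in expr[1:] if isinstance(x, str)]
    return [expr] if isinstance(expr, str) else []


def dce(prog: Prog, live_out: Set[str]) -> Prog:
    """Dead-code elimination via backward liveness on straight-line code."""
    live = set(live_out)
    kept: Prog = []
    for dest, expr in reversed(prog):
        if dest in live:
            live.discard(dest)
            live.update(uses(expr))
            kept.append((dest, expr))
    kept.reverse()
    return kept


def run_to_fixpoint(prog: Prog, passes: List[Pass], live_out: Set[str],
                    max_rounds: int = 10) -> Prog:
    """Apply the passes in order, repeating whole rounds until nothing changes."""
    for _ in range(max_rounds):
        new = prog
        for p in passes:
            new = p(new, live_out)
        if new == prog:
            break
        prog = new
    return prog


prog: Prog = [("$r1", 2), ("$r2", ("+", "$r1", 3)),
              ("$r3", ("*", "$r2", "$r0")), ("$r4", ("*", "$r1", 10))]
print(run_to_fixpoint(prog, [const_prop, dce], {"$r3"}))
# [('$r3', ('*', 5, '$r0'))]
```

Order affects how quickly this converges: with `[dce, const_prop]`, the first round only removes `$r4` (before folding, `$r1` and `$r2` still look live), and a second round is needed to drop them. Both orders reach the same result here, but only because the driver keeps repeating rounds.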

Algorithms and transformations implemented by ScratchABlock:

ScratchABlock's partner tool is ScratchABit, an interactive disassembler intended to perform the lowest-level tasks of the decompilation process, like separating code from data and identifying function boundaries. ScratchABit usually works with a native architecture's assembler syntax, but for some architectures (usually, faithful RISCs), if a suitable plugin is available, it can output PseudoC syntax, which can serve as input to ScratchABlock.