pfalcon / ScratchABlock

Yet another crippled decompiler project
https://github.com/EiNSTeiN-/decompiler/issues/9#issuecomment-103221200
GNU General Public License v3.0
104 stars 23 forks source link

Rewriting of induction variables #24

Open maximumspatium opened 6 years ago

maximumspatium commented 6 years ago

Induction variables are widely emitted by compilers during program optimization. They greatly assist in strength reduction.

Consider the following code:

int arrayOfInts[10] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

for (i = 0; i < 10; i++)
    printf("Next int %d", arrayOfInts[i]);

One possible transformation of it could look like that:

int *p = intPtr;
int arrayOfInts[10] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

p = arrayOfInts;

for (i = 0; i < 10; i++) {
    printf("Next int %d", *p);
    p++;
}

The expression arrayOfInts[i] has been transformed to the equivalent pointer dereference introducing the induction variable "p".

This is surely a very simple example. Real-world programs will usually contain more complex cases. A well-optimized PowerPC program I used to work on contained up to 4 induction variables per loop.

The ability to (partially) undo this kind of optimization in the decompiler would greatly improve the readability of its output. Such a transformation would be applied at a later decompilation stage and could be human-assisted.

The great book "Advanced compiler design and implementation" by Steven S. Muchnick already scratched an algorithm for induction variable identification and removal (see chapter 14).

A good starting point would be simplifications of pointers that depend on the loop counter.