pfalcon / ScratchABlock

Yet another crippled decompiler project
https://github.com/EiNSTeiN-/decompiler/issues/9#issuecomment-103221200
GNU General Public License v3.0

Issues with expression propagation and resolving them #21

Open pfalcon opened 6 years ago

pfalcon commented 6 years ago

The most obvious one:

  1. Complex expressions should not be propagated to multiple places, as that makes the code more complex.

pfalcon commented 6 years ago

2.

A distinguishing feature of PseudoC is that it allows complex-ish expressions in assignments, not just 3-address expressions, to model CISC and other ad hoc features, e.g.:

$eax = *(u32*)($ebx + $ecx * 8 + 3)

But that also poses a problem, because SABl sees that expression as a whole and can't propagate its subexpressions. Sometimes that leads to obvious problems. For example,

$a3 = UINT64($a7, $a6) >> $SAR

would be better written as:

$a7_a6 = UINT64($a7, $a6)
$a3 = (u64)$a7_a6 >> $SAR

That would create an implicit point where we get a 64-bit vreg, which would allow expressions to be simplified much better.

One can argue that this is a problem of the input PseudoC, but again, its distinguishing feature is that it allows mapping a single machine instruction to a single PseudoC statement for as wide a range of architectures as possible. So instead, there should be a "deconstruction" pass in SABl itself.
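A rough sketch of what such a "deconstruction" pass could look like, not how SABl actually implements anything. It assumes a made-up representation where expressions are nested tuples `(op, arg, ...)` and leaves are register names or ints; the function name `deconstruct` and the `$tN` temporaries are hypothetical (a real pass would probably pick names like `$a7_a6` and track types such as `u64`).

```python
import itertools


def deconstruct(dest, expr, counter=None):
    """Split a nested expression into statements with only leaf operands.

    Expressions are tuples (op, arg1, arg2, ...); leaves are strings
    (register names) or ints. Returns a list of (dest, expr) pairs.
    This representation is an assumption made for illustration.
    """
    if counter is None:
        counter = itertools.count()
    stmts = []

    def flatten(e):
        # Leaves are already simple operands, keep them as they are.
        if not isinstance(e, tuple):
            return e
        op, *args = e
        # Flatten the operands first, so their temporaries are emitted
        # before the statement that uses them.
        simple = (op, *[flatten(a) for a in args])
        tmp = "$t%d" % next(counter)
        stmts.append((tmp, simple))
        return tmp

    if isinstance(expr, tuple):
        op, *args = expr
        # The top-level operation keeps the original destination.
        top = (op, *[flatten(a) for a in args])
        stmts.append((dest, top))
    else:
        stmts.append((dest, expr))
    return stmts


# $a3 = UINT64($a7, $a6) >> $SAR
for d, e in deconstruct("$a3", (">>", ("UINT64", "$a7", "$a6"), "$SAR")):
    print(d, "=", e)
```

With this, the `UINT64($a7, $a6)` subexpression gets its own assignment to a temporary, and `$a3` is then computed from that temporary, which is the explicit 64-bit vreg point described above.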

pfalcon commented 6 years ago

3.

Extending that "back and forth processing" idea further. Point 1 says "Complex expressions should not be propagated to multiple places, as that makes the code more complex." That's of course not always true. For example, suppose we have $r1 = $r0 + 1, and it can be propagated into 2 places. Should that be done? The naive answer is "no". But the answer is "yes" if one of the places to propagate into is $r1 - 1, because ($r0 + 1) - 1 then simplifies to just $r0.

It's hard to tell in advance whether a particular propagation is useful or not. So the only general approach is to propagate eagerly and widely, and then run a CSE (common subexpression elimination) pass to undo any "useless" propagations.
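To make the "propagate eagerly, then CSE" idea concrete, here is a toy sketch for straight-line code. It is not SABl's code: the tuple representation of expressions, the function names (`propagate`, `simplify`, `cse`), and the assumption that every register is assigned exactly once are all mine, and the only simplification rule implemented is `(x + c) - c -> x`.

```python
def substitute(expr, var, value):
    """Replace every occurrence of register `var` in `expr` with `value`."""
    if expr == var:
        return value
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(a, var, value) for a in expr[1:])
    return expr


def simplify(expr):
    """Apply a single toy rule bottom-up: (x + c) - c  ->  x."""
    if not isinstance(expr, tuple):
        return expr
    expr = (expr[0],) + tuple(simplify(a) for a in expr[1:])
    if (expr[0] == "-" and len(expr) == 3
            and isinstance(expr[1], tuple) and len(expr[1]) == 3
            and expr[1][0] == "+" and expr[1][2] == expr[2]):
        return expr[1][1]
    return expr


def propagate(stmts):
    """Eagerly substitute all earlier definitions into later statements."""
    defs, out = {}, []
    for dest, expr in stmts:
        for var, value in defs.items():
            expr = substitute(expr, var, value)
        expr = simplify(expr)
        defs[dest] = expr
        out.append((dest, expr))
    return out


def cse(stmts):
    """Undo useless propagations: reuse a register that already holds
    an identical (sub)expression."""
    available, out = {}, []

    def reuse(e):
        if not isinstance(e, tuple):
            return e
        # Rewrite operands first, then check the rebuilt expression.
        e = (e[0],) + tuple(reuse(a) for a in e[1:])
        return available.get(e, e)

    for dest, expr in stmts:
        expr = reuse(expr)
        out.append((dest, expr))
        if isinstance(expr, tuple):
            available.setdefault(expr, dest)
    return out


code = [
    ("$r1", ("+", "$r0", 1)),
    ("$r2", ("-", "$r1", 1)),  # propagation pays off here
    ("$r3", ("*", "$r1", 2)),  # propagation is useless here
]
for dest, expr in cse(propagate(code)):
    print(dest, "=", expr)
```

After both passes, `$r2` has become plain `$r0` (the useful propagation survived simplification), while `$r3` is back to `$r1 * 2` (the useless one was undone by CSE).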