Dealing with low-level conditional expressions

maximumspatium commented 6 years ago

Hello,

I skimmed over ScratchABlock's tests in order to learn the syntax of its PseudoC IR. I noticed that the conditional statements used there look like high-level ones (that is, there is no low-level condition registers or flags).

Usually, when coming from assembly, low-level conditions and branches are still present. They need to be transformed into corresponding conditional expressions. I wonder where and how do you perform such a transformation in your decompiler.

The main issue with low-level conditions and branches is that they are machine-dependent. It should be theoretically possible to obtain higher-level conditional expressions during the ASM->IR conversion but it comes with its own set of problems. The PowerPC architecture, for example, defines eight conditional registers that can be used to store conditional expressions across the procedure, so eliminating them would require to perform a non-trivial dataflow analysis in the front-end.

Existing decompilers use different strategies for dealing with this issue. The basic idea is to abstract away low-level condition suppliers and consumers. Boomerang, for example, utilizes several complex macros like SUBFLAGS, ADDFLAGS etc. in its RTL-like IR for describing the layout and behavior of processor-specific flags and conditions. They will be transformed to higher-level conditional expressions at later stages so the initial IR still contains low-level conditionals and branches.

As opposite, Hexrays decompiler seems to operate on a processor-independent, very low-level IR called microinstruction language that will be stepwise transformed to a higher-level IR. Each machine instruction that affects flags is split into several instructions setting each low-level flag separately like this:

setz EBX,,ZF
sets EBX,,SF

Unused conditional microinstructions will be eliminated during the optimization by means of DCE, all others will be combined with the corresponding branches to form conditional expressions. The above mentioned representation taken from earlier tutorials looks too x86-centric. I wonder which microcode Hexrays uses to decompile PowerPC targets.

To make matter even worse, x86 and PowerPC architectures are very different in regards to signed/unsigned comparisons. A x86 CPU recognizes the signedness during the branches by examining the combination of flags; PowerPC implements two different comparison instructions affecting the same flags differently for signed and unsigned operands - the signedness cannot be deduced from low-level conditions alone.

Is ScratchABlock capable of dealing with machine-dependent conditionals (that is, transforming low-level concepts of flags and branches into the higher-level expressions like if (cond) goto label)? If yes, how is it implemented?

Thank you in advance! Cheers Max

pfalcon commented 6 years ago

Thanks for your interest and detailed write-up. As a quick response, did you have a chance to look thru doc for PseudoC: https://github.com/pfalcon/ScratchABlock/blob/master/docs/PseudoC-spec.md ? It would have some hints how that's supposed to be dealt with. But indeed, that matter should be covered more explicitly, let me do that first, then we can discuss it further if you're interested.

pfalcon commented 6 years ago

Ok, here we go: https://github.com/pfalcon/ScratchABlock/blob/master/docs/PseudoC-spec.md#conditional-flags

In short, it's

on a processor-independent, very low-level IR

because that's the only way to do it.

However, there's also an idea to allow to hide verboseness behind

several complex macros like SUBFLAGS, ADDFLAGS etc.

The only correction I'd give to your quotes is that I don't think that the features above are "very low-level" or "complex". The only epithet I'd use is "verbose", and it applies not to the IR, but to the original CPU's assembly. If you have a true RISC CPU (no conditional codes, etc.), with high-level instructions, you have almost one-to-one correspondence between an assembly instruction and a PseudoC statement. If you have un-true RISC (e.g. conditional codes), it gets more verbose. With a CISC like x86, it can get really hairy ;-).

pfalcon commented 6 years ago

And here goes an example of it in action: 2b1d073c7103062805c828a578c3834daaa24fdd

maximumspatium commented 6 years ago

Thank you for that very succinct and straightforward summary of the issue! IIRC, this topic hasn't been covered much in the decompilation literature.

With that true RISC you meant RISC-V, right?

BTW, what is the current state of the ScratchABlock development? What is missing for decompiling real-world targets (except the ASM->PseudoC conversion)?

pfalcon commented 6 years ago

With that true RISC you meant RISC-V, right?

No, I mean architectures faithful to the original RISC ideas of having both simple concept and implementation of CPU (e.g. avoiding common implementation bottlenecks like conditional flag registers). MIPS is a good example. Xtensa, with which I'm working, is even better, because it doesn't have human-unfriendly delay slots.

BTW, what is the current state of the ScratchABlock development? What is missing for decompiling real-world targets (except the ASM->PseudoC conversion)?

Well, README is faithfully describes the current situation: ScratchABlock is currently "a collection of relatively loosely-coupled algorithms for program analysis and transformation".

And decompilation is complex, interdependent system. Whenever you try to use it, you always find something missing. Some things are relatively simple "last mile" problems, but there're so many of them, that it's quite boring to resolve them. Others are quite complex, and true, high-quality decompilation isn't achievable without them.

So, it's that "alice in wonderland" situation that if you run fast (long in this case), it's not enough, you always should have run twice long instead.

And that's actually the current scope of work for SAB - put aside perfectionism and leverage what it already can do (go thru myriad of lower-hanging fruits).

E.g., start to add actual decompilation tests to show off what it can do: https://github.com/pfalcon/ScratchABlock/blob/master/tests/decomp/cond-flags1.lst.exp.clean

pfalcon commented 6 years ago

Here's a recent example I'm working on (there was a bunch of such examples over time, but this time, I'm finally turning this into a test). ScratchABlock decompiles following PseudoC:

400047f0 Cache_Read_Disable:
400047f0     $a3 = 0x3feffe00
400047f3     memw()
400047f6     $a2 = *(u32*)($a3 + 0x20c)
400047f9     $a4 = 0xeff
400047fc     if (($a2 & BIT(8)) == 0) goto loc_40004817
400047ff loc_400047ff:
400047ff     memw()
40004802     $a6 = *(u32*)($a3 + 0x20c)
40004805     $a6 &= $a4
40004808     memw()
4000480b     *(u32*)($a3 + 0x20c) = $a6
4000480e     memw()
40004811     $a5 = *(u32*)($a3 + 0x20c)
40004814     if (($a5 & BIT(8)) != 0) goto loc_400047ff
40004817 loc_40004817:
40004817     $a5 = 0xfffdffff
4000481a     $a7 = -0x2
4000481c     $a10 = 0x1
4000481e     $a4 = 0x60000200
40004821     memw()
40004824     $a2 = *(u32*)($a4 + 0x8)
40004826     $a2 &= $a5
40004829     memw()
4000482c     *(u32*)($a4 + 0x8) = $a2
4000482e     memw()
40004831     $a11 = *(u32*)($a3 + 0x20c)
40004834     $a11 &= $a7
40004837     memw()
4000483a     *(u32*)($a3 + 0x20c) = $a11
4000483d     memw()
40004840     $a9 = *(u32*)($a3 + 0x20c)
40004843     $a9 |= $a10
40004846     memw()
40004849     *(u32*)($a3 + 0x20c) = $a9
4000484c     memw()
4000484f     $a8 = *(u32*)($a3 + 0x20c)
40004852     if (($a8 & BIT(1)) != 0) goto loc_40004860
40004855 loc_40004855:
40004855     memw()
40004858     $a6 = *(u32*)($a3 + 0x20c)
4000485b     nop()
4000485d     if (($a6 & BIT(1)) == 0) goto loc_40004855
40004860 loc_40004860:
40004860     memw()
40004863     $a8 = *(u32*)($a3 + 0x20c)
40004866     $a8 &= $a7
40004869     memw()
4000486c     *(u32*)($a3 + 0x20c) = $a8
4000486f     return

into:

void Cache_Read_Disable()
{
  while ((*(u32*)0x3ff0000c & BIT(8)) != 0) {
    *(u32*)0x3ff0000c &= 0xeff;
  }
  *(u32*)0x60000208 &= 0xfffdffff;
  *(u32*)0x3ff0000c &= -0x2;
  *(u32*)0x3ff0000c |= 0x1;
  while ((*(u32*)0x3ff0000c & BIT(1)) == 0) {
    nop();
  }
  *(u32*)0x3ff0000c &= -0x2;
}

Of course, various improvements can be called for, but the point, there's already a bunch of AI packed into it, that answering a question "Is this decompilation correct?" may require some head-scratching from a human ;-).

maximumspatium commented 6 years ago

I just quickly hacked a PowerPC->PseudoC converter based on my private decompilation project.

Below is the converted input (sub_205fac.lst):

sub_205FAC:
$r5 = *(u32*)($r2 - 0x158)
$r10 = *(u32*)$r5
$cr0.eq = (i32)$r10 == (i32)0x0
$cr0.lt = (i32)$r10 < (i32)0x0
$cr0.gt = (i32)$r10 > (i32)0x0
if ($cr0.eq) goto loc_20600C

loc_205FBC:
$r8 = *(u32*)($r10 + 0x10)
$r9 = 0x0
$cr0.eq = (u32)$r8 == (u32)0x0
$cr0.lt = (u32)$r8 < (u32)0x0
$cr0.gt = (u32)$r8 > (u32)0x0
if (!$cr0.gt) goto loc_206000

loc_205FCC:
$r11 = $r10 + 0x58
$r12 = $r10 + 0x34

loc_205FD4:
$ea = ($r12 + 0x3c)
$r5 = *(u32*)$ea
$r12 = $ea
$cr0.eq = (u32)$r5 == (u32)$r3
$cr0.lt = (u32)$r5 < (u32)$r3
$cr0.gt = (u32)$r5 > (u32)$r3
if (!$cr0.eq) goto loc_205FF0

loc_205FE0:
$r5 = *(u32*)($r11 + 0x4)
$r6 = $r4 | $r5
*(u32*)($r11 + 0x4) = $r6
$r8 = *(u32*)($r10 + 0x10)

loc_205FF0:
$r9 = $r9 + 0x1
$r11 = $r11 + 0x3c
$cr0.eq = (u32)$r9 == (u32)$r8
$cr0.lt = (u32)$r9 < (u32)$r8
$cr0.gt = (u32)$r9 > (u32)$r8
if ($cr0.lt) goto loc_205FD4

loc_206000:
$r10 = *(u32*)$r10
$cr0.eq = (i32)$r10 == (i32)0x0
$cr0.lt = (i32)$r10 < (i32)0x0
$cr0.gt = (i32)$r10 > (i32)0x0
if (!$cr0.eq) goto loc_205FBC

loc_20600C:
return

Here is the SAB output for python apply_xform.py sub_205fac.lst --script script_decompile --no-dead --no-comments --format=c:

// Estimated params: [$r2, $r3, $r4]
void sub_205FAC()
{
  $r10 = *(u32*)*(u32*)($r2 - 0x158);
  if ((i32)*(u32*)*(u32*)($r2 - 0x158) == 0x0) goto l_EXIT_;

l9:
  $r9 = 0x0;
  if (!((u32)*(u32*)($r10 + 0x10) > 0x0)) goto l44;
  $r11 = $r10 + 0x58;
  $r12 = $r10 + 0x34;

l21:
  $r5 = *(u32*)($r12 + 0x3c);
  $r12 += 0x3c;
  if ((u32)$r5 == (u32)$r3) {
    *(u32*)($r11 + 0x4) = $r4 | *(u32*)($r11 + 0x4);
  }
  $r9 += 0x1;
  $r11 += 0x3c;
  if ((u32)$r9 < (u32)*(u32*)($r10 + 0x10)) goto l21;

l44:
  $r10 = *(u32*)$r10;
  if (!((i32)$r10 == 0x0)) goto l9;

l_EXIT_:
}

The C output looks quite promising! The following works as expected:

parameter estimation (one implicit param [r2=RTOC] and two explicit ones [r3 and r4])
condition propagation
DCE

The following doesn't seem to work:

condition inversion !($r10 == 0) --> $r10 != 0
type inference
loop structuring

Nice to have working as well:

refactoring of induction variables like $r11 += 0x3c --> $r11 = $r9 * 0x3c

Do you need a helping hand?

pfalcon commented 6 years ago

The C output looks quite promising!

Your original post contained decompilation output like $r12 + 0x3c = $r10 + 0x70, and I noted to myself that you must be very kind to call such output "promising" ;-). I see that the issue was in the input PseudoC (but shows that many things aren't checked).

condition inversion !($r10 == 0) --> $r10 != 0

Hmm, weird that I never faced a need for that during expression simplification. Otherwise, the code is there (COND.neg()), used e.g. for if structuring. Now with the testcase, I'll look into implementing it. (Or maybe if you're serious about playing with SABl, you'll be interested to look into it yourself, I'd say it should be doable in terms of xform_expr_infer.py).

type inference

Currently, SABl doesn't do any type analysis at all, and that's one of the big, complex areas without which there won't be quality decompilation.

loop structuring

Yep, no idea why it didn't make at least a do-while at l21. Will look into it too. I played with structuring "non-structured" loops (with continue/break) once too, but it's far from being complete/tested.

refactoring of induction variables like $r11 += 0x3c --> $r11 = $r9 * 0x3c

Whoa, I'm not even sure I'd like to see such processing in the output. It would require Value Set Analysis.

Do you need a helping hand?

Well, with the usual warning that decompilation is a) utterly complex; b) very non-rewarding (you probably won't get what you want anytime soon), if you want to play with it beyond what you already did (which is already quite helpful, thanks!), you're more than welcome, and I'll try to help with any questions/issues as I can. One big thing needed is testing (i.e. manual) and tests (i.e. capturing manual things to be parts of the testsuite).

maximumspatium commented 6 years ago

Your original post contained decompilation output like $r12 + 0x3c = $r10 + 0x70, and I noted to myself that you must be very kind to call such output "promising" ;-). I see that the issue was in the input PseudoC (but shows that many things aren't checked).

Yes, it was a bug in my converter causing that weird statement $r12 + 0x3c = $r10 + 0x70 to appear. I've fixed the converter and the output so everything works as expected now.

maximumspatium commented 6 years ago

refactoring of induction variables like $r11 += 0x3c --> $r11 = $r9 * 0x3c Whoa, I'm not even sure I'd like to see such processing in the output. It would require Value Set Analysis.

Well, I remember having seen an algorithm for optimizing induction variables in a compiler book. I think it was Muchnick's great book "Advanced compiler design and implementation". I saw a lot of induction variables in the real life examples during the two decades of my RE experience, hence the request. For the time being, such an optimization can be performed manually.

pfalcon commented 6 years ago

Yep, no idea why it didn't make at least a do-while at l21.

You must be using an older checkout. Please be sure to pull frequently, as I'm landing pretty big changes these days (well, was landing over Xmas holidays, maybe will do more at NY time).

With current version it finds if and while easily, but there's an issue due to a critical edge with the loop - need to think what to do about them, either handle as is or split.

maximumspatium commented 6 years ago

You must be using an older checkout. Please be sure to pull frequently, as I'm landing pretty big changes these days (well, was landing over Xmas holidays, maybe will do more at NY time).

I'm sorry, I always use the latest changes. I've just deleted the whole local repository and cloned SABl again from Github to avoid problems. I'm still getting the same result (no while so far).

maximumspatium commented 6 years ago

With current version it findsif and while easily

Is decomp.py/match_while() supposed to be used for loop recognition? If yes, it should be at least called from script_decompile.py/structure(). For the moment being, it's not the case:

def structure(cfg):
    apply_iterative(match_seq, (cfg,))
    apply_iterative(match_if, (cfg,))
    apply_iterative(match_ifelse, (cfg,))
    apply_iterative(match_seq, (cfg,))
    apply_iterative(match_ifelse, (cfg,))
    apply_iterative(match_if_else_inv_ladder, (cfg,))
    apply_iterative(match_if_else_ladder, (cfg,))

The code above matches only ifs and seqs so far...

pfalcon commented 6 years ago

Ack, sorry - I have some local changes not pushed yet, because they need a) additional manual testing; b) automated tests. And well, your example actually triggers issue with my latest changes - the problem seems to be not in "critical edge", but in wrong pass being applied which trigger its appearance. Anyway, I pushed script_decompile changes I had, I'll look into the issue later.

maximumspatium commented 6 years ago

After pulling your latest changes I'm getting NameError: name 'match_if_dowhile' is not defined

pfalcon commented 6 years ago

Pushed that pass.

pfalcon commented 6 years ago

And a question - would you be ok if the case above is added as a test to SABl repo?

maximumspatium commented 6 years ago

Pushed that pass.

Thanks! Does it work for you? I'm now getting the following error:

Error while processing file: sub_205fac.lst
Traceback (most recent call last):
  File "apply_xform.py", line 208, in <module>
    changed = one_iter(input, output)
  File "apply_xform.py", line 176, in one_iter
    handle_file(args)
  File "apply_xform.py", line 58, in handle_file
    raise e
  File "apply_xform.py", line 55, in handle_file
    handle_file_unprotected(args)
  File "apply_xform.py", line 83, in handle_file_unprotected
    mod.apply(cfg)
  File "/Users/****/Development/ScratchABlock/script_decompile.py", line 79, in apply
    if match_abnormal_sel(cfg):
  File "/Users/****/Development/ScratchABlock/decomp.py", line 413, in match_abnormal_sel
    for v, _ in cfg.iter_rev_postorder():
  File "/Users/****/Development/ScratchABlock/graph.py", line 241, in iter_rev_postorder
    return sorted(self._nodes.items(), key=lambda x: -x[1]["dfsno"])
  File "/Users/****/Documents/Development/ScratchABlock/graph.py", line 241, in <lambda>
    return sorted(self._nodes.items(), key=lambda x: -x[1]["dfsno"])
TypeError: bad operand type for unary -: 'NoneType'

And a question - would you be ok if the case above is added as a test to SABl repo?

Yes, sure.

pfalcon commented 6 years ago

TypeError: bad operand type for unary -: 'NoneType'

Yes, that's the issue I'm talking about in https://github.com/pfalcon/ScratchABlock/issues/7#issuecomment-354016414 and following comments. Hope to look into it later today.

Yes, sure.

Thanks!

pfalcon commented 6 years ago

Ok, here we go: https://github.com/pfalcon/ScratchABlock/blob/master/tests/decomp/sub_5fac/sub_5fac.lst.exp.clean#L1

Now that's where I start to feel satisfaction of the work done and feel the 2.5 years of development start to pay off. You see, not only it completely structured it, it even applied that latest match_if_dowhile pass! (If only it was applied correctly, and that's again the reason why I don't haste with pushing WIP code to master - I'd like to be sure that what in master was reasonably tested).

The biggest comment of the recent testcases I looked at is that data-dependency graph guided code motion would be a pretty big readability win. Consider for example https://github.com/pfalcon/ScratchABlock/blob/master/tests/decomp/sub_5fac/sub_5fac.lst.exp.clean#L12 . If that line moved to where all increments happen, $r5's value could be propagated and it itself DCE'ed.

The rest of the goofs in the docompilation are "trivialities" (ahem). For example, it's clear there's superfluous propagations, commutative ops should be ordered better, etc.

pfalcon commented 6 years ago

match_if_dowhile pass ... If only it was applied correctly

And it's not, all due to superfluous propagation. And I actually have no idea why it applied there under those circumstances at all. Ah, missing if. Fun.

pfalcon commented 6 years ago

If that line moved to where all increments happen, $r5's value could be propagated and it itself DCE'ed.

Of course, with SSA, that would come "for free" without code motion. Or maybe it wouldn't "for free", because artifact variables would be introduced, depending on the quality of out-of-SSA conversion, and I saw a lot of low-quality ones.

And the point, for human-readability, increments really should be grouped together at the end ;-).

maximumspatium commented 6 years ago

First of all, congratulations! Good job!

Yes, the superfluous propagation need to be fixed because it leads to wrong code:

$r10 = *(u32*)*(u32*)($r2_0 - 0x158);
while ((i32)*(u32*)*(u32*)($r2_0 - 0x158) != 0x0) {
  ...
  $r10 = *(u32*)$r10;
}

It's clear that the code processes a linked list. The expression in $r10 may not be propagated in this case. The correct output is:

$r10 = *(u32*)*(u32*)($r2_0 - 0x158);
while ((i32)*$r10 != 0x0) {
  ...
  $r10 = *(u32*)$r10;
}

maximumspatium commented 6 years ago

Of course, with SSA, that would come "for free" without code motion. Or maybe it wouldn't "for free", because artifact variables would be introduced, depending on the quality of out-of-SSA conversion, and I saw a lot of low-quality ones.

I'm not sure SSA will help here. A more sophisticated analysis and transformation is required to handle induction variables like that. There are actually three induction variables in this code:

$r9 --> loop counter, basic induction variable $r11 --> structure pointer, member 1, basic induction variable $r12 --> struture pointer, member 2, basic induction variable

BTW, I found the algorithm for finding induction variables in Muchnick's book. It's quite easy to implement. It will spit out a list of recognized induction variables and their linear equations.

The question is how to transform them to a better readable form? The answer could be provided by type inference and pointer analysis. The problem is that those are horribly hard to implement. We could consider a lightweight version (heuristics) for the start.

pfalcon commented 6 years ago

Yes, the superfluous propagation need to be fixed because it leads to wrong code:

Yeah, the wrong code is actually because match_if_dowhile didn't check the conditions match. That was fixed in 38d9d89bb02d1bf33f6304835bd588303a9ce17b. So, the older "while" output can be considered a sneak preview of what it can do ;-) (when expr propagation is fixed).

pfalcon commented 6 years ago

@maximumspatium : Can this ticket be closed now? For interesting ideas, like dealing with induction variables, could you open separate ticket(s)?

P.S. Still interested in ScratchABlock? ;-)

maximumspatium commented 6 years ago

Can this ticket be closed now?

Ok, if you like.

For interesting ideas, like dealing with induction variables, could you open separate ticket(s)?

Yes, sure.

Still interested in ScratchABlock? ;-)

Yes, I was so silent because I was busy trying out many different things, among others switch statements structuring, rewriting of calls and structures etc. None of them is worth being submitted yet but I'll eventually get there someday...

pfalcon commented 6 years ago

Ok, if you like.

Well, I'm generally ok to keep tickets open for months, but if there're too much different content, later it becomes only harder to salvage additional data from them.

None of them is worth being submitted yet but I'll eventually get there someday...

Cool, thanks! Just a note based on our previous discussion of development process - if you have any simple and standalone changes, like docstrings, etc. - feel free to submit them, because otherwise it's easy to accumulate a lot local changes and then again it's later hard to process it. As I mentioned, that's pretty much what I had (glad to have cleared up bunch of that over holidays). Anyway, just a generic observation. Certainly, please do it at your pace and convenience.

maximumspatium commented 6 years ago

For interesting ideas, like dealing with induction variables, could you open separate ticket(s)?

See #24.

I'm going to close this issue because the original problem is solved.

pfalcon commented 6 years ago

Thanks!

pfalcon / ScratchABlock

Dealing with low-level conditional expressions #7