Closed maximumspatium closed 6 years ago
Thanks for your interest and detailed write-up. As a quick response, did you have a chance to look thru doc for PseudoC: https://github.com/pfalcon/ScratchABlock/blob/master/docs/PseudoC-spec.md ? It would have some hints how that's supposed to be dealt with. But indeed, that matter should be covered more explicitly, let me do that first, then we can discuss it further if you're interested.
Ok, here we go: https://github.com/pfalcon/ScratchABlock/blob/master/docs/PseudoC-spec.md#conditional-flags
In short, it's "on a processor-independent, very low-level IR", because that's the only way to do it. However, there's also an idea to hide the verbosity behind "several complex macros like SUBFLAGS, ADDFLAGS etc."
The only correction I'd give to your quotes is that I don't think that the features above are "very low-level" or "complex". The only epithet I'd use is "verbose", and it applies not to the IR, but to the original CPU's assembly. If you have a true RISC CPU (no condition codes, etc.), with high-level instructions, you have an almost one-to-one correspondence between an assembly instruction and a PseudoC statement. If you have an un-true RISC (e.g. with condition codes), it gets more verbose. With a CISC like x86, it can get really hairy ;-).
And here goes an example of it in action: 2b1d073c7103062805c828a578c3834daaa24fdd
Thank you for that very succinct and straightforward summary of the issue! IIRC, this topic hasn't been covered much in the decompilation literature.
With that true RISC you meant RISC-V, right?
BTW, what is the current state of the ScratchABlock development? What is missing for decompiling real-world targets (except the ASM->PseudoC conversion)?
With that true RISC you meant RISC-V, right?
No, I mean architectures faithful to the original RISC ideas of having both simple concept and implementation of CPU (e.g. avoiding common implementation bottlenecks like conditional flag registers). MIPS is a good example. Xtensa, with which I'm working, is even better, because it doesn't have human-unfriendly delay slots.
BTW, what is the current state of the ScratchABlock development? What is missing for decompiling real-world targets (except the ASM->PseudoC conversion)?
Well, the README faithfully describes the current situation: ScratchABlock is currently "a collection of relatively loosely-coupled algorithms for program analysis and transformation".
And decompilation is a complex, interdependent system. Whenever you try to use it, you always find something missing. Some things are relatively simple "last mile" problems, but there are so many of them that it's quite boring to resolve them. Others are quite complex, and true, high-quality decompilation isn't achievable without them.
So, it's that "Alice in Wonderland" situation: running fast (long, in this case) is not enough; you always should have run twice as long instead.
And that's actually the current scope of work for SAB: put aside perfectionism and leverage what it can already do (go through the myriad of lower-hanging fruits).
E.g., start to add actual decompilation tests to show off what it can do: https://github.com/pfalcon/ScratchABlock/blob/master/tests/decomp/cond-flags1.lst.exp.clean
Here's a recent example I'm working on (there was a bunch of such examples over time, but this time, I'm finally turning this into a test). ScratchABlock decompiles the following PseudoC:
400047f0 Cache_Read_Disable:
400047f0 $a3 = 0x3feffe00
400047f3 memw()
400047f6 $a2 = *(u32*)($a3 + 0x20c)
400047f9 $a4 = 0xeff
400047fc if (($a2 & BIT(8)) == 0) goto loc_40004817
400047ff loc_400047ff:
400047ff memw()
40004802 $a6 = *(u32*)($a3 + 0x20c)
40004805 $a6 &= $a4
40004808 memw()
4000480b *(u32*)($a3 + 0x20c) = $a6
4000480e memw()
40004811 $a5 = *(u32*)($a3 + 0x20c)
40004814 if (($a5 & BIT(8)) != 0) goto loc_400047ff
40004817 loc_40004817:
40004817 $a5 = 0xfffdffff
4000481a $a7 = -0x2
4000481c $a10 = 0x1
4000481e $a4 = 0x60000200
40004821 memw()
40004824 $a2 = *(u32*)($a4 + 0x8)
40004826 $a2 &= $a5
40004829 memw()
4000482c *(u32*)($a4 + 0x8) = $a2
4000482e memw()
40004831 $a11 = *(u32*)($a3 + 0x20c)
40004834 $a11 &= $a7
40004837 memw()
4000483a *(u32*)($a3 + 0x20c) = $a11
4000483d memw()
40004840 $a9 = *(u32*)($a3 + 0x20c)
40004843 $a9 |= $a10
40004846 memw()
40004849 *(u32*)($a3 + 0x20c) = $a9
4000484c memw()
4000484f $a8 = *(u32*)($a3 + 0x20c)
40004852 if (($a8 & BIT(1)) != 0) goto loc_40004860
40004855 loc_40004855:
40004855 memw()
40004858 $a6 = *(u32*)($a3 + 0x20c)
4000485b nop()
4000485d if (($a6 & BIT(1)) == 0) goto loc_40004855
40004860 loc_40004860:
40004860 memw()
40004863 $a8 = *(u32*)($a3 + 0x20c)
40004866 $a8 &= $a7
40004869 memw()
4000486c *(u32*)($a3 + 0x20c) = $a8
4000486f return
into:
void Cache_Read_Disable()
{
    while ((*(u32*)0x3ff0000c & BIT(8)) != 0) {
        *(u32*)0x3ff0000c &= 0xeff;
    }
    *(u32*)0x60000208 &= 0xfffdffff;
    *(u32*)0x3ff0000c &= -0x2;
    *(u32*)0x3ff0000c |= 0x1;
    while ((*(u32*)0x3ff0000c & BIT(1)) == 0) {
        nop();
    }
    *(u32*)0x3ff0000c &= -0x2;
}
Of course, various improvements can be called for, but the point is that there's already such a bunch of AI packed into it that answering the question "Is this decompilation correct?" may require some head-scratching from a human ;-).
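One way to check part of that output by hand is to redo the address arithmetic: each memory access in the listing uses a constant base register plus a displacement, and the C output shows the folded absolute address. A minimal sketch of that check, using only constants from the listing (the helper name `fold_address` is hypothetical and is not a ScratchABlock function):

```python
def fold_address(base, offset, bits=32):
    """Fold a constant base register value and a displacement into one address."""
    return (base + offset) & ((1 << bits) - 1)

a3 = 0x3FEFFE00  # value loaded into $a3 at 400047f0
a4 = 0x60000200  # value loaded into $a4 at 4000481e

print(hex(fold_address(a3, 0x20C)))  # 0x3ff0000c
print(hex(fold_address(a4, 0x8)))    # 0x60000208
```

Both results match the addresses that appear in the decompiled function.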
I just quickly hacked a PowerPC->PseudoC converter based on my private decompilation project.
Below is the converted input (sub_205fac.lst):
sub_205FAC:
$r5 = *(u32*)($r2 - 0x158)
$r10 = *(u32*)$r5
$cr0.eq = (i32)$r10 == (i32)0x0
$cr0.lt = (i32)$r10 < (i32)0x0
$cr0.gt = (i32)$r10 > (i32)0x0
if ($cr0.eq) goto loc_20600C
loc_205FBC:
$r8 = *(u32*)($r10 + 0x10)
$r9 = 0x0
$cr0.eq = (u32)$r8 == (u32)0x0
$cr0.lt = (u32)$r8 < (u32)0x0
$cr0.gt = (u32)$r8 > (u32)0x0
if (!$cr0.gt) goto loc_206000
loc_205FCC:
$r11 = $r10 + 0x58
$r12 = $r10 + 0x34
loc_205FD4:
$ea = ($r12 + 0x3c)
$r5 = *(u32*)$ea
$r12 = $ea
$cr0.eq = (u32)$r5 == (u32)$r3
$cr0.lt = (u32)$r5 < (u32)$r3
$cr0.gt = (u32)$r5 > (u32)$r3
if (!$cr0.eq) goto loc_205FF0
loc_205FE0:
$r5 = *(u32*)($r11 + 0x4)
$r6 = $r4 | $r5
*(u32*)($r11 + 0x4) = $r6
$r8 = *(u32*)($r10 + 0x10)
loc_205FF0:
$r9 = $r9 + 0x1
$r11 = $r11 + 0x3c
$cr0.eq = (u32)$r9 == (u32)$r8
$cr0.lt = (u32)$r9 < (u32)$r8
$cr0.gt = (u32)$r9 > (u32)$r8
if ($cr0.lt) goto loc_205FD4
loc_206000:
$r10 = *(u32*)$r10
$cr0.eq = (i32)$r10 == (i32)0x0
$cr0.lt = (i32)$r10 < (i32)0x0
$cr0.gt = (i32)$r10 > (i32)0x0
if (!$cr0.eq) goto loc_205FBC
loc_20600C:
return
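To illustrate how per-flag assignments like the ones in this listing can collapse into a single high-level branch, here is a toy sketch. It is not ScratchABlock's actual pass: the tuple encoding and the name `fold_flag_branch` are made up for illustration, and it assumes that the flags are dead after the block and that the operands of the flag expression are not redefined between the flag assignment and the branch.

```python
def fold_flag_branch(block, flag_prefix="$cr0."):
    """Substitute the latest flag definition into the branch and drop flag sets.

    block is a list of ("set", dest, expr) and ("if", flag, negated, target).
    """
    defs = {}   # flag name -> text of its latest defining expression
    kept = []
    for stmt in block:
        if stmt[0] == "set" and stmt[1].startswith(flag_prefix):
            defs[stmt[1]] = stmt[2]          # remember, but do not emit
        elif stmt[0] == "if":
            _, flag, negated, target = stmt
            cond = defs[flag]
            if negated:
                cond = "!(" + cond + ")"
            kept.append(("if", cond, target))
        else:
            kept.append(stmt)
    return kept

# The loc_206000 block from the listing above.
block = [
    ("set", "$r10", "*(u32*)$r10"),
    ("set", "$cr0.eq", "(i32)$r10 == (i32)0x0"),
    ("set", "$cr0.lt", "(i32)$r10 < (i32)0x0"),
    ("set", "$cr0.gt", "(i32)$r10 > (i32)0x0"),
    ("if", "$cr0.eq", True, "loc_205FBC"),
]
folded = fold_flag_branch(block)
for stmt in folded:
    print(stmt)
```

A real implementation would rely on liveness information instead of the "flags are dead after the block" assumption, since condition register fields can stay live across blocks.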
Here is the SAB output for python apply_xform.py sub_205fac.lst --script script_decompile --no-dead --no-comments --format=c:
// Estimated params: [$r2, $r3, $r4]
void sub_205FAC()
{
    $r10 = *(u32*)*(u32*)($r2 - 0x158);
    if ((i32)*(u32*)*(u32*)($r2 - 0x158) == 0x0) goto l_EXIT_;
l9:
    $r9 = 0x0;
    if (!((u32)*(u32*)($r10 + 0x10) > 0x0)) goto l44;
    $r11 = $r10 + 0x58;
    $r12 = $r10 + 0x34;
l21:
    $r5 = *(u32*)($r12 + 0x3c);
    $r12 += 0x3c;
    if ((u32)$r5 == (u32)$r3) {
        *(u32*)($r11 + 0x4) = $r4 | *(u32*)($r11 + 0x4);
    }
    $r9 += 0x1;
    $r11 += 0x3c;
    if ((u32)$r9 < (u32)*(u32*)($r10 + 0x10)) goto l21;
l44:
    $r10 = *(u32*)$r10;
    if (!((i32)$r10 == 0x0)) goto l9;
l_EXIT_:
}
The C output looks quite promising! The following works as expected:
The following doesn't seem to work:
!($r10 == 0) --> $r10 != 0
Nice to have working as well:
$r11 += 0x3c --> $r11 = $r9 * 0x3c
Do you need a helping hand?
The C output looks quite promising!
Your original post contained decompilation output like $r12 + 0x3c = $r10 + 0x70, and I noted to myself that you must be very kind to call such output "promising" ;-). I see that the issue was in the input PseudoC (but it shows that many things aren't checked).
condition inversion !($r10 == 0) --> $r10 != 0
Hmm, weird that I never faced a need for that during expression simplification. Otherwise, the code is there (COND.neg()), used e.g. for if structuring. Now with the testcase, I'll look into implementing it. (Or maybe if you're serious about playing with SABl, you'll be interested to look into it yourself, I'd say it should be doable in terms of xform_expr_infer.py).
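For reference, the requested rewrite amounts to flipping the comparison operator under a negation. Below is a standalone sketch on a nested-tuple expression encoding; it does not use ScratchABlock's expression classes or COND.neg(), and the names `NEGATED` and `simplify_not` are hypothetical.

```python
# Each integer comparison and its logical complement.
NEGATED = {"==": "!=", "!=": "==", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}

def simplify_not(expr):
    """Push "!" into comparisons; expr is a leaf string, ("!", e) or (op, lhs, rhs)."""
    if isinstance(expr, str):
        return expr
    if expr[0] == "!":
        inner = simplify_not(expr[1])
        if isinstance(inner, tuple) and inner[0] in NEGATED:
            return (NEGATED[inner[0]], inner[1], inner[2])
        if isinstance(inner, tuple) and inner[0] == "!":
            return inner[1]                  # double negation
        return ("!", inner)
    return (expr[0],) + tuple(simplify_not(e) for e in expr[1:])

print(simplify_not(("!", ("==", "(i32)$r10", "0x0"))))
# ('!=', '(i32)$r10', '0x0')
```

Because the signedness casts sit on the operands rather than on the operator, flipping the operator leaves them untouched.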
type inference
Currently, SABl doesn't do any type analysis at all, and that's one of the big, complex areas without which there won't be quality decompilation.
loop structuring
Yep, no idea why it didn't make at least a do-while at l21. Will look into it too. I played with structuring "non-structured" loops (with continue/break) once too, but it's far from being complete/tested.
refactoring of induction variables like $r11 += 0x3c --> $r11 = $r9 * 0x3c
Whoa, I'm not even sure I'd like to see such processing in the output. It would require Value Set Analysis.
Do you need a helping hand?
Well, with the usual warning that decompilation is a) utterly complex; b) very non-rewarding (you probably won't get what you want anytime soon), if you want to play with it beyond what you already did (which is already quite helpful, thanks!), you're more than welcome, and I'll try to help with any questions/issues as I can. One big thing needed is testing (i.e. manual) and tests (i.e. capturing manual things to be parts of the testsuite).
Your original post contained decompilation output like $r12 + 0x3c = $r10 + 0x70, and I noted to myself that you must be very kind to call such output "promising" ;-). I see that the issue was in the input PseudoC (but shows that many things aren't checked).
Yes, it was a bug in my converter causing that weird statement $r12 + 0x3c = $r10 + 0x70 to appear. I've fixed the converter and the output so everything works as expected now.
refactoring of induction variables like $r11 += 0x3c --> $r11 = $r9 * 0x3c Whoa, I'm not even sure I'd like to see such processing in the output. It would require Value Set Analysis.
Well, I remember having seen an algorithm for optimizing induction variables in a compiler book. I think it was Muchnick's great book "Advanced compiler design and implementation". I saw a lot of induction variables in the real life examples during the two decades of my RE experience, hence the request. For the time being, such an optimization can be performed manually.
Yep, no idea why it didn't make at least a do-while at l21.
You must be using an older checkout. Please be sure to pull frequently, as I'm landing pretty big changes these days (well, was landing over Xmas holidays, maybe will do more at NY time).
With the current version it finds if and while easily, but there's an issue due to a critical edge with the loop - need to think what to do about them, either handle as is or split.
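For readers unfamiliar with the term: a critical edge goes from a node with several successors to a node with several predecessors, and "splitting" it means inserting an empty node on that edge. A minimal sketch on a plain successor dictionary (this is not ScratchABlock's graph class; all names here are hypothetical):

```python
def critical_edges(succs):
    """Return edges whose source has >1 successors and whose target has >1 predecessors."""
    preds = {}
    for src, dsts in succs.items():
        for dst in dsts:
            preds.setdefault(dst, []).append(src)
    return [(s, d) for s, dsts in succs.items() for d in dsts
            if len(dsts) > 1 and len(preds[d]) > 1]

def split_edge(succs, src, dst, new_node):
    """Redirect src -> dst through a fresh empty node."""
    succs[src] = [new_node if d == dst else d for d in succs[src]]
    succs[new_node] = [dst]

# A branches to B or C; B falls through to C.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
before = critical_edges(cfg)
split_edge(cfg, "A", "C", "A_C")
after = critical_edges(cfg)
print(before, after)  # [('A', 'C')] []
```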
You must be using an older checkout. Please be sure to pull frequently, as I'm landing pretty big changes these days (well, was landing over Xmas holidays, maybe will do more at NY time).
I'm sorry, I always use the latest changes. I've just deleted the whole local repository and cloned SABl again from GitHub to avoid problems. I'm still getting the same result (no while so far).
With current version it finds if and while easily
Is decomp.py/match_while() supposed to be used for loop recognition? If yes, it should at least be called from script_decompile.py/structure(). For the time being, that's not the case:
def structure(cfg):
    apply_iterative(match_seq, (cfg,))
    apply_iterative(match_if, (cfg,))
    apply_iterative(match_ifelse, (cfg,))
    apply_iterative(match_seq, (cfg,))
    apply_iterative(match_ifelse, (cfg,))
    apply_iterative(match_if_else_inv_ladder, (cfg,))
    apply_iterative(match_if_else_ladder, (cfg,))
The code above matches only ifs and seqs so far...
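The pattern in structure() is "run one matcher until it stops changing the graph, then move on to the next", so a construct is only recognized if its matcher is in the list. A toy sketch of that driver loop follows; `run_to_fixpoint` and `merge_first_pair` are hypothetical stand-ins, and the sketch assumes each pass returns a truthy value when it changed something, which may differ from how ScratchABlock's apply_iterative actually works.

```python
def run_to_fixpoint(pass_fn, args):
    """Call pass_fn(*args) until it reports no change; return the number of changes."""
    count = 0
    while pass_fn(*args):
        count += 1
    return count

def merge_first_pair(nodes):
    """Toy 'sequence' matcher: fuse the first two nodes into one compound node."""
    if len(nodes) < 2:
        return False
    nodes[0:2] = [("seq", nodes[0], nodes[1])]
    return True

nodes = ["b1", "b2", "b3"]
changes = run_to_fixpoint(merge_first_pair, (nodes,))
print(changes, nodes)  # 2 [('seq', ('seq', 'b1', 'b2'), 'b3')]
```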
Ack, sorry - I have some local changes not pushed yet, because they need a) additional manual testing; b) automated tests. And well, your example actually triggers an issue with my latest changes - the problem seems to be not in the "critical edge", but in a wrong pass being applied, which triggers its appearance. Anyway, I pushed the script_decompile changes I had; I'll look into the issue later.
After pulling your latest changes I'm getting
NameError: name 'match_if_dowhile' is not defined
Pushed that pass.
And a question - would you be ok if the case above is added as a test to SABl repo?
Pushed that pass.
Thanks! Does it work for you? I'm now getting the following error:
Error while processing file: sub_205fac.lst
Traceback (most recent call last):
  File "apply_xform.py", line 208, in <module>
    changed = one_iter(input, output)
  File "apply_xform.py", line 176, in one_iter
    handle_file(args)
  File "apply_xform.py", line 58, in handle_file
    raise e
  File "apply_xform.py", line 55, in handle_file
    handle_file_unprotected(args)
  File "apply_xform.py", line 83, in handle_file_unprotected
    mod.apply(cfg)
  File "/Users/****/Development/ScratchABlock/script_decompile.py", line 79, in apply
    if match_abnormal_sel(cfg):
  File "/Users/****/Development/ScratchABlock/decomp.py", line 413, in match_abnormal_sel
    for v, _ in cfg.iter_rev_postorder():
  File "/Users/****/Development/ScratchABlock/graph.py", line 241, in iter_rev_postorder
    return sorted(self._nodes.items(), key=lambda x: -x[1]["dfsno"])
  File "/Users/****/Documents/Development/ScratchABlock/graph.py", line 241, in <lambda>
    return sorted(self._nodes.items(), key=lambda x: -x[1]["dfsno"])
TypeError: bad operand type for unary -: 'NoneType'
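The last frame can be reduced to a few lines: the sort key negates each node's "dfsno", so a single node whose "dfsno" is None raises exactly this error. The reduction below copies the lambda from the traceback; the node dictionary and the `unnumbered` helper are hypothetical, and the comment about why a node would lack a number is an assumption rather than a diagnosis of the actual bug.

```python
nodes = {
    "entry": {"dfsno": 1},
    "loop": {"dfsno": 2},
    "orphan": {"dfsno": None},  # assumption: a node that never received a DFS number
}

def iter_rev_postorder(nodes):
    # Same key function as in the traceback above.
    return sorted(nodes.items(), key=lambda x: -x[1]["dfsno"])

def unnumbered(nodes):
    """List the nodes that would break the sort."""
    return [name for name, props in nodes.items() if props["dfsno"] is None]

try:
    iter_rev_postorder(nodes)
    message = None
except TypeError as e:
    message = str(e)

print(message)            # bad operand type for unary -: 'NoneType'
print(unnumbered(nodes))  # ['orphan']
```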
And a question - would you be ok if the case above is added as a test to SABl repo?
Yes, sure.
TypeError: bad operand type for unary -: 'NoneType'
Yes, that's the issue I'm talking about in https://github.com/pfalcon/ScratchABlock/issues/7#issuecomment-354016414 and following comments. Hope to look into it later today.
Yes, sure.
Thanks!
Ok, here we go: https://github.com/pfalcon/ScratchABlock/blob/master/tests/decomp/sub_5fac/sub_5fac.lst.exp.clean#L1
Now that's where I start to feel the satisfaction of work done and feel the 2.5 years of development starting to pay off. You see, not only did it completely structure it, it even applied that latest match_if_dowhile pass! (If only it was applied correctly, and that's again the reason why I don't hasten to push WIP code to master - I'd like to be sure that what's in master was reasonably tested).
The biggest takeaway from the recent testcases I looked at is that code motion guided by a data-dependency graph would be a pretty big readability win. Consider for example https://github.com/pfalcon/ScratchABlock/blob/master/tests/decomp/sub_5fac/sub_5fac.lst.exp.clean#L12 . If that line moved to where all increments happen, $r5's value could be propagated and it itself DCE'ed.
The rest of the goofs in the decompilation are "trivialities" (ahem). For example, it's clear there are superfluous propagations, commutative ops should be ordered better, etc.
match_if_dowhile pass ... If only it was applied correctly
And it's not, all due to superfluous propagation. And I actually have no idea why it applied there under those circumstances at all. Ah, missing if. Fun.
If that line moved to where all increments happen, $r5's value could be propagated and it itself DCE'ed.
Of course, with SSA, that would come "for free" without code motion. Or maybe it wouldn't "for free", because artifact variables would be introduced, depending on the quality of out-of-SSA conversion, and I saw a lot of low-quality ones.
And the point is, for human readability, increments really should be grouped together at the end ;-).
First of all, congratulations! Good job!
Yes, the superfluous propagation needs to be fixed because it leads to wrong code:
$r10 = *(u32*)*(u32*)($r2_0 - 0x158);
while ((i32)*(u32*)*(u32*)($r2_0 - 0x158) != 0x0) {
    ...
    $r10 = *(u32*)$r10;
}
It's clear that the code processes a linked list. The expression in $r10 must not be propagated in this case. The correct output is:
$r10 = *(u32*)*(u32*)($r2_0 - 0x158);
while ((i32)$r10 != 0x0) {
    ...
    $r10 = *(u32*)$r10;
}
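A small simulation makes the difference concrete. It models memory as a dictionary and walks a two-node list twice: once re-testing the current node pointer on every iteration (as the PseudoC input does with $r10), and once with the initial expression propagated into the loop condition. The memory layout, the slot address and both function names are made up for illustration.

```python
def walk_correct(mem, slot):
    """Loop condition tests the current node pointer."""
    r10 = mem[mem[slot]]
    visited = []
    while r10 != 0:
        visited.append(r10)
        r10 = mem[r10]          # follow the 'next' pointer
    return visited

def walk_overpropagated(mem, slot, limit=5):
    """Loop condition re-evaluates the initial expression, which never changes."""
    r10 = mem[mem[slot]]
    visited = []
    while mem[mem[slot]] != 0 and len(visited) < limit:  # limit stops the endless loop
        visited.append(r10)
        r10 = mem.get(r10, 0)
    return visited

# slot -> pointer to list head -> node 0x1000 -> node 0x2000 -> NULL
mem = {0x100: 0x200, 0x200: 0x1000, 0x1000: 0x2000, 0x2000: 0x0}
good = walk_correct(mem, 0x100)
bad = walk_overpropagated(mem, 0x100)
print([hex(a) for a in good])  # ['0x1000', '0x2000']
print(len(bad))                # 5
```

The over-propagated version only stops because of the artificial iteration limit; without it, the loop would run past the end of the list forever.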
Of course, with SSA, that would come "for free" without code motion. Or maybe it wouldn't "for free", because artifact variables would be introduced, depending on the quality of out-of-SSA conversion, and I saw a lot of low-quality ones.
I'm not sure SSA will help here. A more sophisticated analysis and transformation is required to handle induction variables like that. There are actually three induction variables in this code:
$r9 --> loop counter, basic induction variable
$r11 --> structure pointer, member 1, basic induction variable
$r12 --> structure pointer, member 2, basic induction variable
BTW, I found the algorithm for finding induction variables in Muchnick's book. It's quite easy to implement. It will spit out a list of recognized induction variables and their linear equations.
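As a rough idea of what such a pass reports, here is a deliberately simplified sketch that only finds basic induction variables, i.e. registers assigned exactly once per iteration as v = v + constant. It is not the full algorithm from Muchnick's book (dependent induction variables are ignored), and the tuple encoding, the `inits` table and the function name are all hypothetical. The loop body and initial values are transcribed from the loc_205FD4 loop in the PseudoC listing.

```python
def basic_induction_vars(loop_body):
    """loop_body: list of (dest, lhs, op, rhs); returns {var: step} for v = v + const."""
    assign_count = {}
    for dest, _, _, _ in loop_body:
        assign_count[dest] = assign_count.get(dest, 0) + 1
    result = {}
    for dest, lhs, op, rhs in loop_body:
        if (assign_count[dest] == 1 and lhs == dest
                and op == "+" and isinstance(rhs, int)):
            result[dest] = rhs
    return result

loop_body = [
    ("$r5", "load", None, None),
    ("$r12", "$r12", "+", 0x3C),
    ("$r5", "load", None, None),
    ("$r6", "$r4", "|", "$r5"),
    ("$r8", "load", None, None),
    ("$r9", "$r9", "+", 0x1),
    ("$r11", "$r11", "+", 0x3C),
]
# Values assigned before the loop is entered (from loc_205FBC / loc_205FCC).
inits = {"$r9": "0x0", "$r11": "$r10 + 0x58", "$r12": "$r10 + 0x34"}

steps = basic_induction_vars(loop_body)
for var, step in steps.items():
    # i is the iteration number, starting at 0
    print(f"{var} = ({inits[var]}) + {step:#x} * i")
```

With $r9 starting at zero and stepping by one, i can be read as $r9, which gives linear equations such as $r11 = ($r10 + 0x58) + 0x3c * $r9.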
The question is how to transform them into a more readable form. The answer could be provided by type inference and pointer analysis. The problem is that those are horribly hard to implement. We could consider a lightweight version (heuristics) for a start.
Yes, the superfluous propagation need to be fixed because it leads to wrong code:
Yeah, the wrong code is actually because match_if_dowhile didn't check the conditions match. That was fixed in 38d9d89bb02d1bf33f6304835bd588303a9ce17b. So, the older "while" output can be considered a sneak preview of what it can do ;-) (when expr propagation is fixed).
@maximumspatium : Can this ticket be closed now? For interesting ideas, like dealing with induction variables, could you open separate ticket(s)?
P.S. Still interested in ScratchABlock? ;-)
Can this ticket be closed now?
Ok, if you like.
For interesting ideas, like dealing with induction variables, could you open separate ticket(s)?
Yes, sure.
Still interested in ScratchABlock? ;-)
Yes, I was so silent because I was busy trying out many different things, among others switch statements structuring, rewriting of calls and structures etc. None of them is worth being submitted yet but I'll eventually get there someday...
Ok, if you like.
Well, I'm generally ok to keep tickets open for months, but if there's too much different content in them, it only becomes harder later to salvage additional data from them.
None of them is worth being submitted yet but I'll eventually get there someday...
Cool, thanks! Just a note based on our previous discussion of the development process - if you have any simple and standalone changes, like docstrings, etc. - feel free to submit them, because otherwise it's easy to accumulate a lot of local changes, and then it's hard to process them later. As I mentioned, that's pretty much what I had (glad to have cleared up a bunch of that over the holidays). Anyway, just a generic observation. Certainly, please do it at your pace and convenience.
For interesting ideas, like dealing with induction variables, could you open separate ticket(s)?
See #24.
I'm going to close this issue because the original problem is solved.
Thanks!
Hello,
I skimmed over ScratchABlock's tests in order to learn the syntax of its PseudoC IR. I noticed that the conditional statements used there look like high-level ones (that is, there is no low-level condition registers or flags).
Usually, when coming from assembly, low-level conditions and branches are still present. They need to be transformed into corresponding conditional expressions. I wonder where and how do you perform such a transformation in your decompiler.
The main issue with low-level conditions and branches is that they are machine-dependent. It should be theoretically possible to obtain higher-level conditional expressions during the ASM->IR conversion, but that comes with its own set of problems. The PowerPC architecture, for example, defines eight condition register fields (CR0-CR7) that can be used to keep comparison results alive across the procedure, so eliminating them would require performing a non-trivial dataflow analysis in the front-end.
Existing decompilers use different strategies for dealing with this issue. The basic idea is to abstract away low-level condition suppliers and consumers. Boomerang, for example, utilizes several complex macros like SUBFLAGS, ADDFLAGS etc. in its RTL-like IR for describing the layout and behavior of processor-specific flags and conditions. They will be transformed to higher-level conditional expressions at later stages so the initial IR still contains low-level conditionals and branches.
In contrast, the Hex-Rays decompiler seems to operate on a processor-independent, very low-level IR called microinstruction language that will be stepwise transformed to a higher-level IR. Each machine instruction that affects flags is split into several instructions setting each low-level flag separately like this:
Unused conditional microinstructions will be eliminated during the optimization by means of DCE, all others will be combined with the corresponding branches to form conditional expressions. The above mentioned representation taken from earlier tutorials looks too x86-centric. I wonder which microcode Hexrays uses to decompile PowerPC targets.
To make matters even worse, the x86 and PowerPC architectures are very different with regard to signed/unsigned comparisons. An x86 CPU recognizes the signedness during the branches by examining the combination of flags; PowerPC implements two different comparison instructions affecting the same flags differently for signed and unsigned operands - the signedness cannot be deduced from low-level conditions alone.
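A tiny illustration of why the signedness has to be recorded somewhere: the same 32-bit patterns give opposite answers to "less than" depending on how they are interpreted. The helper name `as_signed32` is made up for this sketch.

```python
def as_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement signed value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

a, b = 0xFFFFFFFF, 0x00000001

unsigned_lt = a < b                          # 4294967295 < 1
signed_lt = as_signed32(a) < as_signed32(b)  # -1 < 1
print(unsigned_lt, signed_lt)  # False True
```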
Is ScratchABlock capable of dealing with machine-dependent conditionals (that is, transforming low-level concepts of flags and branches into higher-level expressions like if (cond) goto label)? If yes, how is it implemented?
Thank you in advance!
Cheers
Max