uxmal / reko

Reko is a binary decompiler.
https://uxmal.github.io/reko
GNU General Public License v2.0
2.17k stars 253 forks source link

Local stack variables handling need to be revised #1125

Open ptomin opened 2 years ago

ptomin commented 2 years ago

Problem

Currently memory accesses are converted to variables too early. And it causes incorrect value propagation.

Example
Mem1[fp - 8<32>:word32] = 0x123<32>
func(fp - 8<32>)
# Stack storage can be affected after procedure call
Mem5[0x10000008:word32] = Mem4[fp - 8<32>:word32]

Currently SSA transform converts it to

dwArg08_1 = 0x123<32>
func(fp - 8<32>)
# Stack storage can be affected after procedure call
Mem5[0x10000008:word32] = dwArg08_1

and then after value propagation

dwArg08_1 = 0x123<32>
func(fp - 8<32>)
# Stack storage can be affected after procedure call, but we ignore this fact
Mem5[0x10000008:word32] = 0x123<32>

Stack storage can be affected after procedure call, but we ignore this fact which causes incorrect result.

Possible solution

I suggest to do transformations stack memory accesses to variables later. It can be done after Type Analysis phase in the same way as for global variables. We need to keep benefits of constant propagation at analysis phase though. We can use definition of memory identifiers (currently unused) for data flow analysis.

Memory identifier definition improvement

Current version of memory id definitions is incomplete. Mem5[<ea>] = <src> have not information about previous value of memory id. I suggest to introduce Mem5[Mem4, <ea>] = <src> syntax where Mem4 is used and Mem5 is defined.

Value propagation

If memory id, effective address and data type are the same then value propagation can be done.

Mem5[Mem4, fp - 8:word32] = 0x123<32>
.....................................
<dst> = Mem5[fp - 8:word32]

can be transformed to

Mem5[Mem4, fp - 8:word32] = 0x123<32>
.....................................
<dst> = 0x123<32>

If memory ids are different than memory analysis can be done.

Example
Mem5[Mem4, fp - 8:word32] = 0x123<32>
.....................................
Mem6[Mem5, 0x10000000:word32] = 0x456<32>
.....................................
<dst> = Mem6[fp - 8:word32]

We can prove that global variable access should not affect stack storage. So <dst> = Mem6[fp - 8:word32] can be converted to <dst> = Mem5[fp - 8:word32] and after value propagation it will be <dst> = 0x123<32>.

Memory slices

Slice expression can be used to restrict area of defined/used memory.

Global memory
Global2 = SLICE(Mem1, 0, UnknownType)

Global can be splitted later to Dynamic and Static but it's another issue.

All stack memory
Stack2 = SLICE(Mem1, fp - 0x80000000, UnknownType)
First 8 bytes of local stack variables
Stack8_8_2 = SLICE(Mem1, fp - 0x08, UnknownType(8))
Example
def Mem1
def fp
Mem2[Mem1, fp - 8<32>:word32] = 0x123<32>
Mem3[Mem2, 0x10000000:word32] = 0x456<32>
Mem4[Mem3, Mem3[0x10000004:word32]] = 457<32>
func(fp - 8<32>)
    use Mem4
    def Mem5
Mem6[Mem5, 0x10000008:word32] = Mem5[fp - 8<32>:word32]

can be converted to

def Mem1
def fp
Stack1 = SLICE(Mem1, fp - 0x80000000, UnknownType)
Global1 = SLICE(Mem1, 0, UnknownType)
Stack2[Stack1, 0x7FFFFFFC<32>:word32] = 0x123<32>
Global2[Global1, 0x10000000:word32] = 0x456<32>
Global3[Global2, Global2[0x10000004:word32]] = 457<32>
Stack8_8_1 = SLICE(Stack2, 0x7FFFFFFC, UnknownType(8))
func(fp - 8<32>)
    use Global3, Stack8_8_1
    def Global4, Stack8_8_2
Global5[Global4, 0x10000008:word32] = Stack8_8_2[0<32>:word32]
uxmal commented 2 years ago

Pavel, I've been working on this problem on my own end. My thoughts are going in a different direction. I'm considering moving type analysis earlier, or at least use the type analysis we are doing already in the analysis stage to perform escape analysis to detect all the stack variables and their sizes.

The following paper is really interesting: https://github.com/uxmal/retypd/blob/master/reference/paper.pdf The retypd analysis described in the paper processes the program one strongly connected component (SCC) of the call graph at a time. I like their approach for expressing type constraints better than Reko's current blind creation of equivalence classes for each assignment or passed parameter.

I would consider performing type analysis for each SCC of the call graph in the analysis phase, to at least discover the size of the parameters of each call to an previously analyzed function. That type analysis could be used to guide the generation of SSA parameters better.

Let's discuss this issue on discord. I think handling local stack variables and constraint based type inference may be the two best approaches to improving Reko's source code output.

uxmal commented 2 years ago

Another interesting paper is this: https://www.airs.com/dnovillo/Papers/mem-ssa.pdf