Status: open. ptomin opened this issue 2 years ago.
Pavel, I've been working on this problem on my own end, and my thoughts are going in a different direction. I'm considering moving type analysis earlier, or at least using the type analysis we already do in the analysis stage to perform escape analysis, in order to detect all the stack variables and their sizes.
The following paper is really interesting: https://github.com/uxmal/retypd/blob/master/reference/paper.pdf. The retypd analysis described in the paper processes the program one strongly connected component (SCC) of the call graph at a time. I prefer their approach to expressing type constraints over Reko's current blind creation of equivalence classes for each assignment or passed parameter.
I would consider performing type analysis for each SCC of the call graph in the analysis phase, at least to discover the sizes of the parameters in each call to a previously analyzed function. That type analysis could then guide the generation of SSA parameters more accurately.
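The SCC-at-a-time order can be sketched as follows. This is a minimal illustration, not retypd's or Reko's code: it assumes the call graph is a dict mapping each procedure name to the names of its callees, and the procedure names are hypothetical. Tarjan's algorithm emits SCCs in reverse topological order, so each callee's SCC comes out before its callers and can be analyzed first, which is what would make a previously analyzed function's parameter sizes available at its call sites.

```python
from typing import Dict, List, Set


def sccs_bottom_up(call_graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return the SCCs of call_graph with callees before callers (Tarjan's algorithm)."""
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    on_stack: Set[str] = set()
    stack: List[str] = []
    result: List[List[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in call_graph.get(v, []):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of an SCC: pop every member of that SCC.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            result.append(sorted(component))

    for v in call_graph:
        if v not in index:
            visit(v)
    return result


if __name__ == "__main__":
    # Hypothetical call graph: even/odd are mutually recursive.
    graph = {"main": ["even", "print"], "even": ["odd"], "odd": ["even"], "print": []}
    for scc in sccs_bottom_up(graph):
        print(scc)  # prints ['even', 'odd'], then ['print'], then ['main']
```

Mutually recursive procedures such as the hypothetical even/odd pair land in one SCC and would have to be analyzed together. The recursive visit is kept for clarity; a large call graph would need an iterative version to stay within Python's recursion limit.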
Let's discuss this issue on Discord. I think handling local stack variables and constraint-based type inference may be the two best approaches to improving Reko's source code output.
Another interesting paper is this: https://www.airs.com/dnovillo/Papers/mem-ssa.pdf
Problem
Currently, memory accesses are converted to variables too early, which causes incorrect value propagation.
Example
Currently, the SSA transform converts it to
and then, after value propagation, to
Stack storage can be modified by a procedure call, but we ignore this fact, which causes an incorrect result.
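Since the SSA listings from the original report are not reproduced here, the following is a hypothetical, self-contained illustration of the failure mode, not Reko's data structures: a constant is stored to the slot at fp - 8, a procedure that may write to that slot is called, and the slot is read back. Treating the slot as a plain variable propagates the constant across the call; conservatively treating every call as a possible write to stack memory does not.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Store:
    """Mem[fp + offset] = value"""
    offset: int
    value: int


@dataclass
class Call:
    """Call to a procedure that might write to the caller's stack frame."""
    name: str


@dataclass
class Load:
    """dst = Mem[fp + offset]"""
    dst: str
    offset: int


Stmt = Union[Store, Call, Load]


def propagate(stmts: List[Stmt], calls_clobber_stack: bool) -> Dict[str, Optional[int]]:
    """Map each Load's destination to the propagated constant, or None if unknown."""
    known: Dict[int, int] = {}             # stack offset -> last stored constant
    result: Dict[str, Optional[int]] = {}
    for s in stmts:
        if isinstance(s, Store):
            known[s.offset] = s.value
        elif isinstance(s, Call):
            if calls_clobber_stack:
                known.clear()              # callee may have written any stack slot
        else:
            result[s.dst] = known.get(s.offset)
    return result


if __name__ == "__main__":
    # Hypothetical program: the callee receives fp - 8 and overwrites it.
    program = [Store(-8, 0x123), Call("init"), Load("r", -8)]
    print(propagate(program, calls_clobber_stack=False))  # {'r': 291}, a stale value
    print(propagate(program, calls_clobber_stack=True))   # {'r': None}, conservative
```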
Possible solution
I suggest transforming stack memory accesses into variables later. This can be done after the Type Analysis phase, in the same way as for global variables. We still need to keep the benefits of constant propagation in the analysis phase, though. To do that, we can use the definitions of memory identifiers (currently unused) for data flow analysis.
Memory identifier definition improvement
The current form of memory identifier definitions is incomplete. A definition such as
Mem5[<ea>] = <src>
carries no information about the previous value of the memory identifier. I suggest introducing the syntax
Mem5[Mem4, <ea>] = <src>
where Mem4 is used and Mem5 is defined.
Value propagation
If the memory identifier, effective address, and data type are the same, then value propagation can be done.
can be transformed to
If the memory identifiers are different, then memory analysis can be done to check whether the definitions in between can affect the accessed location.
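The propagation condition above can be sketched as a single check. Assumptions: the proposed two-operand store form is modeled by a hypothetical MemStore record, and effective addresses and data types are compared as plain strings, which is much cruder than comparing real expressions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemStore:
    """Mem<defined>[Mem<used>, ea:dtype] = value (the proposed two-operand form)."""
    defined: str
    used: str
    ea: str
    dtype: str
    value: str


@dataclass(frozen=True)
class MemLoad:
    """dst = Mem<mem>[ea:dtype]"""
    mem: str
    ea: str
    dtype: str


def try_propagate(load: MemLoad, store: MemStore) -> Optional[str]:
    """Return the stored value if the load reads exactly what the store wrote.

    The memory identifier (the load sees the memory state this store defined),
    the effective address, and the data type must all match. Otherwise return
    None; with differing memory ids, memory analysis would be needed first.
    """
    if (load.mem == store.defined
            and load.ea == store.ea
            and load.dtype == store.dtype):
        return store.value
    return None


if __name__ == "__main__":
    store = MemStore("Mem5", "Mem4", "fp - 8", "word32", "0x123<32>")
    print(try_propagate(MemLoad("Mem5", "fp - 8", "word32"), store))  # 0x123<32>
    print(try_propagate(MemLoad("Mem6", "fp - 8", "word32"), store))  # None
```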
Example
We can prove that a global variable access cannot affect stack storage. So
<dst> = Mem6[fp - 8:word32]
can be converted to
<dst> = Mem5[fp - 8:word32]
and after value propagation it becomes
<dst> = 0x123<32>
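The memory analysis in this example can be sketched as a walk back along the chain of memory identifiers that the Mem5[Mem4, <ea>] = <src> form records. Assumptions: each store is tagged with a hypothetical region ("stack" or "global"), stores to different regions are treated as provably disjoint, and any same-region store that does not match exactly stops the walk. The store that defines Mem6 and its global address are hypothetical, since that definition is not shown in the original.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MemStore:
    """Mem<defined>[Mem<used>, ea:dtype] = value, tagged with the region it writes."""
    defined: str
    used: str
    region: str   # "stack" or "global" (hypothetical classification)
    ea: str
    dtype: str
    value: str


@dataclass(frozen=True)
class MemLoad:
    """dst = Mem<mem>[ea:dtype], reading from the given region."""
    mem: str
    region: str
    ea: str
    dtype: str


def resolve(load: MemLoad, defs: Dict[str, MemStore]) -> Optional[str]:
    """Walk back through memory definitions that provably cannot alias the load.

    Returns the propagated value, or None when the walk reaches a store that may
    overlap the load or a memory id with no known store (e.g. defined by a call).
    """
    mem = load.mem
    while mem in defs:
        store = defs[mem]
        if (store.region == load.region and store.ea == load.ea
                and store.dtype == load.dtype):
            return store.value      # exact match: propagate the stored value
        if store.region != load.region:
            mem = store.used        # disjoint regions: skip to the previous memory id
            continue
        return None                 # same region, different access: may alias
    return None


if __name__ == "__main__":
    defs = {
        "Mem5": MemStore("Mem5", "Mem4", "stack", "fp - 8", "word32", "0x123<32>"),
        # Hypothetical global store that defines Mem6.
        "Mem6": MemStore("Mem6", "Mem5", "global", "0x00402000", "word32", "eax"),
    }
    print(resolve(MemLoad("Mem6", "stack", "fp - 8", "word32"), defs))  # 0x123<32>
```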
Memory slices
A slice expression can be used to restrict the area of memory that is defined or used.
Global memory
Global can be split later into Dynamic and Static, but that is another issue.
All stack memory
First 8 bytes of local stack variables
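One way to model these slices is as byte ranges within a region, with an overlap test that memory analysis could use to skip definitions that cannot touch a given access. This is a sketch under assumptions: the Slice type and its offsets are hypothetical, and "first 8 bytes of local stack variables" is taken to mean the bytes from fp - 8 up to, but not including, fp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slice:
    """Area of memory that a definition or use is restricted to.

    lo and hi are byte offsets from the region base (hi is exclusive);
    None means the slice is unbounded in that direction.
    """
    region: str                 # "global" or "stack"
    lo: Optional[int] = None
    hi: Optional[int] = None


def may_overlap(a: Slice, b: Slice) -> bool:
    """Return False only when the two slices provably share no byte."""
    if a.region != b.region:
        return False            # global and stack memory assumed disjoint
    a_lo = float("-inf") if a.lo is None else a.lo
    a_hi = float("inf") if a.hi is None else a.hi
    b_lo = float("-inf") if b.lo is None else b.lo
    b_hi = float("inf") if b.hi is None else b.hi
    # Half-open intervals [a_lo, a_hi) and [b_lo, b_hi) intersect iff:
    return a_lo < b_hi and b_lo < a_hi


GLOBAL = Slice("global")                # all global memory
ALL_STACK = Slice("stack")              # all stack memory
FIRST_8_LOCALS = Slice("stack", -8, 0)  # fp - 8 .. fp - 1 (assumed layout)

if __name__ == "__main__":
    print(may_overlap(GLOBAL, ALL_STACK))                         # False
    print(may_overlap(ALL_STACK, FIRST_8_LOCALS))                 # True
    print(may_overlap(FIRST_8_LOCALS, Slice("stack", -12, -8)))  # False
```

A definition restricted to FIRST_8_LOCALS could then be skipped when resolving a load from fp - 12, without giving up on every other stack access.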
Example
can be converted to