nevillegrech / gigahorse-toolchain

A binary lifter and analysis framework for Ethereum smart contracts
Other
285 stars 58 forks source link

How does loops work? How does Gigahorse translate loops? #137

Open lahiri-phdworks opened 1 month ago

lahiri-phdworks commented 1 month ago

I am not able to understand the translation of a loop function to the IR generated by Gigahorse. I have a solidity function that looks like the example shown below.

Example:

pragma solidity >=0.4.2 <=0.7.6;

contract Loop {
    function loop_example (uint a, uint b) public view returns (uint) {
       require(a > 0, "parameter a passed to this function be greater than 0");
       uint c = 0;
       while (a > 0) {
           a = a - 1;
           c = c + b
       }
       return c;
    }
}

Any clarification in this direction will be really helpful.

sifislag commented 1 month ago

Hi, have you been able to analyze this program using gigahorse? You can find our loop related logic in clientlib/loops.dl and clientlib/loops_semantics.dl.

Our main relation is this one, denoting that a basic block s is part of a loop identified via loophead:

.decl BlockInStructuredLoop(s:Block, loophead:Block)

a originating as a stack variable in the analyzed EVM program, will be placed on a register Variable in our IR.

Hope I was able to help, let me know if you have more questions.

lahiri-phdworks commented 1 month ago

Thanks, I was going over those files and the one on dominance as well.

The doubt I have is that in case of loops, in the loop header block, there will be two sets of variables that come in (i.e. two stacks of registers I guess), one from the previous block and other via a back edge.

How does the tool merge them (definitions)? I can see PHI instructions in the IR, but is it guaranteed that the number of variables in both the stacks will remain the same?

sifislag commented 1 month ago

Hi, I'm not exactly sure what the question is but I think I got it. File clientlib/loop_semantics.dl defines several related concepts we use to model loops.

/**
  Simple loop induction variable. Identified by the two variables.
*/
.type SimpleInductionVariable = [beforeLoopVar: Variable, inLoopVar: Variable]

.decl WellFormedLoopInductionVariable(loop: Block, inductionPhiVar: Variable, simpleIndVar: SimpleInductionVariable)
.decl InductionVariableIncreasesByConst(loop:Block, simpleIndVar:SimpleInductionVariable, const:number)
.decl InductionVariableDecreasesByConst(loop:Block, simpleIndVar:SimpleInductionVariable, const:number)
.decl InductionVariableStartsAtConst(loop:Block, simpleIndVar:SimpleInductionVariable, const:Value)
.decl InductionVariableLowerBoundVar(loop:Block, indVar:SimpleInductionVariable, bound:Variable)
.decl InductionVariableUpperBoundVar(loop:Block, indVar:SimpleInductionVariable, bound:Variable)

Essentially because of PHIs in our IR are introduced at every block that accesses them (instead of placing them at dominance frontier), so we identify well formed induction variables by the beforeLoop and inLoop variables. We have several high-level relations that use this concept.