Improving pre-SMT reasoning capabilities

PetarMax commented 6 months ago

In the recent proofs, I have often observed the following pattern of reasoning:

We need C1: X <Int pow256 for a rule/simplification to fire;
We have C2: X <Int pow96, say, in the path constraints, so C1 holds;
Our simplifications aren't powerful enough to understand that C1 holds, so rule/simplification does not get applied in Booster and then (I think?) we either fall back to Kore which uses Z3 (if we're dealing with a rule) or we do nothing (if we're dealing with a simplification).

I believe that we could/should take care of these cases before they reach the SMT solver, and I would like to understand/discuss how this could be done modularly and where it should be done.

One way of doing this for this specific case would be to write simplifications in which I could tell the engine to syntactically match the path condition, rather than semantically, along the lines of:

rule A <Int B => true
  requires A <Int C
   andBool C <=Int B
  [simplification, concrete(B, C), match(C)]

where match(C) would mean 'match the path constraints syntactically to learn C, possibly also specifying which conjunct of the requires clause to use for the match (match(C, 1), for instance). In this example, we couldn't use C <=Int B anyway because both B and C are concrete, so that constraint would be fully resolved.

ehildenb commented 6 months ago

I think we should have C actually be ?C, because it's existentially quantified here. "If there exists a ?C such that A <Int ?C andBool ?C <=Int B, then A <Int B => true".

Because of that, I think we shouldn't need the match(C) attribute either. The backend can see that there is an existential variable in the requires clause, and so it can know that it must:

Process the side-conditions in the requires clause in order,
Attempt to find a value for ?C by looking for syntactic exact matches of the first clause (possibly coming up with multiple options),
Filter the options by seeing if later clauses are also satisfied by those matches.

This can be a very specific reasoning module for just this case, only when dealing with side-conditions.

PetarMax commented 6 months ago

Yes, ?C would work, I was trying to avoid it because I think that would require front-end parsing changes, since currently ?C is not parseable on the LHS of a rule.

ehildenb commented 6 months ago

That's fine, we can change the frontend to allow it for simplification rules. We need to make sure not to send it to the backend.

More important that the rule reads correctly. We don't need another way to say Exists, the ? already does that.

geo2a commented 6 months ago

We have had an discussion of this proposal in yesterdays Kontrol<>HB meeting, here's the summary of what I recall:

@PetarMax suggest this feature to enable interval analysis for integer inequalities. His hypothesis is that if we implement this, than Z3 would have an easier job or would not be needed a all in some cases.
I think that this feature would not actually help much, as we rely on the SMT solver for this sort of things. If we see that the solver struggles in some cases, we need to isolate the exact case where it does and debug the SMT transalation of this specific predicate in Booster and Kore.
Only when we are absolutely sure that the SMT translation is not the problem and the solver indeed struggles should we implement ad-hoc reasoning modules.

The reason I am somewhat hostile to the idea of adding ad-hoc reasoning procedures as part of the requires clause check (or anywhere else) is that these are very difficult to maintain. We know that Kore has a number of such procedures that are essentially a black box for us as the backend developers. Deciphering and perhaps refactoring them is certainly possible, but would require significant resources.

tl;dr

We have the SMT solver for this sort of thing, both in Booster and in Kore. If we see examples where either Booster or Kore fails to apply simplifications due to unknown predicates, we should drill into why these are unknwon. There may be holes in Booster's SMT interface and hopes+historical baggage in Kore's. We should identify and fix those first, and only then extend K with additional rule types that hook into the backend.

@goodlyrottenapple please add any points from the discussion that I have forgotten to mention.

PetarMax commented 6 months ago

I understand the concern about ad-hoc reasoning modules, especially in the context of Kore history. I do believe that will not have the same issues if we document precisely what is being done and implement reasoning modules modularly, with the ability to be switched on and off.

I think the question I would like to have an answer to is: is Z3 called in Booster when trying to determine if a condition of a requires clause holds? If yes, then I'm surprised there's a fallback to Kore for transitivity of <Int. If not, why not?

ehildenb commented 6 months ago

This is not ad-hoc, it's very specific, and has come up as a soft ask several times over the years. Basically, what we want is existential binding against things that already exist in the side-conditions. It's easy to see how this reasoning works, and also easy for the backend to see when it should apply this reasoning (are there simplification rules with existentials in the requires? Then try applying this when evaluating side-conditions).

I disagree fully with this being ad-hoc. Improving SMT translation is just making the backend better at "magic", but not giving the user more power here.

geo2a commented 3 months ago

The proposal has been refined and a new issue in the K repo has been opened. Let us use this issue to track the implementation of this feature in Booster.

I have a question regarding what to do when we have multiple matches.

Let's have a look at a happy-path scenario when we have two matches, but only one is actually non-False:

rule [test1-init]: <k> test1Init() => test1State1() ... </k>
                   <int-x> _         => ?X              </int-x>
                   <int-y> _         => ?Y              </int-y>
                   ensures  ?X <Int 100
                    andBool ?Y <Int 50
                    andBool ?X <Int ?Y

rule [test1-1-2]: <k> test1State1() => test1State2() ... </k>
                  <int-x> X                              </int-x>
                  requires X <Int 50

rule [test1-simplify]:
    X <Int Y => true
    requires X <Int Z
     andBool Z <Int Y
     // [simplification, syntactic(1)]
     [simplification]

In order to apply rule test1-1-2 to the current pattern

[booster][execute][term 4dea6a7][rewrite 0bf6c1a][success][term 40a330c][kore-term]
<generatedTop>(<k>(kseq(test1State1()_SYNTACTIC-SIMPLIFICATION_State(), dotk())), <int-x>(Var?X0:SortInt{}), <int-y>(Var?Y0:SortInt{}), <generatedCounter>("0"))
/\ _<Int_(Var?X0:SortInt{}, "100")
/\ _<Int_(Var?X0:SortInt{}, Var?Y0:SortInt{})
/\ _<Int_(Var?Y0:SortInt{}, "50")

we need to prove

[booster][execute][term 40a330c][rewrite 18e3545][constraint][term fde6171][kore-term] _<Int_(Var?X0:SortInt{}, "50")

we match with the simplification test1-simplify and now we need to instantiate the free variable Z in its requires clause:

[booster][execute][term 40a330c][rewrite 18e3545][constraint][term fde6171][simplification c9b0e63][failure][continue]
Aborting syntactic simplification after instantiating the condition to
_<Int_(Var?X0:SortInt{}, Eq#VarZ:SortInt{}) , _<Int_(Eq#VarZ:SortInt{}, "50")

The question is: how do we pick the instantiation? In this particular case, it's relatively easy, as we only have two options and only one of them is valid:

We can substitute Eq#VarZ for "100"

_<Int_(Var?X0:SortInt{}, "100") , _<Int_("100", "50")

which the LLVM backbend will discharge as false, no Z3 needed.

Or we can substitute Eq#VarZ for Var?Y0

_<Int_(Var?X0:SortInt{}, Var?Y0:SortInt{}) , _<Int_(Var?Y0:SortInt{}, "50")

in this case, we get two clauses that are syntactically present in the constraints — we can apply test1-simplify and rewrite with test1-1-2.

The amount of matches will however grow exponentially with the amount of constraints, and we will likely start getting multiple non-False instantiations.

How would we choose which one to use?

PetarMax commented 3 months ago

I would apply first successful and would not cache - these simplifications should be instant.

runtimeverification / haskell-backend

Improving pre-SMT reasoning capabilities #3861

tl;dr