runtimeverification / haskell-backend

The symbolic execution engine powering the K Framework
BSD 3-Clause "New" or "Revised" License

Optimising SAT checks #3952

Closed PetarMax closed 1 month ago

PetarMax commented 3 months ago

When we check whether the current path condition can be extended with new constraints, it should be sufficient to check the satisfiability of the new constraints together with only the relevant part of the path condition.

One way of computing this relevant part is to start from the constraints that contain variables present in the new constraints, and from there perform a transitive closure with respect to the variables.

geo2a commented 3 months ago

How to compute the transitive closure of the variables occurring in the target condition (from @PetarMax):

  • start from the variables in the new constraints, then cut the path condition down to the constraints that contain those variables
  • your new starting set is the variables you had plus any other variables in those constraints; repeat until nothing changes
  • you can optimise this by keeping the constraints you know are certainly included, and in each iteration only checking the remaining ones

If you have variables {X, Y, Z}, then you traverse the path condition and take only the constraints that contain any of those variables. Then you see that one of those constraints had W, so you cut the rest on W only, and so on, until there are no new variables.
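The fixpoint described above can be sketched as follows. This is a minimal, illustrative Python version, not the actual kore implementation; the function name and the `(constraint, variables)` representation are invented here for the example:

```python
def relevant_constraints(path_condition, target_vars):
    """Keep only the constraints reachable from target_vars by
    transitively following shared variables.

    path_condition: list of (constraint, set-of-its-variables) pairs;
    this representation is illustrative only.
    """
    known_vars = set(target_vars)
    kept = []
    remaining = list(path_condition)
    changed = True
    while changed:
        changed = False
        still_remaining = []
        for constraint, vars_ in remaining:
            if vars_ & known_vars:
                # Constraint shares a variable: keep it and grow the set.
                kept.append(constraint)
                known_vars |= vars_
                changed = True
            else:
                still_remaining.append((constraint, vars_))
        # Constraints already kept are never re-examined: each round only
        # checks the "other ones", as the optimisation above suggests.
        remaining = still_remaining
    return kept
```

For the example in the comment: starting from {X, Y, Z}, a constraint mentioning Y pulls in W, which in turn pulls in a constraint on W alone, while a constraint mentioning only an unrelated variable V is dropped.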

goodlyrottenapple commented 2 months ago

An attempt to add this in kore is on this branch: https://github.com/runtimeverification/haskell-backend/tree/sam/SMT-transitive-closure

However, the above caused a noticeable regression in KEVM:

| Test                                           | sam-SMT-transitive-closure time (s) | master-8ee0dad25 time (s) | ratio (sam-SMT-transitive-closure / master-8ee0dad25) |
|------------------------------------------------|-------------------------------------|---------------------------|-------------------------------------------------------|
| mcd/flapper-yank-pass-rough-spec.k             | 159.67                          | 152.29                | 1.0484601746667541
| mcd/flopper-kick-pass-rough-spec.k             | 112.58                          | 106.63                | 1.0558004313982932
| erc20/ds/transfer-success-2-spec.k             | 57.34                           | 54.0                  | 1.0618518518518518
| mcd/flopper-file-pass-rough-spec.k             | 358.27                          | 337.04                | 1.06298955613577
| mcd/flopper-dent-guy-same-pass-rough-spec.k    | 554.38                          | 521.37                | 1.0633139612942824
| erc20/hkg/approve-spec.k                       | 44.1                            | 41.44                 | 1.0641891891891893
| mcd/dsvalue-read-pass-spec.k                   | 50.51                           | 47.26                 | 1.0687685146000847
| mcd/flopper-tick-pass-rough-spec.k             | 165.04                          | 154.36                | 1.0691889090437936
| erc20/ds/transfer-success-1-spec.k             | 59.17                           | 54.34                 | 1.0888847994111153
| erc20/hkg/transferFrom-success-1-spec.k        | 56.21                           | 50.66                 | 1.109553888669562
| erc20/hkg/transferFrom-success-2-spec.k        | 54.23                           | 48.71                 | 1.1133237528228288
| erc20/ds/approve-success-spec.k                | 53.62                           | 46.28                 | 1.1585998271391529
| benchmarks/ecrecoverloop02-sig0-invalid-spec.k | 190.63                          | 112.69                | 1.6916319105510693
| benchmarks/ecrecoverloop02-sig1-invalid-spec.k | 337.27                          | 190.85                | 1.7671993712339533
| benchmarks/ecrecoverloop02-sigs-valid-spec.k   | 355.27                          | 192.61                | 1.8445044390218575
| TOTAL                                          | 2608.29                         | 2110.5299999999997    | 1.2358459723387019

ehildenb commented 2 months ago

@PetarMax you really should post why you want an issue investigated. Spending time trying out a new algorithm is usually wasted effort when we have almost no context. Is this to prune branches? Then K rules might be the answer instead of Z3 improvements. Is this to show satisfiability in the get-model endpoint? Then maybe this works, but Z3 doesn't operate as linearly as this assumes. Is this to allow us to apply more equational rules? Then again, K rules might be the answer.

We must get out of the cycle of "guess optimization -> implement optimization -> measure results". It must instead be "measure current status -> identify hotspots -> understand what those hotspots mean at the high level -> choose the solution that actually addresses those hotspots". We cannot do that without the actual context being provided in the issues you submit, including bug reports and/or minimal examples and the original test.

Please, do not write (or pick up and work on) issues that do not include this additional context. Time is better spent running profilers.

PetarMax commented 2 months ago

I have discussed this in detail with the backend team, as well as the people working on the relevant engagement, and I opened the issue only after all of this discussion. We all agreed that we would expect this to introduce a performance improvement. I have documented the issues I think we should investigate and why on Slack here and here, but yes, I also should have added that detail to this issue.

The context is that there are problems with Z3 slowdowns in CSE related to engagement proofs - this is the hotspot of which you write. At the time of writing the issue, there were no mechanisms to measure this precisely, so I was going off of what I was seeing in the terminal. I have noticed that the more constraints I have, even if they are very simple, the slower the execution. I have also noticed that, for the constraints in question, many constraints were simply not relevant for the query, and some had very complex arithmetic, with which we know Z3 has problems and which was completely blocking the proof from progressing.

It is possible that the overhead of filtering outweighs the solver benefit, but there are two aspects to consider that are not yet covered by this initial implementation: 1) the KEVM proofs do not exercise Z3 to the extent to which engagement proofs exercise it - I should try this branch on these proofs to get more information; and 2) the results of this particular implementation are not representative of what will happen in Booster, since this has been implemented only in Kore, and especially now that Z3 reasoning has been lifted to the Booster.

ehildenb commented 2 months ago

I don't think the solver will benefit significantly from constraint filtering, because I don't think it slows down very much with the number of constraints, only with the complexity of constraints. Z3 doesn't work by doing interval arithmetic reasoning directly or anything; it encodes it all to some other domain and solves a different problem (e.g. for a bunch of disequalities, it encodes them to a graph and tries to find cycles in the graph).

  1. Then we should be upstreaming minimal examples that exercise the problematic portions. When we do a refute-node, for example, it gives us a minimal test that can be upstreamed. We should be in the habit of doing this automatically when we have branches that need pruning, for example.

The point is, we cannot just be guessing at potential optimizations. We should know that it will be a benefit before we do it.

"Z3 slowdowns in CSE related to engagement proofs" is not a "hotspot". That is a symptom. The hotspot could be "falling back to Kore because we can't handle remainders in booster, then Z3 takes a long time". Solving that would not mean "improve Z3 encoding"; it could mean "direct remainder reasoning in booster", augmented with the ability to write rules that simplify expressions, like `rule A andBool B => false requires C [simplification]`, for example. If we are just looking at "well, we're spending a long time in Z3" without looking at why we are, then we could go down a rabbit hole trying to optimize Z3 when in fact we should just be skipping it altogether.

PetarMax commented 1 month ago

No, this is the hotspot - there are cases in which we have to call Z3, and in those cases we have to ensure that we are asking it to do as little as possible. In the same vein, it is not relevant how Z3 does its work; what is relevant is what we ask it to do. Here are the measurements on relevant real-world engagement code:

E1 = georgy/no-smt-prelude-check
E2 = E1 + sam/SMT-transitive-closure

| master     | master + E1 | master + E2 |
|------------|-------------|-------------|
| 02m 28s    | 02m 25s     | 02m 22s     |
| 09m 27s    | 06m 41s     | 06m 39s     |
| 26m 07s    | 22m 39s     | 21m 05s     |
| 07m 39s    | 06m 40s     | 06m 11s     |
| 2h 04m 07s | ~2h 00m 00s | 1h 21m 46s  | <==
| 26m 34s    | 18m 22s     | 17m 22s     |

which indicate that this improvement is essential for tractable reasoning about real-world code.

Could we please re-run the KEVM measurements and, regardless, have this merged as soon as possible as an optional feature, off by default and enabled with a flag?