Closed: VictorCMiraldo closed this 1 year ago.
The last commit introduces a hack for something we don't fully understand: apparently, translation to SMT can fail, and we would like to know exactly why and what the implications are.
Added some debugging code in here; the plan for tomorrow is to use the same debugging code on `main` and see how differently it behaves. I think we're pretty close.
While debugging this morning, I quickly spotted the issue. On the `main` branch, we collect the following constraints for `cabal run spec -- -p /ohearn/ && /y == 11/`:

```
b/sub $s2 1 == 0
∧ $s2 /= 0
∧ x ↦ D $s2 $s3
∧ __result ↦ D $s0 $s1
Body: b/ite @Delta False (D $s2 42) (D $s2 $s3)
Assume: b/== $s1 42
Prove: b/== $s3 11
```
Note how we actively collect `b/sub $s2 1 == 0` and keep it! In our implementation of constraints, that is impossible to represent: we would gather it through `TermEq [term| b/sub $s2 1 |] [term| 0 |]`, but that triggers the unification algorithm, which says *nope*: those are not unifiable. Consequently, that branch would be automatically pruned and we would miss the counterexample, making the test run forever!
We need to record non-unification constraints in our `ConstraintSet` too and engineer it a little more carefully. A great plan for today would be to fix the bug and then write tests for that module!
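For concreteness, here is a minimal sketch of the idea; all names and shapes below are made up for illustration and are not the actual Pirouette types:

```haskell
module ConstraintSetSketch where

import qualified Data.Map as Map

-- Hypothetical stand-ins for Pirouette's terms and meta-variables.
type Meta = String
data Term = App String [Term] | MetaVar Meta | Lit Integer
  deriving (Eq, Show)

-- Besides the substitution built by unification, the set also keeps
-- equalities that unification cannot decide (such as b/sub $s2 1 == 0),
-- so the corresponding branch is not pruned prematurely.
data ConstraintSet = ConstraintSet
  { csAssignments :: Map.Map Meta Term  -- meta-variables solved by unification
  , csSymbolicEqs :: [(Term, Term)]     -- undecided equalities, sent to the SMT solver as-is
  } deriving (Show)

-- Learning an equality: solve it when one side is a meta-variable, report
-- definite clashes as Nothing, and record everything else symbolically
-- instead of discarding the branch.
conjunctEq :: Term -> Term -> ConstraintSet -> Maybe ConstraintSet
conjunctEq (MetaVar m) t cs = Just cs { csAssignments = Map.insert m t (csAssignments cs) }
conjunctEq t (MetaVar m) cs = Just cs { csAssignments = Map.insert m t (csAssignments cs) }
conjunctEq (Lit a) (Lit b) cs
  | a == b    = Just cs
  | otherwise = Nothing                 -- genuinely unsatisfiable
conjunctEq l r cs = Just cs { csSymbolicEqs = (l, r) : csSymbolicEqs cs }
```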
When I merge this into `qa/z3_inline-c_faster-eval_batches` and run the benchmarks, the running times of both Z3 and Pirouette seem to increase by ~2s (out of 20s for Z3 and 40s for Pirouette). But this might just be noise. More precisely, here's the kind of benchmark results I get (after the PR / before the PR):
After the PR:

```
Z3:           22.984s  count: 113600
command:      30.788s  count: 1546100
launchSolver:  0.412s  count: 100
main:         42.496s  count: 1
readSExpr:     0.026s  count: 56800
showsSExpr:    7.390s  count: 1659700
```

Before the PR:

```
Z3:           21.842s  count: 113600
command:      29.886s  count: 2355500
launchSolver:  0.432s  count: 100
main:         43.256s  count: 1
readSExpr:     0.028s  count: 56800
showsSExpr:    7.442s  count: 2469100
```
The time difference varies (here it's ~1s), but the pre-PR version is consistently faster than the post-PR one on the interaction with SMT solvers. I don't see how the post-PR version could be sending fewer commands to the solver but still spend more time inside it.
This might be horribly outdated, but since I was (superficially) involved in the considerations that led to the redesign of the constraints type, maybe this helps: The goal was precisely to weed out trivial problems before consulting the SMT solver, in order to reduce the overhead incurred by sending them. So, it makes intuitive sense to me that there are fewer problems the solver sees, but it has to think harder about them. Do you have any data on, say, the size of the problems? (Or is my remark completely beside the point?)
Something else I haven't mentioned here is that the `Z3` count is only incremented when a command with interesting results is sent, typically `check-sat` (whereas the `command` count is incremented for every command). So I would expect the `Z3` count to be smaller post-PR, since the solver should not be checking for satisfiability as often.
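To make the bookkeeping concrete, here is a hedged sketch of how two such counters could be maintained; this is not Pirouette's actual instrumentation, and the `check-sat` prefix test is just a crude stand-in for "command with interesting results":

```haskell
import Data.IORef
import Data.List (isPrefixOf)

-- Illustration only: every command sent bumps the "command" counter, while
-- only commands whose answer we wait for also bump the "Z3" counter.
data Counters = Counters
  { commandCount :: IORef Int
  , z3Count      :: IORef Int
  }

countCommand :: Counters -> String -> IO ()
countCommand cnt cmd = do
  modifyIORef' (commandCount cnt) (+ 1)
  if "(check-sat" `isPrefixOf` cmd
    then modifyIORef' (z3Count cnt) (+ 1)
    else pure ()
```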
> So, it makes intuitive sense to me that there are fewer problems the solver sees, but it has to think harder about them.

I agree with the fewer-problems part, but why would the solver have to think harder about them? Aren't they supposed to be the same problems?

> Do you have any data on, say, the size of the problems?

Apart from the number of commands sent, no. What kind of data would help?
Hmm, I think I can't really answer your questions accurately without knowing more about the benchmarks you actually run and without a dive into what the code in this PR actually implements. But I think that the problems the solver sees should be fewer, but larger; these are the two implications I expect from the changed `learn` function...
I logged the commands sent to the solver pre- and post-PR and ran `diff` on the results. The output is huge, but here are some interesting examples that I noticed while looking at the first lines (`<` is pre-PR and `>` is post-PR):
```
< (set-option :produce-models true)(declare-fun pir_greaterThan1 (Int) Bool)(declare-fun pir_greaterThan0 (Int) Bool)(declare-fun pir_add1 (Int) Int)(push 1)(declare-fun pir___result () Int)(declare-fun pir_x () Int)(check-sat)
---
> (set-option :produce-models true)(declare-fun pir_greaterThan1 (Int) Bool)(declare-fun pir_greaterThan0 (Int) Bool)(declare-fun pir_add1 (Int) Int)(push 1)(declare-fun pir___result () Int)(declare-fun pir_x () Int)(assert true)(check-sat)
```

and

```
< (pop 1)(push 1)(declare-fun pir___result () Bool)(declare-fun pir_s0 () Int)(declare-fun pir_x () pir_MaybeInt)(assert (= pir___result true))(assert (= pir_x ((as pir_JustInt pir_MaybeInt) pir_s0)))(check-sat)
---
> (pop 1)(push 1)(declare-fun pir___result () Bool)(declare-fun pir_s0 () Int)(declare-fun pir_x () pir_MaybeInt)(assert (and (= pir___result true) (= pir_x ((as pir_JustInt pir_MaybeInt) pir_s0))))(check-sat)
```
How I interpret this is that this PR is doing at least two things:

1. when Pirouette detects that a problem is trivial, it still sends it to the solver, but as an explicit trivial assertion (the `assert true` in the first example);
2. several assertions are merged into a single conjunction (the second example).

The first point explains why the solver is sent as many problems pre- and post-PR, and the second point explains why there are still fewer commands sent to the solver overall (I assume the number of times Pirouette detects that a problem is trivial is negligible compared to the number of assertions that can be merged). This still doesn't explain why the solver is spending more time on solving the problems post-PR, though: if anything, it should be spending less time because some problems now come directly with their solutions.
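For reference, here is a sketch of how I picture points 1 and 2 (my interpretation, not the PR's actual code), assuming the constraints have already been translated to SimpleSMT `SExpr`s:

```haskell
import qualified SimpleSMT as SMT

-- An empty constraint set becomes "true" (hence the "(assert true)" in the
-- post-PR log), and several constraints become one conjunction instead of
-- one assertion each.
constraintsToAssertion :: [SMT.SExpr] -> SMT.SExpr
constraintsToAssertion []  = SMT.bool True
constraintsToAssertion [c] = c
constraintsToAssertion cs  = SMT.andMany cs
```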
I ran Z3 on the logfiles of the commands produced by Pirouette pre- and post-PR, and I do get a noticeable difference in the running times, although the files should be logically equivalent:
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `z3 isunity.pre >/dev/null` | 190.3 ± 12.6 | 176.4 | 231.9 | 1.00 |
| `z3 isunity.post >/dev/null` | 230.2 ± 3.0 | 223.6 | 241.0 | 1.21 ± 0.08 |
Looking at the diff between both files more closely, it really seems the only differences are that, post-PR, 1) a few trivial and negligible assertions are added, 2) some redundant declarations and assertions are removed, and 3) assertions are merged together using conjunctions. This is to be taken with a grain of salt, of course, because there are too many lines for someone to go through every one of them.
My guess right now is that Z3 spends more time parsing one big assertion with conjunctions than it does parsing many small assertions consisting of simple (in)equalities. @facundominguez, you had a look at the internals of Z3's parser; does that sound plausible to you?
I have no intuition on the cost of parsing larger expressions. But it could be checked by splitting the conjunctions into separate assertions before feeding them to the SMT solver.
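For instance, a minimal sketch of such a split, assuming the assertions are available as SimpleSMT `SExpr`s (an assumption on my part):

```haskell
import qualified SimpleSMT as SMT

-- Flatten a top-level conjunction into its conjuncts, so that
-- (assert (and a b ...)) can be re-sent as separate assertions.
splitConjuncts :: SMT.SExpr -> [SMT.SExpr]
splitConjuncts (SMT.List (SMT.Atom "and" : args)) = concatMap splitConjuncts args
splitConjuncts e = [e]
```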
Ah, and there is a function that only does the parsing, `Z3_parse_smtlib2_string`. You might be able to measure whether it worsens with the large conjunctions. I modified the `oneshot.c` program to invoke it at some point:
```c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <z3.h>

void my_parser_example(char const* const fileName) {
    Z3_config cfg;
    Z3_context ctx;

    cfg = Z3_mk_config();
    Z3_set_param_value(cfg, "model", "true");
    ctx = Z3_mk_context(cfg);
    Z3_del_config(cfg);

    /* Read the whole SMT-LIB file into a buffer. */
    FILE* file = fopen(fileName, "r");
    char* buffer = NULL;
    size_t len;
    ssize_t bytes_read = getdelim(&buffer, &len, '\0', file);

    /* Parse only, without solving. */
    Z3_ast_vector b =
        Z3_parse_smtlib2_string(ctx,
                                buffer,
                                0,
                                NULL,
                                NULL,
                                0,
                                NULL,
                                NULL);

    /* Also evaluate the whole command sequence and print its output. */
    printf("%s", Z3_eval_smtlib2_string(ctx, buffer));
    fclose(file);
}

int main(int argc, char* argv[]) {
    char const* const fileName = argv[1];
    my_parser_example(fileName);
    return 0;
}
```
This doesn't seem to be the explanation: I merged the assertions in the logfile produced by Pirouette pre-PR, and I got:
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `z3 isunity.pre >/dev/null` | 200.6 ± 13.5 | 182.1 | 226.7 | 1.00 |
| `z3 isunity_merge.pre >/dev/null` | 203.2 ± 3.6 | 197.2 | 227.0 | 1.01 ± 0.07 |
Are those benchmarks still up-to-date with the merge of #150? I would expect so because #150 is not supposed to change the generated constraints, but who knows.
> Are those benchmarks still up-to-date with the merge of #150?

Yep, I ran them again and I'm not seeing any difference.
FYI, if you want to run these benchmarks yourself, you just have to do `DEBUG_TIMESTATS_ENABLE=1 cabal run example` on `qa/z3_inline-c_faster-eval_batches` and `qa/z3_inline-c_faster-eval_batches_contstraints` (the latter measuring the performance of this PR). Make sure to run `direnv reload` first, as these branches use a pinned version of Z3.
With `DEBUG_TIMESTATS_ENABLE=1 cabal run example`, I can confirm @qaristote's observations: this PR makes things slightly slower (in particular, 10% more time spent in Z3). However, the overall run is barely affected (1-2%). I suggest we merge this PR as it cleans up a lot of code for a very minor cost.
This PR changes the simple `Constraints` datatype into `ConstraintSet`, adding a little more structure. Namely, it adds a `conjunct :: Constraint lang meta -> ConstraintSet lang meta -> Maybe (ConstraintSet lang meta)`, which returns `Nothing` whenever the resulting `ConstraintSet` would be trivially UNSAT.

edit: This PR is ready for review. Someone should really tackle the implementation of a proper union sort, as suggested here (and write good tests for that!); it might be worth doing that as a separate issue, after merging.
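A hedged sketch of the intended usage pattern: only the shape of `conjunct` comes from this PR; everything else below is made up for illustration.

```haskell
import Control.Monad (foldM)

-- Folding conjunct over the constraints gathered along a symbolic branch
-- short-circuits to Nothing as soon as the set becomes trivially UNSAT,
-- so that branch never reaches the SMT solver.
learnAll
  :: (constraint -> set -> Maybe set)  -- conjunct from this PR
  -> set                               -- starting constraint set
  -> [constraint]                      -- constraints learned along the branch
  -> Maybe set                         -- Nothing: prune the branch
learnAll conjunct = foldM (flip conjunct)
```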
Unrelated important change: this change in the order in which terms are executed by the `worker` function is very important! We should always first try to expand the assumption term, which might prune all sorts of uninteresting branches.