Hi @ThomasHaas.
you have given only little concrete information to work with. Could you provide more details? For example: is your approach comparable to (boolean) predicate abstraction (cf. Beyer/Dangl/Wendler, 2018: A Unifying View on SMT-Based Software Verification, https://doi.org/10.1007/s10817-017-9432-6)? We provide a method `allSat` for querying and iterating over all models of a set of boolean variables.
I can only guess that your issue might be related to MathSAT itself; the overhead of JavaSMT should not account for that much time. It could be that the MathSAT-internal representation of a model is recomputed for every assignment query. However, I doubt that, because we already use the same model reference for all queries. Using `model.asList()` would convert the full model for JavaSMT, which causes more overhead than querying only a few boolean assignments.
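For illustration, here is a minimal sketch of the `allSat` approach (the solver choice and variable names are made up for the example):

```java
import java.util.List;
import org.sosy_lab.java_smt.SolverContextFactory;
import org.sosy_lab.java_smt.SolverContextFactory.Solvers;
import org.sosy_lab.java_smt.api.*;
import org.sosy_lab.java_smt.api.SolverContext.ProverOptions;

public class AllSatSketch {
  public static void main(String[] args) throws Exception {
    try (SolverContext ctx = SolverContextFactory.createSolverContext(Solvers.MATHSAT5)) {
      BooleanFormulaManager bmgr = ctx.getFormulaManager().getBooleanFormulaManager();
      BooleanFormula p = bmgr.makeVariable("p");
      BooleanFormula q = bmgr.makeVariable("q");
      try (ProverEnvironment prover = ctx.newProverEnvironment(ProverOptions.GENERATE_ALL_SAT)) {
        prover.addConstraint(bmgr.or(p, q));
        // Enumerate all models projected onto the given "important" variables.
        int count = prover.allSat(
            new BasicProverEnvironment.AllSatCallback<Integer>() {
              private int models = 0;

              @Override
              public void apply(List<BooleanFormula> model) {
                models++; // 'model' lists the important variables, possibly negated
              }

              @Override
              public Integer getResult() {
                return models;
              }
            },
            List.of(p, q));
        System.out.println("Models: " + count); // prints 3 for (p or q)
      }
    }
  }
}
```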
Our approach is not a predicate abstraction. We verify parallel programs under weak memory models and have a custom theory solver for memory consistency that explores a program execution (a model provided by the solver), checks its consistency w.r.t. a given memory model, and, if the execution is inconsistent, computes reasons for the inconsistency. Such reasons are essentially partial assignments that cause inconsistency. We refine our encoding by simply disallowing these partial assignments (i.e., adding new clauses that cut them off). We basically run the SMT solver in incremental mode and add new clauses after each check-sat, depending on the model found.
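For concreteness, the loop has roughly the following shape in JavaSMT terms (a sketch only; `Execution` and the helper methods `encodeProgram`, `extractExecution`, `isConsistent`, and `encodeNegatedReasons` are placeholders for our own code):

```java
import org.sosy_lab.java_smt.api.*;

// Rough shape of our refinement loop (sketch; the helpers are placeholders
// for our own encoding and theory-solver code).
static String refinementLoop(SolverContext ctx)
    throws InterruptedException, SolverException {
  try (ProverEnvironment prover =
      ctx.newProverEnvironment(SolverContext.ProverOptions.GENERATE_MODELS)) {
    prover.addConstraint(encodeProgram());
    while (!prover.isUnsat()) {                    // incremental check-sat
      try (Model model = prover.getModel()) {
        Execution exec = extractExecution(model);  // the slow part for MathSat5
        if (isConsistent(exec)) {
          return "violation found";                // a real, consistent execution
        }
        // Refine: add clauses that cut off the inconsistent partial assignment.
        prover.addConstraint(encodeNegatedReasons(exec));
      }
    }
    return "no violation";
  }
}
```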
Obtaining the program execution (essentially a labelled graph) from the solver model is what takes a lot of time for MathSat5 (more than the whole theory reasoning). The code that does this is identical for Z3, Yices2, and MathSat5 (i.e., we have the same formulas that we evaluate under the model via `model.evaluate()`).
The overall sequence of check-sat queries is different of course, because the different solvers find different program executions which in turn forces us to add different clauses in each iteration.
Here is an example using Z3 (same benchmark):
```
Number of iterations: 420
Total SMT solving time(ms): 385804
Total abstraction refinement time(ms): 12128
-- Model extraction time(ms): 1258
```
As you can see, Z3 did a lot more iterations and overall needed a lot more time in answering the check-sat queries than MathSat5. However, it only took 1.2 seconds to evaluate all 420 models, meaning it took around 3ms per model. MathSat5 model evaluation took around 150ms per model (50 times slower).
In short, the issue is that `model.evaluate()` is at least an order of magnitude slower for MathSat5 than for the other solvers. We evaluate both boolean and integer formulas under the model, so maybe the issue comes from evaluating integer formulas (could there be some data-type conversion going on)?
Given this description, does it make sense to provide queries/logs (`solver.logAllQueries=true`)? Neither the formulas nor the check-sat queries are the issue (in fact, MathSat5 performs really well; only exploring the model is very slow).
The model evaluation is directly forwarded to MathSAT: https://github.com/sosy-lab/java-smt/blob/master/src/org/sosy_lab/java_smt/solvers/mathsat5/Mathsat5Model.java#L138
We can of course ask the MathSAT developers for internal details and potential performance improvements, but we would need to provide them with a specific query. The SMTLIB logfile might not contain each call for direct model evaluations, so please give some details:
You have custom JNI files, so that snippet does not really show whether the call just gets forwarded to Mathsat5. However, the JNI files don't seem suspicious either.
Here is a smt.zip containing the first query (as generated by `solver.logAllQueries=true`) and the model returned by the solver (copied from JavaSMT's toString representation of the model). I cannot actually run MathSAT5 on my macOS, so the formula and the model were generated with Z3, but there shouldn't be any difference at all when using MathSAT5.
The query is large, but what we evaluate are the boolean variables `rf(x, y)`, `co(x, y)`, and `cf(x)`, as well as integer variables such as `|10:$p25(x)|` (these variables stand for register or memory values).
That being said, the formula is already simplified, and we actually query the model for terms like `8*|10:$p25(x)| + 4` (linear expressions in the integer variables).
We only ever evaluate the rf/co/register variables if the associated cf-variable is true (i.e. we only evaluate variables that are relevant for the control-flow of the current program execution).
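For clarity, our model queries look roughly like this (a sketch; the formula construction is simplified, `prover` is the active ProverEnvironment, and the identifiers match the ones in the SMT file):

```java
import java.math.BigInteger;
import org.sosy_lab.java_smt.api.*;
import org.sosy_lab.java_smt.api.NumeralFormula.IntegerFormula;

// Sketch: evaluate only the variables relevant for the current execution.
static void exploreModel(FormulaManager fmgr, ProverEnvironment prover)
    throws SolverException {
  BooleanFormulaManager bmgr = fmgr.getBooleanFormulaManager();
  IntegerFormulaManager imgr = fmgr.getIntegerFormulaManager();

  BooleanFormula cf = bmgr.makeVariable("cf(x)");
  BooleanFormula rf = bmgr.makeVariable("rf(x, y)");
  IntegerFormula p25 = imgr.makeVariable("10:$p25(x)");

  try (Model model = prover.getModel()) {
    // Only evaluate rf/co/register variables whose cf-variable is true.
    if (Boolean.TRUE.equals(model.evaluate(cf))) {
      Boolean rfValue = model.evaluate(rf);      // boolean edge variable
      BigInteger regValue = model.evaluate(p25); // integer register/memory value
      // A linear expression over the integer variable, e.g. 8*|10:$p25(x)| + 4:
      BigInteger address = model.evaluate(
          imgr.add(imgr.multiply(imgr.makeNumber(8), p25), imgr.makeNumber(4)));
    }
  }
}
```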
I'm not sure if this helps you or the MathSAT developers in any way.
Hi @ThomasHaas.
In my tests, based on the provided SMTLIB file, I could produce some statistics. I parsed the given file into two queries, checked satisfiability, and then evaluated the symbols in the model.
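Roughly, the measurement looked like this (a sketch; I assume here that the SMTLIB file was split into two strings accepted by `FormulaManager.parse`, and the timing code is omitted):

```java
import java.util.List;
import org.sosy_lab.java_smt.api.*;

// Sketch of the benchmark: parse two queries, solve, then read the model.
static void benchmark(FormulaManager fmgr, ProverEnvironment prover,
    String query1, String query2) throws InterruptedException, SolverException {
  BooleanFormula f1 = fmgr.parse(query1);     // "Parsing"
  BooleanFormula f2 = fmgr.parse(query2);

  prover.addConstraint(f1);
  boolean unsat1 = prover.isUnsat();          // "Solve 1"
  prover.addConstraint(f2);
  boolean unsat2 = prover.isUnsat();          // "Solve 2"

  try (Model model = prover.getModel()) {     // "Model Creation"
    List<Model.ValueAssignment> all = model.asList();  // "Model to List"
    for (Model.ValueAssignment a : all) {     // "Evaluate each symbol directly"
      Object value = model.evaluate(a.getKey());
    }
  }
}
```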
| Operation | MathSAT5 | Z3 | SMTInterpol |
| --- | --- | --- | --- |
| Parsing | 109 ms | 232 ms | 916 ms |
| Solve 1 | 447 ms | 139 ms | 13889 ms |
| Solve 2 | 29 ms | 8 ms | 2839 ms |
| Model Creation | 10 ms | 6 ms | 21 ms |
| Model to List | 188 ms | 52 ms | 171 ms |
| Number of Assignments | 4051 | 4023 | 4051 |
| Evaluate each symbol directly | 14 ms (4051 calls) | 12 ms (4023 calls) | 7 ms (4051 calls) |
Please note that all measurements are only a simple baseline. I see that MathSAT5 and SMTInterpol are slower than Z3 in some operations, but for MathSAT5 this is only a small factor. Z3 may omit variables and values from the model if their value is irrelevant for satisfiability; thus, Z3 has a smaller number of variables in the model.
I could not yet reproduce your issue where a single call for model evaluation takes a long time (like the mentioned 150ms). Is there an application (and a command line) that we could execute and debug on our side?
Ok, that is very interesting. I assume you evaluated both the boolean and the integer variables. If that is the case, and it is indeed that fast, then something is very odd on our side. My intuition was that we evaluate complex expressions (e.g. 'x + 4y + 8') over the model; however, after checking our code base, this is not even the case. I will need to dig a little deeper to find the core of the issue.
> I could not yet reproduce your issue where a single call for model evaluation takes a long time (like the mentioned 150ms).

What I measured was not a single call, but the evaluation of all the symbols I need, which should correspond more closely to your "Evaluate each symbol directly". The difference is that I evaluate only a strict subset of all symbols, though potentially multiple times.
While I don't expect you (or anyone else) to inspect our code base (or even run it*), you can find the most relevant code here: Dartagnan. Essentially, there are 4 different cases where we evaluate the model, among them:

- `addEvent`: here we evaluate complex expressions in principle, but our code actually only evaluates the variables over the model and then evaluates the complex expression on the Java side (see the sketch below).
- `extractCoherences`: these variables are integer (clock) variables.
- `extractReadsFrom`: these variables are boolean.

Now in a typical benchmark, this code is run 100+ times (as seen in my example), and we track the time spent in the different parts of the refinement procedure, for each single iteration as well as in a total summary. Only MathSAT5 shows these long running times in the model-evaluation phase. I will check whether the model evaluation is consistently slow (in each iteration) or whether there are some outliers that skew the whole summary. Since I don't have access to a Linux OS, I will need to bother my colleague to collect the statistics for me, so it might take a few days.
*If you are interested in running the tool, I can help you get it running on some benchmarks.
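To illustrate the first case from the list above: we query only the atomic variable and fold the arithmetic on the Java side, roughly like this (a sketch; the variable name is taken from the earlier example):

```java
import java.math.BigInteger;
import org.sosy_lab.java_smt.api.*;
import org.sosy_lab.java_smt.api.NumeralFormula.IntegerFormula;

// Case 1 (addEvent): evaluate only the atomic variable, compute in Java.
static BigInteger computeAddress(IntegerFormulaManager imgr, Model model) {
  IntegerFormula p25 = imgr.makeVariable("10:$p25(x)");
  BigInteger base = model.evaluate(p25);   // single model query
  // 8*|10:$p25(x)| + 4, computed on the Java side instead of via the solver:
  return base.multiply(BigInteger.valueOf(8)).add(BigInteger.valueOf(4));
}
```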
I noticed that MathSAT5 also works on Windows (stupid me), so I could get around to actually running and debugging the issue myself. And I can pinpoint where the time gets lost: the very first call to `model.evaluate()` is insanely expensive, even when evaluating a simple boolean variable. In fact, it is orders of magnitude slower than all other evaluations we do combined.

It seems as if MathSAT5 does not keep track of a model while solving and only creates it on demand on the very first evaluation. Unfortunately, it resets the model on each `isUnsat` query in our refinement loop, so the models keep getting recomputed many times. I also noticed that the time this takes seems to increase with the number of refinements we do, although we only ever add clauses and no variables. I assume this is because MathSAT5 does a final check of the model against the formula when constructing it, which would explain why additional clauses slow it down.
On my Windows PC, Z3 needs 2-3ms to evaluate the model, MathSAT5 needs around 200ms (of which 198ms are spent in the first `model.evaluate` call), and inside our Docker container, which runs Linux, MathSAT5 took a whopping 2000ms per model.
Now the question is: am I right about this, and is it indeed the on-demand model construction that takes so much time? And if so, are there ways to improve this model construction? Maybe there is some configuration option so that MathSAT5 maintains its model incrementally (aligned with the incremental satisfiability checking) instead of computing it from scratch each time?
Even in the tame case of just 200ms per refinement iteration, model construction made up 1/3 of all time spent interacting with the solver (i.e., 2/3 was checking satisfiability, 1/3 was getting the model). In the even slower case of 2000ms, model construction made up 85% of the whole verification procedure (~6 times slower than the satisfiability checking itself).
Edit: According to the MathSAT5 paper, model construction is incremental, so I think it shouldn't be that slow.
I did some more digging and found out that, according to its API documentation, MathSAT5 has two ways to retrieve a model value. Both methods are also present in JavaSMT's native API for MathSAT5:
```java
public static native long msat_get_model_value(long e, long term); // Evaluate term in current model (unused)
public static native long msat_model_eval(long model, long term); // Evaluate term in given model (used)
```
JavaSMT only ever uses the second one, which takes a concrete model as input. The first one takes an environment as input and evaluates the term in the "current" model. Could it be that the first method, which evaluates w.r.t. the current model, avoids model reconstruction, while the second one needs to create some sort of snapshot of the model first so that it persists even after new checks are performed and new formulas are added?
The API documentation of the former has preconditions:
Preconditions:
- model computation was enabled in the configuration of the environment
- the last call to msat_solve returned a MSAT_SAT result
- no assert/push/pop/allsat commands have been issued in the meantime
The documentation of the latter has no preconditions, indicating that the model reference has to be persistent across multiple sat-checks. For this to work, the solver has to create a persistent copy of the model at some point, which here seems to happen on the first call to `msat_model_eval(long model, long term)`.
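If this reading is correct, the two paths would look roughly as follows at the native level (a sketch only; the method names are those from JavaSMT's `Mathsat5NativeApi`, and I am assuming that `msat_get_model` is the call that creates the persistent model object):

```java
// Sketch of the two evaluation paths (names as in Mathsat5NativeApi).

// Path currently used by JavaSMT: create a persistent model object first.
// The snapshot (and hence the cost) presumably happens here or on first eval.
static long evalViaPersistentModel(long env, long term) {
  long model = msat_get_model(env);     // assumed to snapshot the model
  return msat_model_eval(model, term);  // stays valid after later asserts
}

// Unused path: evaluate against the solver's *current* model.
// Only valid while no assert/push/pop/allsat was issued since the last SAT.
static long evalViaCurrentModel(long env, long term) {
  return msat_get_model_value(env, term);
}
```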
Current status:
Thank you for the effort. Unfortunately, it is quite hard to construct an explicit example of this behaviour because it's not just a matter of running some SMTLIB problem through JavaSMT. I think the issue can be closed for now.
Due to the recent release, I got to try out the new `Evaluator` feature. Unfortunately, MathSAT5 shows the exact same performance characteristics as before; that is, the very first call to `evaluator.evaluate()` is extremely slow:
```
// Querying the model 7 times for a simple boolean variable:
Evaltime (ns): 67302100 // First call is extremely slow
Evaltime (ns): 12200 // All further calls are quick
Evaltime (ns): 4100
Evaltime (ns): 3100
Evaltime (ns): 4100
Evaltime (ns): 2700
Evaltime (ns): 2700
```
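For reference, the measurement was essentially the following (a sketch; the formula setup is omitted, and `b` is the boolean variable being queried):

```java
import org.sosy_lab.java_smt.api.*;

// Timing loop that produced the numbers above (sketch; 'prover' has just
// returned SAT from isUnsat()).
static void timeEvaluations(ProverEnvironment prover, BooleanFormula b)
    throws SolverException {
  try (Evaluator evaluator = prover.getEvaluator()) {
    for (int i = 0; i < 7; i++) {
      long start = System.nanoTime();
      Boolean value = evaluator.evaluate(b); // 1st call builds the MathSAT5 model
      System.out.println("Evaltime (ns): " + (System.nanoTime() - start));
    }
  }
}
```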
I guess the problem must be on MathSAT5's side then.
I have an abstraction refinement procedure which iteratively queries the SMT solver and, in the case of SAT, explores the model returned by the solver to generate refinement clauses. From the whole model, I usually need to explore only a fraction of the assignments made by the solver, because I can infer the truth values of the remaining variables. So I systematically query the model for the values of single subformulas (I don't get the full set of assignments in a single call).
When using Z3 and Yices2, the time spent exploring the model this way is rather small, but in the case of MathSat5, the times are at least an order of magnitude worse. Here are statistics generated in a single abstraction refinement run.
As you can see, the solver runs 50 seconds over 244 iterations, and the actual refining takes 41 seconds, of which a whopping 37 seconds are spent just querying the model. The corresponding statistics for Z3/Yices2 show times of < 2 seconds.
Now I'm wondering: is this overhead native to MathSat5, or do the JavaSMT bindings create unnecessary overhead? Also, could it be faster to get the full model assignments in a single call (`model.asList()`)?