sosy-lab / java-smt

JavaSMT - Unified Java API for SMT solvers.
Apache License 2.0
179 stars 44 forks source link

Add Visitor for Formula Traversal and Transformation #5

Closed PhilippWendler closed 8 years ago

PhilippWendler commented 8 years ago

These exist already in CPAchecker and should be moved to JavaSMT.

PhilippWendler commented 8 years ago

What about theory for arrays? Strings? Custom datatypes? Should we delegate all unknown cases to visitFunctionApplication?

Where is the difference between array access, string manipulation etc. to integer arithmetic functions? I do not see any, thus for me it is clear that this is a function application.

PhilippWendler commented 8 years ago

The following cases should be handled in the visitors

I would actually close the feature set on boolean visitors now. We don't need to handle every possible case. After all, SMT atoms are not really atomic, and all kinds of "logical" things can happen inside, and if we miss a few operators it is not a problem.

So you propose to treat all xor terms as atoms? I disagree, it would be very surprising and inconvenient for users, especially given the fact that solvers might introduce xor functions by themselves.

For variables, it might be acceptable to treat them as atoms, though I think it can also be surprising, and users who would like to access these variables directly would find it much more convenient if visitVariable is added.

PhilippWendler commented 8 years ago

If we talk about FormulaTransformationVisitor how to handle classes which would return a different output for a quantified variable? They should be either untouched or renamed altogether.

Well, we can never fully ensure that the user does not violate this because we cannot force users to use FormulaTransformationVisitor for formula transformations, they could always use FormulaVisitor. I see the following possibilities:

  1. ignore
  2. runtime check that method returns its argument
  3. override visitBoundVariable method in FormulaTransformationVisitor and make it final
  4. override visitBoundVariable method in FormulaTransformationVisitor and let it delegate to a second void method

I would probably go for 3., though it has the disadvantage that users that need to see that variable are now forced to use FormulaVisitor directly and re-implement the boiler-plate code.

cheshire commented 8 years ago

I would go for either (3) or (4), with (3) being less boilerplate and (4) being less surprising

On Wed, Dec 23, 2015 at 11:43 AM, Philipp Wendler notifications@github.com wrote:

If we talk about FormulaTransformationVisitor how to handle classes which would return a different output for a quantified variable? They should be either untouched or renamed altogether.

Well, we can never fully ensure that the user does not violate this because we cannot force users to use FormulaTransformationVisitor for formula transformations, they could always use FormulaVisitor. I see the following possibilities:

  1. ignore
  2. runtime check that method returns its argument
  3. override visitBoundVariable method in FormulaTransformationVisitor and make it final
  4. override visitBoundVariable method in FormulaTransformationVisitor and let it delegate to a second void method

I would probably go for 3., though it has the disadvantage that users that need to see that variable are now forced to use FormulaVisitor directly and re-implement the boiler-plate code.

— Reply to this email directly or view it on GitHub https://github.com/sosy-lab/java-smt/issues/5#issuecomment-166860786.

PhilippWendler commented 8 years ago

What should be done in FormulaVisitor for true and false?

PhilippWendler commented 8 years ago

Shouldn't UnsafeFormulaManager.visit be moved to FormulaManager?

cheshire commented 8 years ago

It's in both, one calls the other

On Wed, Dec 23, 2015 at 1:22 PM, Philipp Wendler notifications@github.com wrote:

Shouldn't UnsafeFormulaManager.visit be moved to FormulaManager?

— Reply to this email directly or view it on GitHub https://github.com/sosy-lab/java-smt/issues/5#issuecomment-166880457.

PhilippWendler commented 8 years ago

For true and false, maybe we should rename visitNumeral to visitConstant? In this case I would strongly suggest to change the String argument to Object, with a clearly defined list of possible types including Boolean and Number.

PhilippWendler commented 8 years ago

I think that visitFunction is not a good name, because UFs are also functions. visitOperator instead?

cheshire commented 8 years ago

On Wed, Dec 23, 2015 at 1:32 PM, Philipp Wendler notifications@github.com wrote:

I think that visitFunction is not a good name, because UFs are also functions. visitOperator instead?

What about "and", "or" and friends? Would you call them operators?

— Reply to this email directly or view it on GitHub https://github.com/sosy-lab/java-smt/issues/5#issuecomment-166882092.

PhilippWendler commented 8 years ago

Sure, boolean operator is a common term, right? I would actually always call them operators outside of SMT world instead of functions.

cheshire commented 8 years ago

OK, let's change to "visitOperator" then.

On Wed, Dec 23, 2015 at 1:38 PM, Philipp Wendler notifications@github.com wrote:

Sure, boolean operator is a common term, right? I would actually always call them operators outside of SMT world instead of functions.

— Reply to this email directly or view it on GitHub https://github.com/sosy-lab/java-smt/issues/5#issuecomment-166886123.

cheshire commented 8 years ago

For true and false, maybe we should rename visitNumeral to visitConstant? In this case I would strongly suggest to change the String argument to Object, with a clearly defined list of possible types including Boolean and Number.

OK I'm not against it. Do you want to do the change? :P It's a good point that we can have all kinds of constants, e.g. strings.

cheshire commented 8 years ago

On FormulaVisitor: is there any reason not to give the input formula to the client? It seems weird they have to re-create it.

PhilippWendler commented 8 years ago

On FormulaVisitor: is there any reason not to give the input formula to the client? It seems weird they have to re-create it.

Yes, I think if we can manage to do this in a sane way, this is a good idea and we should do this.

The main arguments against this are "yet another parameter for visit*" and the potential of confusion for implementors which argument is which, especially for methods which already have an argument of type Formula (like BooleanFormulaManager.visitNot).

So we should decide on a fixed convention (formula instance always first, or always last), and name the parameters nicely, such that IDE templating for overriding methods will pick up meaningful names that avoid confusion. Any preferences on this?

cheshire commented 8 years ago

But I was talking about FormulaVisitor, not BooleanFormulaVisitor. It does not have any methods expecting a Formula parameter.

On Wed, Dec 23, 2015 at 6:16 PM, Philipp Wendler notifications@github.com wrote:

On FormulaVisitor: is there any reason not to give the input formula to the client? It seems weird they have to re-create it.

Yes, I think if we can manage to do this in a sane way, this is a good idea and we should do this.

The main arguments against this are "yet another parameter for visit*" and the potential of confusion for implementors which argument is which, especially for methods which already have an argument of type Formula (like BooleanFormulaManager.visitNot).

So we should decide on a fixed convention (formula instance always first, or always last), and name the parameters nicely, such that IDE templating for overriding methods will pick up meaningful names that avoid confusion. Any preferences on this?

— Reply to this email directly or view it on GitHub https://github.com/sosy-lab/java-smt/issues/5#issuecomment-166947655.

PhilippWendler commented 8 years ago

I thought we would to this for both visitors, if we do it?

For FormulaVisitor, there are the visit methods for quantifiers, which also have a Formula parameter.

cheshire commented 8 years ago

Yes, I think if we can manage to do this in a sane way, this is a good idea and we should do this.

Another problem is that each extra argument adds to the boilerplate the client has to write in order to use the visitor. One possible approach would be to just give clients the formula object, and then add getters Formula -> name and Formula -> FormulaType, but you really dislike the getName method.

cheshire commented 8 years ago

If we pass in a formula, could we drop the UfDeclaration? And re-derive at callsite if necessary?

cheshire commented 8 years ago

In the spirit of cutting a required amount boilerplate, what about merging visitExists and visitForall with an extra enum value passed?

cheshire commented 8 years ago

Any remaining issues?

PhilippWendler commented 8 years ago

Well, yes, at least the ones listed here: https://github.com/sosy-lab/java-smt/issues/5#issuecomment-165364668

Also visitQuantifier in BooleanFormulaVisitor is inconsistent with the one in FormulaVisitor. I don't understandy why newBodyConstructor is necessary in the latter, if BooleanFormulaTransformationVisitor can do without.

What about providing FormulaTransformationVisitor?

I suggest to drop the existing Recursive*Visitors and let callers use (Boolean)FormulaManager.visitRecursively().

Should BooleanFormulaVisitor.visit methods also get the input formula itself? Disadvantage is the confusing parameter list, advantage is that it makes DefaultBooleanFormulaVisitor.visitDefault() more useful if it gets passed this value (like in 4c3ff86).

We could have methods visitForAll, visitExists, visitUF, and (maybe) visitOperator in Default*FormulaVisitor and let the existing methods delegate to these. This would allow users to choose which method to override, e.g., they can override visitUF if they only want to handle UFs and other function applications get handled by visitDefault automatically. They could still override the current methods, too, so no additional code is forced onto them. I think this would be nice, what do you think?

cheshire commented 8 years ago

Partial reply so far.

I think this would be nice, what do you think?

I really like the code much more after the number of the methods got severely dropped, and I would like to keep it low. The commit removing the method visitUF outlines the problems with it.

PhilippWendler commented 8 years ago

I think this would be nice, what do you think?

I really like the code much more after the number of the methods got severely dropped, and I would like to keep it low.

I preferred it the way before. A separate method for each separate method is better readable code. You also know the general reasons against boolean parameters.

But anyway, my proposal would not force any more methods onto client code.

The commit removing the method visitUF outlines the problems with it.

I don't see any problems solved by a9cda41e08ab835d1482418504311fa7d5f383f5?

cheshire commented 8 years ago

Re #5(comment): There are two problems with supporting boolean variables:

I know issues with boolean parameters, but since now it is our code calling the client code, it's not such a problem (as on the client side they will always see the parameter name, the problem is usually is that the call-site is not terribly readable).

Thus to solve both of those problems I would prefer visitAtom(isVariable).

Regarding xor: but we don't even support creating one! Thus I am against it. (we can make a post-1.0 ticket for full XOR support, as we really can't add more features now).

Same goes for the distinct operation.

About Z3_OP_OEQ: what about it? According to the documentation it only occurs in proofs, and we do not support them so far.

cheshire commented 8 years ago

Also I the visitor entry point API is quite confusing right now. Adding visit methods to FormulaManager was meant to reduce the number of entry point to two: visit and visitRecursively. But now we still have visitors which implement their own visit method! Should they be removed and/or rewritten?

PhilippWendler commented 8 years ago

Thus to solve both of those problems I would prefer visitAtom(isVariable).

I do not think that the amount of visitor methods that we have currently is a problem at all. Users can still use DefaultFormulaVisitor if they wish or make delegation calls in their own visitor.

For visitAtom(isVariable), how helpful is this if you are not given the name of the variable?

Regarding xor: but we don't even support creating one!

First, this is wrong. Second, solvers may still add it to formulas on their own, thus it may exist in formulas that client code wishes to traverse over. What should happen in this case? UnsupportedOperationException? This also holds for distinct.

About Z3_OP_OEQ: what about it? According to the documentation it only occurs in proofs, and we do not support them so far.

Great, then this should be checked and an UnsupportedOperationException should be thrown.

cheshire commented 8 years ago

I do not think that the amount of visitor methods that we have currently is a problem at all

It is a problem, as DefaultFormulaVisitor loses some safety properties of non-default ones (a guarantee that the client knows about all possible cases).

Great, then this should be checked and an UnsupportedOperationException should be thrown.

But why? Z3_decl_kind has a lot of different values which can never occur in an AST, should we check for all of them?

First, this is wrong.

OK, my mistake, let's have visitXor if we support those.

For visitAtom(isVariable), how helpful is this if you are not given the name of the variable?

OK let's add visitVariable

What should happen in this case?

visitAtom should happen. No, I don't find it very counter-intuitive, as in the presence of integers it is already possible to encode binary predicates within atoms. E.g. (xor A B) can be already encoded as (= (+ (to_int A) (to_int B)) 1).

cheshire commented 8 years ago

Also we need to fix the quantifier visitor support.

The reason I have removed the bound variables is because Princess does not (directly) expose them. However, turns out they are required by Z3 for quantifier instantiation, hence I will be re-adding them.

For Princess we would have to go in a roundabout way. Since every quantifier binds exactly one variable, we would have to instantiate a visitor inside the visitor to go down and find all bound variables (using the de-Bruijn index corresponding to the current quantifier). Getting the current index is a bit complicated, as we need the topological traversal property to increment a current index on each visitQuantifier.

cheshire commented 8 years ago

Also: what about visitBoundVar for boolean visitors? It should be different from an unbounded variable.

The collision in options between boolean and non-boolean starts to become quite annoying.

cheshire commented 8 years ago

And there is a whole problem of exposing standardized names in FormulaVisitor#visitFunction. We need this at least to recognize non-boolean ITEs, which we've used to do with old introspection API.

This is very tricky as we want to:

Thus we can't simply return an Enum. One option is to return a String, but also provide a String -> Enum interface which would recognize most commonly used functions?

PhilippWendler commented 8 years ago

Also: what about visitBoundVar for boolean visitors? It should be different from an unbounded variable.

Bound boolean variable? Do solvers even allow this?

The collision in options between boolean and non-boolean starts to become quite annoying.

I don't think it's annoying, but if you do, what do you suggest doing against this?

cheshire commented 8 years ago

Bound boolean variable? Do solvers even allow this?

Of course, at least Z3 does. SMT > SAT, and in SAT quantifiers are always over boolean variables.

I don't think it's annoying, but if you do, what do you suggest doing against this?

I have found having to synchronize the interface between the two to be quite error-prone. If we are able to express declaration kind in a meaningful way (see the previous comment), we can make a BooleanFormulaVisitor to be an abstract implementation of FormulaVisitor.

PhilippWendler commented 8 years ago

Great, then this should be checked and an UnsupportedOperationException should be thrown.

But why? Z3_decl_kind has a lot of different values which can never occur in an AST, should we check for all of them?

Yes? There have been many many cases where we made assumptions about what solvers return and we turned out to be wrong, so this clearly seems like a place where defensive checks are valuable.

What should happen in this case?

visitAtom should happen. No, I don't find it very counter-intuitive, as in the presence of integers it is already possible to encode binary predicates within atoms. E.g. (xor A B) can be already encoded as (= (+ (to_int A) (to_int B)) 1).

So you don't find it counter-intuitive that you cannot traverse some boolean operators and boolean subtrees of formulas with BooleanFormulaVisitor? The fact that many SMT theories can be used to simulate boolean operators is irrelevant here I think.

cheshire commented 8 years ago

Yes? There have been many many cases where we made assumptions about what solvers return and we turned out to be wrong, so this clearly seems like a place where defensive checks are valuable.

You really don't want to do that, as there are _alot of things which can never occur in AST. In this case we have tested that, AND I am looking at Z3 source code which says that this value only occurs in proof terms.

So you don't find it counter-intuitive that you cannot traverse some boolean operators and boolean subtrees of formulas with BooleanFormulaVisitor? The fact that many SMT theories can be used to simulate boolean operators is irrelevant here I think.

The way I think about it is this: atoms in SMT are not really "atomic". When we are doing something only over the boolean part we are doing some over-approximation already. By supporting more operators we can make the over-approximation less coarse, but it is always an over-approximation. There are no invariants which can be violated by calling visitAtom too soon.

cheshire commented 8 years ago

Another important point to consider and test: what about unary minus? Do we treat is the same way as a binary minus? What about negative constants? Do we want visitConstant on them?

cheshire commented 8 years ago

Now that I have started working on #23, code duplication for different formula visitors becomes even larger problem. One should definitely extend the other in some way.

cheshire commented 8 years ago

Regarding code duplication: we can have a private redirection visitor, which would use the FormulaVisitor visit method to redirect the call to the client-supplied BooleanFormulaVisitor. That can be moved back to AbstractBooleanFormulaManager`.

visitBoolVar is a problematic method as then visitAtom becomes a bad name (because, well, variable is also an atom).

I also think we should rename visitFunction to visitFuncAppl as it's the function application, not the function itself which we are visiting.

cheshire commented 8 years ago

On visitBoolVar: We can supply a Declaration instance from #23 to visitAtom, that would allow the client to get the variable name, should they desire.

cheshire commented 8 years ago

Code duplication is solved in https://github.com/sosy-lab/java-smt/commit/0e99dd8aed5ee31200a3019a4bf2cc9caabf284

I have also removed visitBoolVar in favor of using visitAtom and reading the attached FuncDecl, and added visitBoundVar.

cheshire commented 8 years ago

I am currently satisfied with the current visitor state, I think the only feature which we have disagreed upon is handling the "distinct" case separately in boolean formula visitor, and i think we can agree it's not the killer feature (and arguments to "distinct" are not all boolean anyway). I am closing this ticket now, the remaining item is adding visitors for formula transformation, but that could be done separately.