sosy-lab / java-smt

JavaSMT - Unified Java API for SMT solvers.
Apache License 2.0
186 stars 46 forks source link

Add Visitor for Formula Traversal and Transformation #5

Closed PhilippWendler closed 8 years ago

PhilippWendler commented 8 years ago

These exist already in CPAchecker and should be moved to JavaSMT.

PhilippWendler commented 8 years ago

The visitor that exist in CPAchecker do not handle quantifiers, this needs to be added.

cheshire commented 8 years ago

Also would it be hard to change the constructor of BooleanFormulaTransformationManager to require a BooleanFormulaManager instead of FormulaManager? Latter offers no advantages, and with that the class can not be called from BooleanFormulaManager (as it does not have FormulaManager instance handy)

kfriedberger commented 8 years ago

For handling Quantifiers, we need a QuantifierFormulaManager, and instead of using two parameters BooleanFormulaManager and QuantifierFormulaManager, we can access both via FormulaManager. I prefer keeping the whole FormulaManager.

cheshire commented 8 years ago

Ah true, that's a good point. Should the traversal expose quantified formulas somehow then? E.g. visitExists(bound, formula) EDIT: disregard, saw Philipp's comment above.

cheshire commented 8 years ago

Question: why does the comment say that no guarantees are made on the iteration order? Surely it just traverses the whole formula using DFS?

kfriedberger commented 8 years ago

We have Visitors, that cache results. Additionally, we have no guaranteed order for the operands of AND(op1,op2,op3), as the solver can simplify the term if one of the args is true.

So we can not guarantee DFS.

cheshire commented 8 years ago

I think it's partly a terminology issue. Maybe "topological" order would be better?

that cache results

That doesn't change the traversal order, they just have a very short traversal: check cache, return.

we have no guaranteed order for the operands of AND

I would still call it DFS for any traversal order of AND- or OR- arguments.

In the end of the day, at least to me, an "arbitrary order" means that the traversal can pick up any argument at any point from the "worklist". This is an important difference, as in our case if the solver starts processing the node, it will finish processing its children before jumping to other nodes. This allows state-full methods (keeping traversal state in class fields) In general case, no such guarantee is possible.

PhilippWendler commented 8 years ago

For the portfolio solver (#8), even a guarantee for a topological iteration would be problematic, though.

I would think that a stateful visitor that relies on the iteration order should be avoided if possible anyway, as it seems very error prone. I don't think we even need it in CPAchecker, or do we? Visitors that extract things like variables and atoms are stateful but do not rely on iteration order, and all kinds of transformation like renaming variable are handled by the BooleanFormulaTransformationVisitor, which is also safe.

cheshire commented 8 years ago

even a guarantee for a topological iteration would be problematic, though.

Would it? Just apply it to all formulas one by one.

cheshire commented 8 years ago

We need to stop code duplication with CPAchecker visitors... One problem is that JavaSMT visitors require a FormulaManager, while CPAchecker ones take in a FormulaManagerView.

We can do the following:

PhilippWendler commented 8 years ago

We should not expose UnsafeFormulaManager from FormulaManagerView, otherwise people will start using it.

Why are the create*Environment methods related to this?

PhilippWendler commented 8 years ago

The following cases should be handled in the visitors:

cheshire commented 8 years ago

@PhilippWendler

Otherwise we are stuck with code duplication in visitors (the ones in CPAchecker want FormulaManagerView and the ones in JavaSMT want FormulaManager).

cheshire commented 8 years ago

There are a few usages of UnsafeFormulaManager and raw introspection methods in CPAchecker code, and I don't think they can be changed to visitors nicely.

    if (bfmgr.isAnd(constraint)) {
      for (int k = 0; k < ufmgr.getArity(constraint); k++) {
        addConstraint((BooleanFormula)ufmgr.getArg(constraint, k),
            prover, ufmgr);
PhilippWendler commented 8 years ago

In the Solver class there is a helper method getUnsatCore I have used many times; It finds an UNSAT core, but first it tries to "unravel" the formula as much as it can -- eg. AND(AND(A, B), AND(C, D)) gets turned into AND(A, B, C, D) and the unsat core is searched among the subsets of A, B, C, D. Is visitor suitable here?

Should be doable with a visitor that collects or pushes all non-And terms. Probably this code should even be moved to JavaSMT.

In my policy iteration code I find UFs and manually perform Ackermannization on them --- how that can be done without isUF calls?

Ackermannization should be moved to JavaSMT as well? BooleanFormulaVisitor does not replace isUF, but a new FormulaVisitor with methods for UFs, variables, and quantifiers, should replace it.

cheshire commented 8 years ago

BTW why do visitX methods are public? Shouldn't they be protected?

kfriedberger commented 8 years ago

Yeah, protected would be nice, but then we cannot access them from BooleanFormulaManager, as it is another package.

cheshire commented 8 years ago

@kfriedberger yes you're right, at least visitDefault can be protected though.

@PhilippWendler

Probably this code should even be moved to JavaSMT.

It's not feasible to add as a tactic every possible transformation we can think of. And users will think of many others.

PhilippWendler commented 8 years ago

It's not feasible to add as a tactic every possible transformation we can think of. And users will think of many others.

I agree, though I would think that not this tactic but this getUnsatCore method will be helpful and generally useful.

For visitors which are allowed to go inside boolean atoms, would we supply the client with any way to know what operation this is? This would be required to implement things like splitEqualityIfPossible. But on the other hand the list of all possible operations is huge and not even well-defined.

That's the big open question.

Probably the list of operators is well defined (by SMTLIB), but still too large to be useful. One option would be to have a visitOperator(String name, List<Formula> operands) method (with name being the SMTLIB-compliant name). I think it would be good if we can distinguish between operators and UFs, but syntactically they look the same and I do not know whether every solver supports distinguishing between them without hard-coding a list of all operators.

Thus my current preference for this visitor interface looks like this:

public interface FormulaVisitor<R> {
  R visitFreeVariable(String name, FormulaType type);
  R visitBoundVariable(String name, FormulaType type);
  R visitLiteral(??? literal, FormulaType type);
  R visitUF(String name, List<Formula> args, FormulaType type);
  R visitOperator(String name, List<Formula> args, FormulaType type);
  R visitForallQuantifier(List<Formula> variables, BooleanFormula body);
  R visitExistsQuantifier(List<Formula> variables, BooleanFormula body);
}

This should allow to do everything the current API allows, right?

Maybe there is a valid reason to add <T extends Formula> visitIfThenElse(BooleanFormula cond, T thenBranch, T elseBranch);.

For recreating the same formula while traversing one would have to add a corresponding "makeOperator(String name, ...)" method though.

Suggestions for a better interface are welcome.

cheshire commented 8 years ago

Probably the list of operators is well defined (by SMT-LIB)

I think the problem is that solvers have their own extensions. Check out http://smtlib.cs.uiowa.edu/theories.shtml. It doesn't have strings or datastructures, but Z3 has both of them (though we don't support those). The question is whether Java-SMT should support only a subset of SMT-LIB, but e.g. we support interpolation, which isn't actually defined by SMT-LIB.

I think it would be good if we can distinguish between operators and UFs

Should we? For practical purposes, what is the difference between an operator you don't know and a UF? Applying an operator creates a stronger formula, but it has all the properties UF has. And I think the name is "function application" and not an "operator".

visitLiteral

Do you mean things like numerals, eg "42"?

visitOperator

I would prefer visitFunction

visitBoundVariable

Maybe should provide a pointer to the binding quantifier? Or some way to get that access?

visitOperator(String name, List args, FormulaType type)

How do we make sure names are consistent across different solvers? I think typecasting between different theories can become a pain point (all the integer-to-rational functions).

visitFreeVariable

On one hand this shortcut makes sense, but on the other hand a variable is just a function with no arguments.

PhilippWendler commented 8 years ago

Yes, with literal I mean what you call numeral. Not sure what is more common, in programming context I think it is literal. MathSAT just calls this "number".

visitFreeVariable could maybe be useful, I guess many implementors will do something different than for UFs. The default visitor could map this call to visitUF. It is also nice for symmetry to visitBoundVariable.

For visitOperator name consistency, we cannot do better than SMTLib I guess. This is essentially a replacement for UnsafeFormulaManager.getName(), which suffers from the same problem. I hope people will not use this method and actually check the given name, but use it for more generic things like re-creating the same operator inside a FormulaTransformationVisitor. Or should we add explicit support for this use case instead?

For UFs, I think the distinction between operators and UFs can be quite convenient, we have a lot of callers to isUF I think. For example visitors that apply a renaming of UFs for example should never change operators, thus it can be useful to separate them.

Yes, function application is the correct term, but if we separate UFs and operators, we cannot use it.

cheshire commented 8 years ago

but use it for more generic things like re-creating the same operator inside a FormulaTransformationVisitor

How can one re-create an operator given only a String (its name)? For now I would just do an implementation which does the same as UnsafeFormulaManager#getName as we really can't do much better without investing too much time...

Yes, function application is the correct term, but if we separate UFs and operators, we cannot use it.

That's a bit pedantic, I think if there is a method visitUF it's obvious that "function" means "interpreted function".

cheshire commented 8 years ago

For a literal we can use a String serialization in place of ????. It is always fairly cheap to serialize bitvector/rational/floating point/integer to and from strings.

cheshire commented 8 years ago

Actually I've meant for the numeral :) I'm very much -1 on using the term "literal" as in logic context it has a well defined meaning: an atom or its negation.

PhilippWendler commented 8 years ago

How can one re-create an operator given only a String (its name)?

I mentioned that this probably needs to be added. I do not like it but I see no other way (we have it currently in UnsafeFormulaManager).

For now I would just do an implementation which does the same as UnsafeFormulaManager#getName as we really can't do much better without investing too much time...

How would you implement a visitor that for example renames all variables then? There needs to be a way to re-create the same higher-level AST node with changed arguments.

PhilippWendler commented 8 years ago

For a literal we can use a String serialization in place of ????.

String can be very inconvenient if you actually want to do something with this number. Types like bitvectors and floating point have complex number formats that cannot be trivially converted into something usable and users would need to write their own parsing for this.

cheshire commented 8 years ago

How would you implement a visitor that for example renames all variables then?

We can pass an extra argument Function<List<Formula>, Formula> which would create a function application from the provided arguments.

PhilippWendler commented 8 years ago

How would you implement a visitor that for example renames all variables then?

We can pass an extra argument Function<List<Formula>, Formula> which would create a function application from the provided arguments.

Nice idea.

cheshire commented 8 years ago

and users would need to write their own parsing for this

Not if we provide the corresponding classes with a convenient of(String s) method. Currently we can construct a floating point number from Rational/double/BigDecimal classes. I don't think that any of those have 1-1 mapping with SMT floats (floats represented in a given theory may be very different from the double in Java), and a custom class may be needed.

I think the same applies to bitvectors.

cheshire commented 8 years ago

Coming back to the parent's comment:

R visitExistsQuantifier(List variables, BooleanFormula body);

Why are these methods here? Shouldn't they be in BooleanFormulaVisitor and shouldn't a FormulaVisitor inherit from it? Otherwise, where are all the other methods for exploring the boolean structure of the formula?

cheshire commented 8 years ago

Nice idea.

That might make writing constructors even more verbose than they are currently are. Other approach would be providing constructors on request from function names through eg FormulaManager something like

public Function<List<Formula>, Formula> getConstructorForFunction(String functionName) {};

Thus my version currently is:


public abstract class FormulaVisitor<R> extends BooleanFormulaVisitor<R> {
  abstract R visitFreeVariable(String name, FormulaType type);
  abstract R visitBoundVariable(String name, FormulaType type, Formula boundingAST);
  abstract R visitNumeral(String numeral, FormulaType type);
  abstract R visitUF(String functionName, List<Formula> args, FormulaType type);
  abstract R visitFunctionApplication(
      String functionName,
      List<Formula> args,
      FormulaType type,
      Function<List<Formula>, Formula> newApplicationConstructor);
}

with constructors optionally moved to FormulaManager.

PhilippWendler commented 8 years ago

I do not understand what you mean by "constructor".

A FormulaManager.getConstructorForFunction method would be the same as a method that allows to create any function application by giving a string.

I did not intend to have the two visitors extend from each other. One allows to traverse the boolean part, with semantics of function applications, and the other allows to traverse all formulas, but without any semantics of function applications. This matches the use cases we have so far.

PhilippWendler commented 8 years ago

visitBoundVariable

Maybe should provide a pointer to the binding quantifier? Or some way to get that access?

I do not know, but this could be quite difficult to support for some solvers.

cheshire commented 8 years ago

I did not intend to have the two visitors extend from each other

If the visitor is only meant to be applied on non-boolean ASTs then it should not see any quantifiers.

But if we want it to apply it on any kind of ASTs I don't see any reason not to extend it from the boolean visitor. The fact that you had to duplicate the functions from it suggests that we should.

PhilippWendler commented 8 years ago

It is certainly intended to be applicable to all kinds of ASTs. I do not see any reason for inheritance. Why should the boolean functions be handled by different methods than the non-boolean one? Either I want to iterate syntactically over the AST without caring for what functions I see, then I do not need special methods, or I care about specific operators, then I cannot traverse non-boolean formulas. Quantifiers are different because they are syntactically different (they would not be handled by visitFunctionApplication).

Implementations of the visitor would be considerably easier as they have a lot less methods.

PhilippWendler commented 8 years ago

The existence of visitAtom in BooleanFormulaVisitor would be a problem if FormulaVisitor inherits from BooleanFormulaVisitor. What would its semantics be? When would it be called?

cheshire commented 8 years ago

OK point taken on inheritance.

Where from should this visitor be invoked? FormulaManager itself? I suppose not the UnsafeFormulaManager since that's what we are trying to hide.

cheshire commented 8 years ago

Then the interface becomes

public abstract class FormulaVisitor<R> {
  abstract R visitFreeVariable(String name, FormulaType type);
  abstract R visitBoundVariable(String name, FormulaType type, Formula boundingAST);
  abstract R visitNumeral(String numeral, FormulaType type);
  abstract R visitUF(String functionName, List<Formula> args, FormulaType type);
  abstract R visitFunctionApplication(
      String functionName,
      List<Formula> args,
      FormulaType type,
      Function<List<Formula>, Formula> newApplicationConstructor);
  abstract R visitForallQuantifier(List<Formula> variables, BooleanFormula body);
  abstract R visitExistsQuantifier(List<Formula> variables, BooleanFormula body);
}
PhilippWendler commented 8 years ago

The place of the visit method depends on #13 I guess.

The visitUF method should maybe get an UninterpretedFunctionDeclaration instance (with name field added to that class?)?

cheshire commented 8 years ago

I've hacked together a quick call to visitor from Z3 to get a feel for what the API would look like. I have noticed we are missing the renaming functionality, which probably can be integrated into the constructor.

The code looks like this (inside Z3UnsafeFormulaManager):

  public FormulaType getFormulaType(Long f) {
    // TODO
    // TODO: do we support the parametrization of FormulaType?
    return null;
  }

  public <R> R visit(FormulaVisitor<R> visitor, final Long f)  {
    String name = getName(f);
    final FormulaType type = getFormulaType(f);
    List<Formula> args = new ArrayList<>();
    for (int i=0; i<getArity(f); i++) {
      // TODO: arguments might have a different typ
      long arg = getArg(f, i);
      FormulaType argumentType = getFormulaType(arg);
      args.add(getFormulaCreator().encapsulate(argumentType, arg));
    }

    if (isFreeVariable(f)) {
      return visitor.visitFreeVariable(name, type);
    } else if (isBoundVariable(f)) {

      // TODO: is it feasible to get the bounding AST?
      return visitor.visitBoundVariable(name, type, null);
    } else if (isNumber(f)) {
      return visitor.visitNumeral(ast_to_string(z3context, f), type);
    } else if (isUF(f)) {
      return visitor.visitUF(name, args, type);
    } else if (get_ast_kind(z3context, f) == Z3_QUANTIFIER_AST) {

      // TODO: get bound arguments.

      if (is_quantifier_forall(z3context, f)) {
        return visitor.visitForAllQuantifier(args,
            getFormulaCreator().encapsulateBoolean(get_quantifier_body(z3context, f))
        );
      } else {
        return visitor.visitExistsQuantifier(args,
            getFormulaCreator().encapsulateBoolean(get_quantifier_body(z3context, f))
        );
      }
    } else {

      // Any function application.
      Function<List<Formula>, Formula> constructor = new Function<List<Formula>, Formula>() {
        @Override
        public Formula apply(List<Formula> formulas) {
          return replaceArgs(getFormulaCreator().encapsulate(type, f),
              formulas);
        }
      };

      return visitor.visitFunctionApplication(
          name, args, type, constructor);
    }
  }
cheshire commented 8 years ago

The visitUF method should maybe get an UninterpretedFunctionDeclaration instance

I don't know, TBH I have never used this class... I guess yes if it's easy to do so.

cheshire commented 8 years ago

Regarding the formula type: we don't have facilities to provide the formula type for all arguments of the function, without making the API too hard to use. Maybe we shouldn't provide FormulaType at all and instead provide a function Formula -> FormulaType?

PhilippWendler commented 8 years ago

Regarding the formula type: we don't have facilities to provide the formula type for all arguments of the function, without making the API too hard to use.

I do not understand? What is "the function"? What exactly is hard to use?

Maybe we shouldn't provide FormulaType at all and instead provide a function Formula -> FormulaType?

We have that function already, but inside the visitor one does not have the current formula instance.

cheshire commented 8 years ago

We have that function already, but inside the visitor one does not have the current formula instance.

Right!

Okay, here is an updated code:


  public <R> R visit(FormulaVisitor<R> visitor, final Long f)  {
    String name = getName(f);
    switch (get_ast_kind(z3context, f)) {
      case Z3_NUMERAL_AST:
        return visitor.visitNumeral(
            ast_to_string(z3context, f),
            getFormulaCreator().getFormulaType(f)
        );
      case Z3_APP_AST:
        final FormulaType type = getFormulaCreator().getFormulaType(f);
        List<Formula> args = new ArrayList<>();
        for (int i=0; i<getArity(f); i++) {
          long arg = getArg(f, i);
          FormulaType argumentType = getFormulaCreator().getFormulaType(arg);
          args.add(getFormulaCreator().encapsulate(argumentType, arg));
        }

        if (isUF(f)) {
          // Special casing for UFs.
          return visitor.visitUF(name, args, type);
        } else {
          // Any function application.
          Function<List<Formula>, Formula> constructor =
              new Function<List<Formula>, Formula>() {
            @Override
            public Formula apply(List<Formula> formulas) {
              return replaceArgs(getFormulaCreator().encapsulate(type, f),
                  formulas);
            }
          };
          return visitor.visitFunctionApplication(
              name, args, type, constructor);
        }
      case Z3_VAR_AST:
        return visitor.visitFreeVariable(name,
            getFormulaCreator().getFormulaType(f));
      case Z3_QUANTIFIER_AST:
        BooleanFormula body = getFormulaCreator().encapsulateBoolean(
                  get_quantifier_body(z3context, f));
        List<Formula> qargs = new ArrayList<>();
        for (int i=0; i<get_quantifier_num_bound(z3context, f); i++) {
          long arg = get_quantifier_pattern_ast(z3context, f, i);
          FormulaType argumentType = getFormulaCreator().getFormulaType(arg);
          qargs.add(getFormulaCreator().encapsulate(argumentType, arg));
        }

        if (is_quantifier_forall(z3context, f)) {
          return visitor.visitForAllQuantifier(qargs, body);
        } else {
          return visitor.visitExistsQuantifier(qargs, body);
        }

      case Z3_SORT_AST:
      case Z3_FUNC_DECL_AST:
      case Z3_UNKNOWN_AST:
      default:
        throw new UnsupportedOperationException("Input should be a formula AST, " +
            "got unexpected type instead");
    }
  }

I think I'll push it after I'll add some tests.

cheshire commented 8 years ago

Oh right, the renaming point:

cheshire commented 8 years ago

For a bound variable there definitely should be a way to identify a quantifier AST, otherwise it would not be possible to e.g. rename bound variables, as the quantifier AST which binds them is also affected by the rename.

cheshire commented 8 years ago

What about theory for arrays? Strings? Custom datatypes? Should we delegate all unknown cases to visitFunctionApplication?

cheshire commented 8 years ago

Also in the spirit of cutting the boilerplate, I would make the method names shorter.

cheshire commented 8 years ago

And yet another issue:

If we talk about FormulaTransformationVisitor how to handle classes which would return a different output for a quantified variable? They should be either untouched or renamed altogether.

cheshire commented 8 years ago

The following cases should be handled in the visitors

I would actually close the feature set on boolean visitors now. We don't need to handle every possible case. After all, SMT atoms are not really atomic, and all kinds of "logical" things can happen inside, and if we miss a few operators it is not a problem. Thus I'm against supporting XOR, distinct and boolean variables as special cases, as it leads to even more boilerplate.