sosy-lab / java-smt

JavaSMT - Unified Java API for SMT solvers.
Apache License 2.0
179 stars 44 forks source link

getSeqInterpolants API change proposal #93

Closed cheshire closed 5 years ago

cheshire commented 7 years ago

Hi,

I am not sure who wrote this API originally, probably @PhilippWendler .

The current signature for getting sequential interpolants is List<BooleanFormula> getSeqInterpolants(List<Set<T>> partitionedFormulas). I think wrapping the inner T inside set is confusing and unnecessary, and accepting List<T> is considerably more natural. Moreover, all the current users create those sets artificially just before calling this method.

I would propose to add a new method (unfortunately, can't use the same name due to erasure), and maybe mark the previous one for deprecation (since we're not ready to leave 1.0 yet...). Suggestions?

cheshire commented 7 years ago

Moreover, it doesn't seem like it is possible to use this API in a typesafe way, since SolverContext returns InterpolatingProverEnvironment<?>, while we need to give type T to the getSeqInterpolants function. All usages I have found use unsafe type casting.

PhilippWendler commented 7 years ago

It wasn't me who wrote this API, I think it was @kfriedberger.

Originally it was List<T> but it got changed when the interface got more features (tree interpolation). Probably we can't remove it, but we can add a convenience method.

It is indeed possible to use this API in a type-safe way, unsafe casts are not needed.

Have a method like

<T> void doSomething(InterpolatingProverEnvironment<T> ipe) { ipe.... }

and somewhere else call it:

try (InterpolatingProverEnvironment<?> ipe = context.getInterpolatingProverEnvironment()) {
  doSomething(ipe);
}

Or pass the environment to a constructor and store it in a field in a similar way.

kfriedberger commented 7 years ago

Hi. I was writing this code some time ago. The intention of Set was that multiple appearances of equal formulas are irrelevant and can/should be avoided. The same idea was also used for the tree-interpolation signature. Possible inner types might be Set or List or even Collection. That can be changed, if needed.

The type T is used to identify the interpolating formulas. We need some type there. The type is irrelevant for most callers, thus it was always casted from ?.

cheshire commented 7 years ago

@PhilippWendler thanks for the pointer, I did not know that.

The type is irrelevant for most callers, thus it was always casted from ?.

Of course, but doing an unsafe cast and then having to suppress a warning is ugly.

However, if our own tool uses the API in a suboptimal way (with a cast), and I did not know that it is possible to use it without a cast, I think it is safe to say that it is quite confusing. Maybe we should have it return a Handle interface instead?

The intention of Set was that multiple appearances of equal formulas are irrelevant and can/should be avoided

Sure, I understand, but I think it gets into a territory of an API which tries to be too clever for it's own good, especially given the fact that in our own usages singleton sets are manually constructed. I would prefer to change it, but now we have to keep backwards compatibility until the next release, so possibly would have to add a new function with a similar name.

PhilippWendler commented 7 years ago

However, if our own tool uses the API in a suboptimal way (with a cast), and I did not know that it is possible to use it without a cast, I think it is safe to say that it is quite confusing. Maybe we should have it return a Handle interface instead?

It may be confusing, but it is the only type-safe way that involves no casts at all. A Handle class would mean that you need checked casts, and that you could confuse handles that belong to different environments. The generic parameter prevents this nicely. Maybe we should fix our own tool and add documentation instead?

The intention of Set was that multiple appearances of equal formulas are irrelevant and can/should be avoided

Sure, I understand, but I think it gets into a territory of an API which tries to be too clever for it's own good, especially given the fact that in our own usages singleton sets are manually constructed. I would prefer to change it, but now we have to keep backwards compatibility until the next release, so possibly would have to add a new function with a similar name.

I think there is a misunderstanding between you too here. AFAIK @cheshire is talking about List<Set<T>> vs. List<T> and @kfriedberger is talking about List<Set<T>> vs. List<Collection<T>>. I understand that List<T> provides less functionality then the other solutions and thus can't replace the existing method. Changing List<Set<T>> to List<Collection<T>> would make sense but has very little benefit and is not worth the incompatibility. There only remains the question whether we want to add a method with List<T> as utility that delegates to List<Set<T>>.

kfriedberger commented 7 years ago

Hm, another try to explain the nested structure: We need the nested List<Set<T>> or List<Collection<T>>, because interpolation queries are in this exact form: (get-interpolant (A B) (C D E)) is identical to (get-interpolant (and A B) (and C D E)), where A,B,C,D,E are formula identifiers of type T. The nested structure avoids the conjunction that would need additional formula identifiers and causes additional overhead for the SMT solver. Depending on the underlying solver, we cannot even build conjunctions of identifiers, but would have to map the identifier back to its formula first. There could even be more conjunctions for sequence interpolants or tree interpolants.

cheshire commented 7 years ago

Maybe we should fix our own tool and add documentation instead?

Probably, I don't have a better solution on top of my head. To me it does seem like we are being overly clever with generics.

I understand that List provides less functionality then the other solutions and thus can't replace the existing method

As far as I currently understand it, List<Set<T>> is just a syntactic "sugar" for avoiding building the conjunction. I use quotes because in all our current usages those sets are singletons, and are just extra overhead.

The nested structure avoids the conjunction that would need additional formula identifiers

I'm not quite sure what do you mean here.

and causes additional overhead for the SMT solver

Do we have hard data for this? Also please note that in practice we never actually use the inner set for the sequential interpolation.

Depending on the underlying solver, we cannot even build conjunctions of identifiers

Why not?

Also just noticed another issue with the current API: it conflates the "push" semantics with the semantics of marking formula boundaries for interpolation. This makes it difficult to perform push/pop when they are not directly correlated with the interpolation boundaries.

kfriedberger commented 7 years ago

As far as I currently understand it, List<Set> is just a syntactic "sugar" for avoiding building the conjunction. I use quotes because in all our current usages those sets are singletons, and are just extra overhead.

Correct, instead of a Set<T> of formulas f we could use one formula of type T. There is currently no use-case for the set-based solution, thus we use singleton-sets in all callers.

Also please note that in practice we never actually use the inner set for the sequential interpolation.

There might be other users of our API and perhaps they use it.

The basic reason for the set-based solution is the interface for Princess, because it exactly matches Seq<Set<Object>> (Scala) and List<Set<Term>> (Java).

In my opinion we should keep it as is, because there is no benefit of changing it.


The semantics for "addConstraint" comes from Solvers like SMTInterpol and Princess, where the identifier must be set before adding the assertion onto the solver's stack. As the default interpolation method getInterpolant(List) returns an interpolant between the given list of formulas and the rest of the formulas on the stack, it makes sense to directly return the identifier when adding a new formula.

PhilippWendler commented 7 years ago

Btw., why is this method not supported for all solvers according to the Javadoc? This is regular interpolation that every solver supports, nothing fancy like tree interpolation, right?

It is quite ugly to have a method that is recommended for some solvers but unsupported for others.

kfriedberger commented 7 years ago

@PhilippWendler There is no documentation about any solver whether repeatedly calling getInterpolant(lst1,lst2) with a growing lst1 and a shrinking lst2 with the same internal proof returns a valid sequence of sequence interpolants. Thus only the minimal guaranteed features were implemented, and this includes seqInterpolants(lst) only for solvers with direct support for this.

We had the same rule about theories and e.g. division, where we decided that whenever something can not be implemented into JavaSMT in a way that it is guaranteed, we throw an UnsupportedOperationException.

There are different interpolation strategies in CPAchecker that use these different interpolation approaches. Maybe someone might want to merge the interpolation strategies from CPAchecker into JavaSMT :-)

PhilippWendler commented 7 years ago

@kfriedberger You are right that we should implement this method only if we can guarantee it works correctly.

However, I would argue that we can do so. All of CPAchecker's interpolation-based analyses absolutely rely on exactly this assumption, since forever. The original implementation of the MathSAT integration was written by Alberto, who certainly would not have relied on this assumption if MathSAT wouldn't guarantee it. In fact, Alberto also added code relying on this assumption to pysmt. For the other SMT solvers it is certainly the same. If you are unsure you can ask their developers for an official statement.

In fact, I noticed just now that there is a comment on getInterpolant that explicitly does not guarantee an inductive sequence of interpolants. So now I as a CPAchecker developer have a feature request for JavaSMT to add this guarantee, because we absolutely need it.

PhilippWendler commented 7 years ago

So, to restate:

cheshire commented 7 years ago

@PhilippWendler I'm currently completely blocked by finishing the thesis, and will get to that next month.

Personally, I am rather unsatisfied with both issues mentioned above:

Thus I would like to change that, even at the cost of b/w incompatible change.

@PhilippWendler if we add the utility method due to type erasure it couldn't have the same name. And I would prefer not to have needless overhead.

@kfriedberger

There is no documentation about any solver whether repeatedly calling getInterpolant(lst1,lst2) with a growing lst1 and a shrinking lst2 with the same internal proof returns a valid sequence of sequence interpolants

Maybe can you get a short example demonstrating such a sequence? I might be missing something obvious, but I have hard time thinking of an example.

PhilippWendler commented 7 years ago

For the handles, I can follow your argument. However, I would not make a backwards-incompatible change for such a relatively unimportant reason quite soon after version 1.0. Wouldn't this send the signal that the API is not really stable and we just keep changing things as we did before 1.0?

For getSeqInterpolants, I would not remove the existing method, because the replacement provides less possibilities, and some people might need this flexibility. Also, again this would be a change that I would not consider worthy for a backwards incompatible version.

I would make such small backwards-incompatible changes only in batches when enough has accumulated that warrants a new major version (or better yet, add the new methods and keep the old as @Deprecated).

For the incremental interpolation feature, we can trivially construct examples that are not inductive sequences of interpolants, I am just arguing that solvers actually guarantee inductiveness.

cheshire commented 7 years ago

@PhilippWendler OK so how should b/w incompatible changes be handled? Put them on a branch, wait till "enough" accumulates (for some definition of "enough") and then make a new major release?

because the replacement provides less possibilities

I don't follow that, which ones? If the user can give a set, they can construct a conjunction.

we can trivially construct examples

OK, could you give one for completeness?

kfriedberger commented 7 years ago

@cheshire

There is no documentation about any solver whether repeatedly calling getInterpolant(lst1,lst2) with a growing lst1 and a shrinking lst2 with the same internal proof returns a valid sequence of sequence interpolants

Maybe can you get a short example demonstrating such a sequence? I might be missing something obvious, but I have hard time thinking of an example.

An example for this is CPAchecker with PredicateAnalysis. For a list of successive pathformulas [pf1,pf2,pf3,pf4], the interpolants are computed via the queries getItp([pf1],[pf2,pf3,pf4]), getItp([pf1,pf2],[pf3,pf4]), getItp([pf1,pf2,pf3],[pf4]), all queries are computed from the same proof. That seems to work and even produce a valid sequence of interpolants. The result seems to be identical to the list of interpolants returned from one single query getSeqItp({pf1},{pf2},{pf3},{pf4}), however I did not find any documentation or proof for this 'coincidence'.

cheshire commented 7 years ago

@kfriedberger I've meant specifically an example where the assumption you were talking about is violated: the solver "does not return a valid sequence of sequence of interpolants". Just write an artificial one.

kfriedberger commented 7 years ago

ok, hypothetical example / not executed / just plain theory: List of formulas: [false,true,false] getItp([false],[true,false]) -> TRUE (valid interpolant) getItp([false,true],[false]) -> FALSE (valid interpolant) However, the sequence of interpolants is invalid: TRUE && true => FALSE is incorrect. The corresponding rule for sequences is similar to: (ITP_i && PF_i) => ITP_(i+1)

PhilippWendler commented 7 years ago

I did not find any documentation or proof for this 'coincidence'.

Above I have argued that this is not a "coincidence".

PhilippWendler commented 7 years ago

@PhilippWendler OK so how should b/w incompatible changes be handled? Put them on a branch, wait till "enough" accumulates (for some definition of "enough") and then make a new major release?

Not sure. Best would be If we can add the new API now, mark the old as @Deprecated, and remove it later.

because the replacement provides less possibilities

I don't follow that, which ones? If the user can give a set, they can construct a conjunction.

No. The user needs to pass what you call a handle, not a formula. So they can't pass a conjunction of handles.

cheshire commented 7 years ago

@PhilippWendler Indeed it's quite interesting that the commit you have mentioned in PySMT is working. Might be a hidden solver feature? I'll ask the developers. In any case, it's possible to do sequence interpolation "properly" using binary interpolation: one just has to use the previously obtained result in the new interpolation query:

I_0 = \top
I_1 = interpolate(I_0 /\ T_1, T_2 /\ ... T_n)
...
etc
cheshire commented 7 years ago

Additionally, I find the getInterpolant method a bit awkward, as it seems to favor MathSAT solver all others by directly adapting to its API, and forcing all other solvers to keep a datastructure holding all asserted formulas (and the whole assertion stack as well!) and then awkwardly calculating the difference between "current" formulas, and formulas passed as an argument. And if we add a second method for interpolation which only accepts List<InterpolationHandle>, then it becomes redundant as well, as it is trivial to construct a list of two arguments. @kfriedberger if we were to go with b/w incompatible change, thoughts on removing getInterpolant? That would simplify the code considerably, and reduce the overhead caused by storing the formula stack.

kfriedberger commented 7 years ago

I would also prefer the cleaner version with (only) the methods getSeqInterpolants and getTreeInterpolants. The changes needed in CPAchecker are surely not backwards-compatible, but can be seen as a 'one-time-cost' for supporting a 'new' API. The cleaner version of the API makes it easier for users to understand and we should not only depend on CPAchecker developers as potential users. Let's drop the method getInterpolant and only provide getSeqInterpolants and getTreeInterpolants.

kfriedberger commented 7 years ago

Hm, thinking about it. How can we guarantee/enforce that the lists given to getSeqInterpolants contains all pushed formulas? What happens otherwise? Interpolation with assumptions? This might be a case where the old method getInterpolant is better, because the formulas from the parameter are one part and the rest of the pushed formulas are the other part.

cheshire commented 7 years ago

I think (but would need to check with tests) that for all non-MathSAT solvers the API is effectively stateless, and only operates on formulas given as a query. To me that seems more intuitive than thinking about the solver state. If it's not a problem for all other solvers, I don't think it is a problem for us, but I can check how easy is changing the CPAchecker code before moving the change to the master branch.

kfriedberger commented 7 years ago

The API is stateful, because the pushed formulas need to be UNSAT, and the proof (UNSAT-core or similar) is kept as 'state'. It does not make sense to ask a solver for an interpolant for a subset of formulas, but it is possible and I do not know the intention for this.

cheshire commented 7 years ago

What happens otherwise? Interpolation with assumptions?

Don't we also have the same problem now with sequence/tree interpolation API? Let me get back to you on this once I tests things more exhaustively. The meta-argument on why this should work is that it works just fine for all non-MathSAT solvers.

cheshire commented 7 years ago

@PhilippWendler Regarding sequence interpolants: I have contacted Alberto, and the assumption used in PySMT indeed always holds for MathSAT. So we can have sequence interpolants in MathSAT.

cheshire commented 7 years ago

What happens otherwise? Interpolation with assumptions?

@kfriedberger what solvers do based on my tests is that constraints added to the solver, but not marked to the interpolation are implicitly added to the left-most (or right-most, does not make a difference) side of the sequence query. I suppose it's even more convenient, since using that approach the previous API can be emulated even easier.

Though when rewriting old usage using new API I have found that generally the code seems more readable if implicit statefulness is eliminated, and the caller explicitly specifies groups between which the interpolation is performed.

cheshire commented 7 years ago

Actually, at least for Z3 the definition is slightly different, as "background" formulas are always allowed in the interpolant:

        Any premises of the proof not present in the pattern are
        treated as "background theory". Predicate and function symbols
        occurring in the background theory are treated as interpreted and
        thus always allowed in the interpolant.
kfriedberger commented 6 years ago

partially done in f2dc27be8204b8cb57e90b5675b852ec3e2b7690

kfriedberger commented 5 years ago

Just like Z3, SMTInterpol uses all unused formulas as background theory and their symbols can appear in all inteprolants (see https://github.com/ultimate-pa/smtinterpol/issues/38#issuecomment-423973123 ).

kfriedberger commented 5 years ago

We can close this issue now, because the InterpolationProver allows to use List<T> as input as well as List<Collection<T>>, which was the original intent here.