Closed cheshire closed 5 years ago
Moreover, it doesn't seem like it is possible to use this API in a typesafe way,
since SolverContext
returns InterpolatingProverEnvironment<?>
, while we need to give type T
to the getSeqInterpolants
function.
All usages I have found use unsafe type casting.
It wasn't me who wrote this API, I think it was @kfriedberger.
Originally it was List<T>
but it got changed when the interface got more features (tree interpolation). Probably we can't remove it, but we can add a convenience method.
It is indeed possible to use this API in a type-safe way, unsafe casts are not needed.
Have a method like
<T> void doSomething(InterpolatingProverEnvironment<T> ipe) { ipe.... }
and somewhere else call it:
try (InterpolatingProverEnvironment<?> ipe = context.getInterpolatingProverEnvironment()) {
doSomething(ipe);
}
Or pass the environment to a constructor and store it in a field in a similar way.
Hi.
I was writing this code some time ago.
The intention of Set
was that multiple appearances of equal formulas are irrelevant and can/should be avoided. The same idea was also used for the tree-interpolation signature.
Possible inner types might be Set
or List
or even Collection
. That can be changed, if needed.
The type T
is used to identify the interpolating formulas. We need some type there. The type is irrelevant for most callers, thus it was always casted from ?
.
@PhilippWendler thanks for the pointer, I did not know that.
The type is irrelevant for most callers, thus it was always casted from ?.
Of course, but doing an unsafe cast and then having to suppress a warning is ugly.
However, if our own tool uses the API in a suboptimal way (with a cast), and I did not know that it is possible to use it without a cast, I think it is safe to say that it is quite confusing.
Maybe we should have it return a Handle
interface instead?
The intention of Set was that multiple appearances of equal formulas are irrelevant and can/should be avoided
Sure, I understand, but I think it gets into a territory of an API which tries to be too clever for it's own good, especially given the fact that in our own usages singleton sets are manually constructed. I would prefer to change it, but now we have to keep backwards compatibility until the next release, so possibly would have to add a new function with a similar name.
However, if our own tool uses the API in a suboptimal way (with a cast), and I did not know that it is possible to use it without a cast, I think it is safe to say that it is quite confusing. Maybe we should have it return a Handle interface instead?
It may be confusing, but it is the only type-safe way that involves no casts at all. A Handle
class would mean that you need checked casts, and that you could confuse handles that belong to different environments. The generic parameter prevents this nicely.
Maybe we should fix our own tool and add documentation instead?
The intention of Set was that multiple appearances of equal formulas are irrelevant and can/should be avoided
Sure, I understand, but I think it gets into a territory of an API which tries to be too clever for it's own good, especially given the fact that in our own usages singleton sets are manually constructed. I would prefer to change it, but now we have to keep backwards compatibility until the next release, so possibly would have to add a new function with a similar name.
I think there is a misunderstanding between you too here. AFAIK @cheshire is talking about List<Set<T>>
vs. List<T>
and @kfriedberger is talking about List<Set<T>>
vs. List<Collection<T>>
. I understand that List<T>
provides less functionality then the other solutions and thus can't replace the existing method. Changing List<Set<T>>
to List<Collection<T>>
would make sense but has very little benefit and is not worth the incompatibility. There only remains the question whether we want to add a method with List<T>
as utility that delegates to List<Set<T>>
.
Hm, another try to explain the nested structure:
We need the nested List<Set<T>>
or List<Collection<T>>
, because interpolation queries are in this exact form:
(get-interpolant (A B) (C D E))
is identical to (get-interpolant (and A B) (and C D E))
, where A,B,C,D,E
are formula identifiers of type T
. The nested structure avoids the conjunction that would need additional formula identifiers and causes additional overhead for the SMT solver. Depending on the underlying solver, we cannot even build conjunctions of identifiers, but would have to map the identifier back to its formula first. There could even be more conjunctions for sequence interpolants or tree interpolants.
Maybe we should fix our own tool and add documentation instead?
Probably, I don't have a better solution on top of my head. To me it does seem like we are being overly clever with generics.
I understand that List
provides less functionality then the other solutions and thus can't replace the existing method
As far as I currently understand it, List<Set<T>>
is just a syntactic "sugar" for avoiding building the conjunction.
I use quotes because in all our current usages those sets are singletons, and are just extra overhead.
The nested structure avoids the conjunction that would need additional formula identifiers
I'm not quite sure what do you mean here.
and causes additional overhead for the SMT solver
Do we have hard data for this? Also please note that in practice we never actually use the inner set for the sequential interpolation.
Depending on the underlying solver, we cannot even build conjunctions of identifiers
Why not?
Also just noticed another issue with the current API: it conflates the "push" semantics with the semantics of marking formula boundaries for interpolation. This makes it difficult to perform push/pop when they are not directly correlated with the interpolation boundaries.
As far as I currently understand it, List<Set
> is just a syntactic "sugar" for avoiding building the conjunction. I use quotes because in all our current usages those sets are singletons, and are just extra overhead.
Correct, instead of a Set<T>
of formulas f
we could use one formula of type T
.
There is currently no use-case for the set-based solution, thus we use singleton-sets in all callers.
Also please note that in practice we never actually use the inner set for the sequential interpolation.
There might be other users of our API and perhaps they use it.
The basic reason for the set-based solution is the interface for Princess,
because it exactly matches Seq<Set<Object>>
(Scala) and List<Set<Term>>
(Java).
In my opinion we should keep it as is
, because there is no benefit of changing it.
The semantics for "addConstraint" comes from Solvers like SMTInterpol and Princess, where the identifier must be set before adding the assertion onto the solver's stack.
As the default interpolation method getInterpolant(List)
returns an interpolant between the given list of formulas and the rest of the formulas on the stack, it makes sense to directly return the identifier when adding a new formula.
Btw., why is this method not supported for all solvers according to the Javadoc? This is regular interpolation that every solver supports, nothing fancy like tree interpolation, right?
It is quite ugly to have a method that is recommended for some solvers but unsupported for others.
@PhilippWendler
There is no documentation about any solver whether repeatedly calling getInterpolant(lst1,lst2)
with a growing lst1
and a shrinking lst2
with the same internal proof returns a valid sequence of sequence interpolants. Thus only the minimal guaranteed features were implemented, and this includes seqInterpolants(lst)
only for solvers with direct support for this.
We had the same rule about theories and e.g. division, where we decided that whenever something can not be implemented into JavaSMT in a way that it is guaranteed, we throw an UnsupportedOperationException
.
There are different interpolation strategies in CPAchecker that use these different interpolation approaches. Maybe someone might want to merge the interpolation strategies from CPAchecker into JavaSMT :-)
@kfriedberger You are right that we should implement this method only if we can guarantee it works correctly.
However, I would argue that we can do so. All of CPAchecker's interpolation-based analyses absolutely rely on exactly this assumption, since forever. The original implementation of the MathSAT integration was written by Alberto, who certainly would not have relied on this assumption if MathSAT wouldn't guarantee it. In fact, Alberto also added code relying on this assumption to pysmt. For the other SMT solvers it is certainly the same. If you are unsure you can ask their developers for an official statement.
In fact, I noticed just now that there is a comment on getInterpolant
that explicitly does not guarantee an inductive sequence of interpolants. So now I as a CPAchecker developer have a feature request for JavaSMT to add this guarantee, because we absolutely need it.
So, to restate:
InterpolatingProverEnvironment
:
List<BooleanFormula> getSeqInterpolants(List<T> partitionedFormulas) {
return getSeqInterpolants(Lists.transform(partitionedFormulas, ImmutableSet::of));
}
getSeqInterpolants
for all solvers. For solvers that do not expose such an API, think we can assume that sequential calls to their getInterpolants
API return an inductive sequence of interpolants, or we need to ask their developers if you are unsure.There is no direct guarantee that the interpolants returned are part of an inductive sequence
from getInterpolant
.@PhilippWendler I'm currently completely blocked by finishing the thesis, and will get to that next month.
Personally, I am rather unsatisfied with both issues mentioned above:
<?>
either using the separate function trick or via unsafe casts is very confusing (at least people who I was showing to were confused, and so was I before that), and brings very questionable benefit: after all, the Formula
object is not parameterized and can be passed freely to contexts it does not belong to, thus it's unclear why handles should receive special treatment. Especially given that they are usually used within a single try
block and don't travel outside of it.Thus I would like to change that, even at the cost of b/w incompatible change.
List<Set<T>>
to List<T>
for sequential interpolants for essentially the same reason. Current semantics is not immediately intuitive (is it conjunction or disjunction over the formulas in a set?) and we have no data it brings any benefits over simply creating conjunctions on the client side. If we do b/w incompatible change for the first point, the second should be fine. For existing clients it would be trivial to replace the set construction by disjunction).@PhilippWendler if we add the utility method due to type erasure it couldn't have the same name. And I would prefer not to have needless overhead.
@kfriedberger
There is no documentation about any solver whether repeatedly calling getInterpolant(lst1,lst2) with a growing lst1 and a shrinking lst2 with the same internal proof returns a valid sequence of sequence interpolants
Maybe can you get a short example demonstrating such a sequence? I might be missing something obvious, but I have hard time thinking of an example.
For the handles, I can follow your argument. However, I would not make a backwards-incompatible change for such a relatively unimportant reason quite soon after version 1.0. Wouldn't this send the signal that the API is not really stable and we just keep changing things as we did before 1.0?
For getSeqInterpolants
, I would not remove the existing method, because the replacement provides less possibilities, and some people might need this flexibility. Also, again this would be a change that I would not consider worthy for a backwards incompatible version.
I would make such small backwards-incompatible changes only in batches when enough has accumulated that warrants a new major version (or better yet, add the new methods and keep the old as @Deprecated
).
For the incremental interpolation feature, we can trivially construct examples that are not inductive sequences of interpolants, I am just arguing that solvers actually guarantee inductiveness.
@PhilippWendler OK so how should b/w incompatible changes be handled? Put them on a branch, wait till "enough" accumulates (for some definition of "enough") and then make a new major release?
because the replacement provides less possibilities
I don't follow that, which ones? If the user can give a set, they can construct a conjunction.
we can trivially construct examples
OK, could you give one for completeness?
@cheshire
There is no documentation about any solver whether repeatedly calling getInterpolant(lst1,lst2) with a growing lst1 and a shrinking lst2 with the same internal proof returns a valid sequence of sequence interpolants
Maybe can you get a short example demonstrating such a sequence? I might be missing something obvious, but I have hard time thinking of an example.
An example for this is CPAchecker with PredicateAnalysis. For a list of successive pathformulas [pf1,pf2,pf3,pf4]
, the interpolants are computed via the queries getItp([pf1],[pf2,pf3,pf4])
, getItp([pf1,pf2],[pf3,pf4])
, getItp([pf1,pf2,pf3],[pf4])
, all queries are computed from the same proof. That seems to work and even produce a valid sequence of interpolants. The result seems to be identical to the list of interpolants returned from one single query getSeqItp({pf1},{pf2},{pf3},{pf4})
, however I did not find any documentation or proof for this 'coincidence'.
@kfriedberger I've meant specifically an example where the assumption you were talking about is violated: the solver "does not return a valid sequence of sequence of interpolants". Just write an artificial one.
ok,
hypothetical example / not executed / just plain theory:
List of formulas: [false,true,false]
getItp([false],[true,false])
-> TRUE
(valid interpolant)
getItp([false,true],[false])
-> FALSE
(valid interpolant)
However, the sequence of interpolants is invalid: TRUE && true => FALSE
is incorrect.
The corresponding rule for sequences is similar to: (ITP_i && PF_i) => ITP_(i+1)
I did not find any documentation or proof for this 'coincidence'.
Above I have argued that this is not a "coincidence".
@PhilippWendler OK so how should b/w incompatible changes be handled? Put them on a branch, wait till "enough" accumulates (for some definition of "enough") and then make a new major release?
Not sure. Best would be If we can add the new API now, mark the old as @Deprecated
, and remove it later.
because the replacement provides less possibilities
I don't follow that, which ones? If the user can give a set, they can construct a conjunction.
No. The user needs to pass what you call a handle, not a formula. So they can't pass a conjunction of handles.
@PhilippWendler Indeed it's quite interesting that the commit you have mentioned in PySMT is working. Might be a hidden solver feature? I'll ask the developers. In any case, it's possible to do sequence interpolation "properly" using binary interpolation: one just has to use the previously obtained result in the new interpolation query:
I_0 = \top
I_1 = interpolate(I_0 /\ T_1, T_2 /\ ... T_n)
...
etc
Additionally, I find the getInterpolant
method a bit awkward, as it seems to favor MathSAT solver all others by directly adapting to its API, and forcing all other solvers to keep a datastructure holding all asserted formulas (and the whole assertion stack as well!) and then awkwardly calculating the difference between "current" formulas, and formulas passed as an argument.
And if we add a second method for interpolation which only accepts List<InterpolationHandle>
, then it becomes redundant as well, as it is trivial to construct a list of two arguments.
@kfriedberger if we were to go with b/w incompatible change, thoughts on removing getInterpolant
? That would simplify the code considerably, and reduce the overhead caused by storing the formula stack.
I would also prefer the cleaner version with (only) the methods getSeqInterpolants
and getTreeInterpolants
.
The changes needed in CPAchecker are surely not backwards-compatible, but can be seen as a 'one-time-cost' for supporting a 'new' API. The cleaner version of the API makes it easier for users to understand and we should not only depend on CPAchecker developers as potential users.
Let's drop the method getInterpolant
and only provide getSeqInterpolants
and getTreeInterpolants
.
Hm, thinking about it.
How can we guarantee/enforce that the lists given to getSeqInterpolants
contains all pushed formulas? What happens otherwise? Interpolation with assumptions?
This might be a case where the old method getInterpolant
is better, because the formulas from the parameter are one part and the rest of the pushed formulas are the other part.
I think (but would need to check with tests) that for all non-MathSAT solvers the API is effectively stateless, and only operates on formulas given as a query. To me that seems more intuitive than thinking about the solver state. If it's not a problem for all other solvers, I don't think it is a problem for us, but I can check how easy is changing the CPAchecker code before moving the change to the master branch.
The API is stateful, because the pushed formulas need to be UNSAT, and the proof (UNSAT-core or similar) is kept as 'state'. It does not make sense to ask a solver for an interpolant for a subset of formulas, but it is possible and I do not know the intention for this.
What happens otherwise? Interpolation with assumptions?
Don't we also have the same problem now with sequence/tree interpolation API? Let me get back to you on this once I tests things more exhaustively. The meta-argument on why this should work is that it works just fine for all non-MathSAT solvers.
@PhilippWendler Regarding sequence interpolants: I have contacted Alberto, and the assumption used in PySMT indeed always holds for MathSAT. So we can have sequence interpolants in MathSAT.
What happens otherwise? Interpolation with assumptions?
@kfriedberger what solvers do based on my tests is that constraints added to the solver, but not marked to the interpolation are implicitly added to the left-most (or right-most, does not make a difference) side of the sequence query. I suppose it's even more convenient, since using that approach the previous API can be emulated even easier.
Though when rewriting old usage using new API I have found that generally the code seems more readable if implicit statefulness is eliminated, and the caller explicitly specifies groups between which the interpolation is performed.
Actually, at least for Z3 the definition is slightly different, as "background" formulas are always allowed in the interpolant:
Any premises of the proof not present in the pattern are
treated as "background theory". Predicate and function symbols
occurring in the background theory are treated as interpreted and
thus always allowed in the interpolant.
partially done in f2dc27be8204b8cb57e90b5675b852ec3e2b7690
Just like Z3, SMTInterpol uses all unused formulas as background theory and their symbols can appear in all inteprolants (see https://github.com/ultimate-pa/smtinterpol/issues/38#issuecomment-423973123 ).
We can close this issue now, because the InterpolationProver allows to use List<T>
as input as well as List<Collection<T>>
, which was the original intent here.
Hi,
I am not sure who wrote this API originally, probably @PhilippWendler .
The current signature for getting sequential interpolants is
List<BooleanFormula> getSeqInterpolants(List<Set<T>> partitionedFormulas)
. I think wrapping the innerT
inside set is confusing and unnecessary, and acceptingList<T>
is considerably more natural. Moreover, all the current users create those sets artificially just before calling this method.I would propose to add a new method (unfortunately, can't use the same name due to erasure), and maybe mark the previous one for deprecation (since we're not ready to leave 1.0 yet...). Suggestions?