Escaping of non SMT-LIB compliant names?

PhilippWendler commented 8 years ago

With SMTInterpol, the following code fails:

bmgr.makeVariable("select");

Exception:

de.uni_freiburg.informatik.ultimate.logic.SMTLIBException: Function select is already defined.
    at de.uni_freiburg.informatik.ultimate.logic.NoopScript.declareFun(NoopScript.java:120)
    at de.uni_freiburg.informatik.ultimate.smtinterpol.smtlib2.SMTInterpol.declareFun(SMTInterpol.java:1885)
    at org.sosy_lab.solver.smtinterpol.SmtInterpolEnvironment.declareFun(SmtInterpolEnvironment.java:304)
    at org.sosy_lab.solver.smtinterpol.SmtInterpolFormulaCreator.makeVariable(SmtInterpolFormulaCreator.java:84)
    at org.sosy_lab.solver.smtinterpol.SmtInterpolBooleanFormulaManager.makeVariableImpl(SmtInterpolBooleanFormulaManager.java:55)
    at org.sosy_lab.solver.smtinterpol.SmtInterpolBooleanFormulaManager.makeVariableImpl(SmtInterpolBooleanFormulaManager.java:1)
    at org.sosy_lab.solver.basicimpl.AbstractBooleanFormulaManager.makeVariable(AbstractBooleanFormulaManager.java:69)
    at org.sosy_lab.solver.test.SolverBasicTest.testArrayFunctions(SolverBasicTest.java:143)

The reason is that select is already an existing interpreted function in SMTInterpol.

I think that JavaSMT should transparently escape such reserved names for variables and functions by quoting them. Probably select is not the only case where this needs to be done.

kfriedberger commented 8 years ago

The same problem appears for "true" and "false", if they are created as integer variables. SMTInterpol cannot create variables with such names. It might also happen for other reserved tokens.

kfriedberger commented 8 years ago

SMTlib allows to escape names with |name|. Perhaps we should always apply this (transparently) to variables for SMTInterpol. The problem might then be moved to cases where parsing or dumping of formulas is needed.

cheshire commented 8 years ago

Let's slow down a bit here, why do we have a variable with such a name in a first place? I saw the corresponding ticket in CPAchecker, but don't we escape names with pipes in ctoformula package? I'm somewhat opposed to doing opaque escaping.

cheshire commented 8 years ago

SMTlib allows to escape names with |name|. Perhaps we should always apply this (transparently) to variables for SMTInterpol.

If we do that, the formulas in SMTInterpol will look different to formulas in any other solver, which would make transferring formulas between the solvers rather unpleasant.

Of course, we can still escape names across all solvers, for both variables and UFs.

PhilippWendler commented 8 years ago

What's the current state for this?

I think the most important point is that declaring a variable like select, store, etc. either fails consistently with all solvers or is supported for all solvers. If we let it fail, this would need to be documented, and we should probably add utilities for escaping such that not all client needs to reimplement this over and over (e.g., add something like String escapeNameIfNecessary(String)).

cheshire commented 8 years ago

According to SMT-LIB documentation each variable called "x" can be referred interchangeably as either "x" or "|x|".

We should test if that's indeed the case for all solvers and if that's the case, we can uniformly escape everything.

On Apr 4, 2016 12:49, "Philipp Wendler" notifications@github.com wrote:

What's the current state for this?

I think the most important point is that declaring a variable like select, store, etc. either fails consistently with all solvers or is supported for all solvers. If we let it fail, this would need to be documented, and we should probably add utilities for escaping such that not all client needs to reimplement this over and over (e.g., add something like String escapeNameIfNecessary(String)).

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/sosy-lab/java-smt/issues/26#issuecomment-205238534

cheshire commented 8 years ago

According to SMT-LIB documentation each variable called "x" can be referred interchangeably

Actually that turned out to be wrong: all solvers treat "x" and "|x|" to be different.

and we should probably add utilities for escaping

But solvers already have these utilities! And they use them! Just not in this case.

I'm tempted to say that's just a bug in SMTInterpol, as all other solvers seem to be able to handle it.

@PhilippWendler Yes, we can escape all variables. I do not like it for many reasons:

Introspection, serialization, and model assignments will return values different to what user has entered.
We can try to wrap/unwrap everything, but that would be a lot of code to cater for one bug in one solver, and this code has potential to be buggy as well (I'm not even sure how to un-apply escaping to the serialization output, that would mean SMT-LIB parsing, which we currently don't do.)
For visitors wrapping/unwrapping would imply a very heavy performance penalty due to all the string concatenation. And I'm not just theorizing here: for LPI on many benchmarks introspection takes like 70% of the runtime.

Hence I suggest a following course of actions: I'll file a ticket with SMTInterpol developers, and for select/store/etc I would suggest escaping on CPAchecker level.

cheshire commented 8 years ago

Of course we can also have a workaround which would work now: apply escaping only for SMTInterpol, and only when declaring the function fails (catch exception and retry)

PhilippWendler commented 8 years ago

I am not exactly sure what your proposal is, the two parts of the proposal seem to contradict each other.

Anyway, I can only refer to my comment above (from April 4th), this summarizes my opinion. The only thing I would add is that if some user-visible escaping is applied anywhere, this would also need to be documented and utilities for unescaping should be provided.

cheshire commented 8 years ago

@PhilippWendler My proposal is to close this ticket, and to apply escaping in CPAchecker. I don't want to add a utility function for escaping variables, since it gives the impression that they need to be escaped, but this is not the case, since all other solvers escape them already.

As to documenting, I'm not sure where it should go, any suggestions? All solvers have many bugs, are we supposed to document all we know?

PhilippWendler commented 8 years ago

If one solver needs escaping and this is not handled by JavaSMT, then JavaSMT's API does need escaping. One should not code against JavaSMT's API with only a specific solver in mind.

If using JavaSMT's API needs escaping, then of course escaping utilities should be added to JavaSMT. It does not make sense at all to let every user of the API re-implement escaping.

However, I still do not understand why you propose escaping in CPAchecker when at the same time you say solvers should do the escaping.

And yes, all known problems with solvers should be documented if we cannot add workarounds. Documenting them once is much less effort than letting all users of JavaSMT rediscover them and spend effort debugging them. I suggest a section in the documentation.

cheshire commented 8 years ago

@PhilippWendler If there was a solver that does not accept variable names starting with a letter "a", does it mean we should have wrapping just for this solver? How far should catering for particular needs go?

If we add an escape function, it will be extremely counter-intuitive if it needs to be applied only when a certain solver is used. This also defeats the point of being able to transparently switch solvers. If we say that it always should be applied, then it's not clear why it's not applied automatically. If we always apply it automatically, then we get the problem that introspection, models and serialization do not return expected values.

To me we have multiple evil options, and crashing on certain variable names is currently the least evil solution.

PhilippWendler commented 8 years ago

If we add an escape function, it will be extremely counter-intuitive if it needs to be applied only when a certain solver is used. This also defeats the point of being able to transparently switch solvers. If we say that it always should be applied, then it's not clear why it's not applied automatically.

Exactly.

If we always apply it automatically, then we get the problem that introspection, models and serialization do not return expected values.

But you say we do have this problem already because other solvers do escaping internally. So why is this a problem for SMTInterpol specifically?

Why can't we add transparent unescaping for all solvers to solve the problem?

You claim a performance problem, but the same performance problem would occur if users of JavaSMT apply the (un)escaping themselves.

The most important criterion for the JavaSMT API in this case is IMHO that it should behave similarly for all solvers. Second criterion is to make it as easy to use as possible for users. Doing nothing satisfies neither of them.

cheshire commented 8 years ago

You claim a performance problem, but the same performance problem would occur if users of JavaSMT apply the (un)escaping themselves.

Except they don't have to, if they use a different solver, or don't create variables called select/store, or escape them themselves, so they don't expect them to be equal on introspection.

Why can't we add transparent unescaping for all solvers to solve the problem?

Code complexity: have to remember to synchronize it across visitors/serialization/models

Performance.

How do we even do transparent unescaping for SMT-LIB serialization?

You seriously suggest to add all these problems in order to have a workaround for a problem in SMTInterpol?

PhilippWendler commented 8 years ago

I still do not understand the performance argument. You just suggested to the SMTInterpol developers to add escaping, but you don't want to add the very same feature to JavaSMT because of performance? Why should this be slower in JavaSMT than in SMTInterpol?

Why would users that escape themselves be affected if we escape select etc.?

Why should a workaround for SMTInterpol that makes it behave like the other solvers affect users of these other solvers?

I would not expect any form of unescaping for SMT-LIB serialization, because escaping is simply required for SMT-LIB serialization (otherwise it would be invalid). I am pretty sure that the serialization output of all other solvers also contains escaping for select etc. if not even for all variables.

cheshire commented 8 years ago

I still do not understand the performance argument. You just suggested to the SMTInterpol developers to add escaping, but you don't want to add the very same feature to JavaSMT because of performance? Why should this be slower in JavaSMT than in SMTInterpol?

Performance of what specifically though? I was saying that adding "|" on every single visitor traversal is very costly, what does this have to do with adding pipes in SMTInterpol when creating the symbol? Our problem (visitor provides a pre-computed String) does not apply to them.

Why would users that escape themselves be affected if we escape select etc.?

I don't understand what this question is referring to.

Why should a workaround for SMTInterpol that makes it behave like the other solvers affect users of these other solvers?

I don't understand that question either, sorry.

I would not expect any form of unescaping for SMT-LIB serialization, because escaping is simply required for SMT-LIB serialization (otherwise it would be invalid).

So our output with your suggestion will be escaped twice, once by us, and once by the solver, giving a different string than without escaping. This can be bad.

PhilippWendler commented 8 years ago

I still do not understand the performance argument. You just suggested to the SMTInterpol developers to add escaping, but you don't want to add the very same feature to JavaSMT because of performance? Why should this be slower in JavaSMT than in SMTInterpol?

Performance of what specifically though?

Performance of whatever you were claiming is a performance problem.

I was saying that adding "|" on every single visitor traversal is very costly, what does this have to do with adding pipes in SMTInterpol when creating the symbol? Our problem (visitor provides a pre-computed String) does not apply to them.

I cannot see any difference in code for the implementation of the escaping feature between doing it in JavaSMT and SMTInterpol itself. If you call makeVariable("select") in a visitor, why does it make a performance difference if we add the pipes or SMTInterpol? All other solvers have this feature already and it is not a performance problem!

Why would users that escape themselves be affected if we escape select etc.?

I don't understand what this question is referring to.

You were claiming that we doing escaping is a performance problem for users that do escaping themselves. I ask why this should be the case.

Why should a workaround for SMTInterpol that makes it behave like the other solvers affect users of these other solvers?

I don't understand that question either, sorry.

You were claiming that we doing escaping is a performance problem for users that use other solvers than SMTInterpol. I ask why this should be the case.

I would not expect any form of unescaping for SMT-LIB serialization, because escaping is simply required for SMT-LIB serialization (otherwise it would be invalid).

So our output with your suggestion will be escaped twice, once by us, and once by the solver, giving a different string than without escaping. This can be bad.

Why should there be dual escaping? I do not understand this at all. Our problem is that SMTInterpol does not do escaping, so if we give it an escaped variable I certainly do not expect SMTInterpol to escape it again when writing SMT-LIB output. So no dual escaping.

cheshire commented 8 years ago

1) I think I see where the misunderstanding about the visitors comes from. I was talking about the problems caused by _un_escaping code, since you were proposing to add transparent escaping. If we add transparent escaping, we would need to remove it for visitFunction to work properly, and thus even empty, do-nothing recursive visitor would become very expensive.

2) Regarding other solvers, my understanding was that you propose to escape across all solvers? Or do you propose to only escape for SMTInterpol?

If we only escape for SMTInterpol, how should serialization to SMT-LIB work? Should we undo our escaping first?

Why should there be dual escaping?

IIRC, SMTInterpol does do escaping for SMT-LIB serialization. If we escape as well, everything will be escaped twice.

PhilippWendler commented 8 years ago

1) I think I see where the misunderstanding about the visitors comes from. I was talking about the problems caused by unescaping code, since you were proposing to add transparent escaping. If we add transparent escaping, we would need to remove it for visitFunction to work properly, and thus even empty, do-nothing recursive visitor would become very expensive.

But why is this transparent unescaping more expensive if done by JavaSMT than if done by SMTInterpol? Why is the transparent unescaping a problem here if it is not a performance problem for all other solvers that already do it? If you really expect the unescaping for select to be so expensive, you can even keep a constant map |select| -> select and avoid any string allocations altogether (I think this is not necessary).

I simply propose to add the necessary adjustments to make SMTInterpol behave just like the other solvers. I do not see why we cannot do what all the other solvers successfully do.

2) Regarding other solvers, my understanding was that you propose to escape across all solvers? Or do you propose to only escape for SMTInterpol?

Why should we do escaping for other solvers which manage to do it correctly themselves?

If we only escape for SMTInterpol, how should serialization to SMT-LIB work? Should we undo our escaping first?

No, why?

Why should there be dual escaping?

IIRC, SMTInterpol does do escaping for SMT-LIB serialization. If we escape as well, everything will be escaped twice.

If we give |select| to SMTInterpol and SMTInterpol really produces ||select|| in some output, I would simply ask the developers to not do this (but I would be surprised if it really did this).

cheshire commented 8 years ago

OK maybe I have misunderstood initially. IF we escape only for select/true/store/etc and only for SMTInterpol, and SMTInterpol does not escape already escaped variables for serialization, this is doable.

cheshire commented 8 years ago

@PhilippWendler The situation with pipe symbols in SMTInterpol is even more funny: it happily accepts them, treat the identifiers as different, and then crashes on printing models or context with an assertion error. Is it something we are supposed to fix as well in the wrapper? If not, than how is it different to crashing on "select"?

PhilippWendler commented 8 years ago

I would simply file these crashes as bugs in SMTInterpol.

cheshire commented 8 years ago

@PhilippWendler It's not that simple, it's done on purpose. Pipe symbols should have no meaning in SMT-LIB syntax (|x| == x) They have meaning internally when adding such variables via API to other solvers --- but they get escaped when the SMT-LIB serialization is done (e.g. serializing a variable |x| to SMT-LIB in Z3 gives |x|).

SMTInterpol decides to not accept such variables altogether, but does so in an inconsistent way, where the check is done only when performing the serialization, to SMT-LIB or to the model. I have filed this inconsistency as a bug, but the assertion would remain in any case.

So what's your proposal now? We can still escape select/store/etc variables for SMTInterpol with some other symbol, e.g. with triangular brackets. That adds an extra layer of abstraction, and does not achieve a goal of being able to consistently create same variables across all solvers, as pipe symbols would still behave in an inconsistent way.

Overall, trying to enforce consistency between different solvers gets us into a can of worms, and I'm not sure that this should be a goal of JavaSMT.

PhilippWendler commented 8 years ago

I would go with escaping with triangular brackets or ! or something similar.

If pipes are supported inconsistently and we do not want to or cannot fix this, then maybe we should forbid them completely?

The definition of JavaSMT is to provide a common API for a range of solvers, and an API does not only specify method signatures but also how the methods behave. So of course enforcing consistency is for me one of the main goals of JavaSMT. Users of JavaSMT should not have to know and follow a lot of solver-specific rules when using JavaSMT.

cheshire commented 8 years ago

@PhilippWendler Well OK great, how do we distinguish between our <select> and user-created <select>? One of the goals of JavaSMT is to remain simple. Trying to enforce consistency across all behaviors introduces a lot of complexity, with very questionable benefit.

PhilippWendler commented 8 years ago

If we forbid variable names matching |.*| because there is no way to sanely handle them across all solvers, than we could also forbid variable names that match our own escaping pattern. The alternative is of course to simply escape all variables that already appear escaped: user-created <foo> becomes <<foo>>.

All this complexity of variable-name handling won't go away if JavaSMT refuses to handle it. The burden will be just be on each individual user of JavaSMT, and thus be much greater in the end. Everything that JavaSMT can do to improve consistency between solvers will have a very large benefit because otherwise the problem needs to be solved again and again by users of JavaSMT.

cheshire commented 8 years ago

On Fri, May 20, 2016 at 5:54 PM, Philipp Wendler notifications@github.com wrote:

If we forbid variable names matching |.*| because there is no way to sanely handle them across all solvers, than we could also forbid variable names that match our own escaping pattern.

What if someone has an application that already generates formulas with triangular brackets, so they won't be able to use JavaSMT because of this.

The alternative is of course to simply escape all variables that already appear escaped: user-created becomes <>.

OK so then we have the problem of all serialized formulas looking differently.

All this complexity of variable-name handling won't go away if JavaSMT refuses to handle it.

I actually think it would. If someone encounters a crash on a variable called "select" I think it would be relatively trivial for them to fix it themselves.

The burden will be just be on each individual user of JavaSMT, and thus be much greater in the end. Everything that JavaSMT can do to improve consistency between solvers will have a very large benefit because otherwise the problem needs to be solved again and again by users of JavaSMT.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/sosy-lab/java-smt/issues/26#issuecomment-220644870

cheshire commented 8 years ago

Sorry I'm closing it, as I currently do not see a way to do this without creating more problems than we had before, and without introducing too much extra complexity. If there's a crash on certain variable name, it's a simple, easy, crash, which is easy to debug. If there's a crash due to some trying-to-be-too-smart logic in JavaSMT, it's much worse.

PhilippWendler commented 8 years ago

The alternative is of course to simply escape all variables that already appear escaped: user-created becomes <>. OK so then we have the problem of all serialized formulas looking differently.

But we have that problem already now! It is not an argument against doing our own escaping. If you want to fix this problem, then you need to add our own escaping that is used for all solvers.

All this complexity of variable-name handling won't go away if JavaSMT refuses to handle it. I actually think it would. If someone encounters a crash on a variable called "select" I think it would be relatively trivial for them to fix it themselves.

Why should this be the case? It is not trivial to fix this problem in CPAchecker. In fact, the easiest place to fix this is in JavaSMT, because in JavaSMT would be the smallest amount of code that needs to be touched. The code that uses JavaSMT is much larger in size, and thus it is more difficult to handle it there and not forget a case.

kfriedberger commented 8 years ago

Should our API allow to create non-SMT-LIB compliant function names?

Current status: Only SMTInterpol disallows special characters and reserved keywords, and throws an exception at some point (inconsistently), i.e. during creation or during printing the name. All other solvers happily accept any String as function name, even if colliding with a reserved keyword or operators. A long as the formula is not dumped (as SMT-LIB) and needs parsing, everything works fine.

Proposal 1: If we do cannot allow arbitrary names in all used solvers, we have to refuse all names that contain non-ascii-chars, "|", "\",... . The overhead for this would be minimal, as we can just check each name when creating a new function. The keywords should be easy to find in the SMT-LIB standard. In this case we have to guarantee that both mkVariable("x") and mkVariable("|x|") return the same result and that the names "select", "<", "+", "[", "(", "|", " ", "\n", "\t", ",", ".", "true", "and", "define-fun", ... throw an exception. The benefit of this approach is a SMT-LIB compliant interface for all solvers, including the possibility to dump and parse all formulas.

Proposal 2: Another possibility is to report this behaviour as a bug for SMTInterpol or write a wrapper especially for SMTInterpol, which replaces invalid chars and reserved keywords with some other tokens. Then we might lose SMT-LIB compliance for "dumped" formulas, but get a nicer interface for the user, because every String (including reserved keywords and also special chars like brackets, unicode-chars,...) is just a "String" and can be used as name without the fear of interfering with any solver-internals. While it might be obvious for a user to not use "+" or "=" as function name (this is possible in all solvers except SMTInterpol), denying unicode characters might feel outdated. With this approach, dumping a formula might not result in valid SMT-LIB in some cases and the dump might be unusable for further usage, but as long as the user just uses our API and does not dump or exchange data between different solvers, it should work fine.

I am in favour of the second proposal.

cheshire commented 8 years ago

@kfriedberger

A long as the formula is not dumped (as SMT-LIB) and needs parsing, everything works fine.

For other solvers everything seems to work fine even if the formula is dumped. They do some escaping themselves.

I wrote an email to Philipp before, maybe I should duplicate some of it here.

Let's talk about CPAchecker. The problematic variables come from C variable names. The way C variables are converted into formula variables (e.g. "int x" in function "main" becomes "main::x") is an arbitrary design choice. Hence, as for any arbitrary design choice, all conversions of variables to formulas should go through a centralized component, and that component should also perform any escaping, if necessary.
Regarding your proposal 1: initially you propose to refuse the names with "|", and then later you say they should simply be ignored (x == |x|)? So what option should be taken?
To me it looks that under proposal 2, if we take an SMTInterpol formula, dump it to SMT-LIB2, then parse it with another solver, dump it again, and parse it with an SMTInterpol we would end up with a formula different from the original one. This would not be acceptable. At the end of the day, for the purposes of reproducibility, the formula generated by the dump function should be exactly the same as the formula given to the solver. Otherwise we might get cases that e.g. solving through JavaSMT API times out, but solving the same formula after dumping it succeeds instantly (due to non-determinism). This would not be acceptable either.

kfriedberger commented 8 years ago

2016-05-27 11:18 GMT+02:00 George Karpenkov notifications@github.com:

@kfriedberger https://github.com/kfriedberger

A long as the formula is not dumped (as SMT-LIB) and needs parsing, everything works fine.

For other formulas everything seems to work fine even if the formula is dumped. They do some escaping themselves.

No, the combination of dumping and escaping works only partially. For example, Mathsat allows to create a boolean variable "+" and dumps it as "(declare-fun + () Bool)". The same applies to "select" or "let". The dump is not SMT-LIB-compliant and no solver (not even Mathsat itself) can parse this. Escaping seems to work for some special cases, where brackets, quotes, spaces or unicode characters are used.

I wrote an email to Philipp before, maybe I should duplicate some of it here.

Let's talk about CPAchecker. The problematic variables come from C variable names. The way C variables are converted into formula variables (e.g. "int x" in function "main" becomes "main::x") is an arbitrary design choice. Hence, as for any arbitrary design choice, all conversions of variables to formulas should go through a centralized component, and that component should also perform any escaping, if necessary.

Proposal 1: Then we have to check/assume that every function name given to JavaSMT is escaped, if necessary, and throw an exception if not. Proposal 2: If we allow arbitrary Strings given from the user, no escaping would be needed at all.

Regarding your proposal 1: initially you propose to refuse the names

with "|", and then later you say they should simply be ignored (x == |x|)? So what option should be taken?

Proposal 1 will disallow "|" in all cases. However when parsing a formula, we must check for valid names in the existing dump.

To me it looks that under proposal 2, if we take an SMTInterpol formula,

dump it to SMT-LIB2, then parse it with another solver, dump it again, and parse it with an SMTInterpol we would end up with a formula different from the original one. This would not be acceptable. At the end of the day, for the purposes of reproducibility, the formula generated by the dump function should be exactly the same as the formula given by the solver. Otherwise we might get cases that e.g. solving through JavaSMT API times out, but solving the same formula after dumping it succeeds instantly (due to non-determinism). This would not be acceptable either.

I think, it is acceptable to have a difference between the dumped version of a formula and the formula itself, if necessary, because SMT-LIB does not allow special keywords or unicode as function names.

Do you prefer the first proposal?

kfriedberger commented 8 years ago

Maybe I should re-formulate my intents (for proposal 2):

What does a user expect from JavaSMT-API? All solvers should be available via the same interface and support the same possibilities for interacting. The API should support all technical advantages of a modern software system, including transparent escaping for problematic names, such that JavaSMT can be used for a wider area than just 'SMT-LIB-compliant formulas'. If a formula contains non-SMT-LIB-compliant names and is dumped, the user has to handle it himself. There is no way of automatic matching of arbitrary Strings to valid function names.
What does CPAchecker expect from JavaSMT? All solvers should be available via the same interface, no more, no less. As CPAchecker needs to dump and parse formulas, it has to check used function names itself, and thus, if needed, implement the escaping for unusual tokens itself or avoid using them. Otherwise we cannot guarantee that a self-written piece of SMT-LIB-code contains a symbol name that cannot be created via the JavaSMT-API, because of some automatic escaping. How can we guarantee, that self-written SMT-LIB-code is parsable in all solvers and the result is also equal, if we do not provide our own parser, but directly use the SMT-solver's parser?

In all cases, the user has the possibility to insert invalid names into the solver, either via the SMT-LIB-code (Proposal 1 escapes only names given in the API, and some solvers support more parsing than others) or via the API (Proposal 2 allows the usage of arbitrary function names, in contrast to a SMT-LIB-compliant dump). There will always be a difference between the SMT-LIB-interface and the API-interface. And in my opinion, the API-interface should allow more possibilities to create function names. The only problem remaining is to guarantee identical behaviour for all solvers, and currently only SMTInterpol has problems with this.

Another solution would be to ignore this topic and fully move the responsibility for creating and providing valid names towards the user. This might be the easiest solution, as it requires no code change for JavaSMT and is (at least partly) obvious to the user. But I do not prefer this option.

cheshire commented 8 years ago

No, the combination of dumping and escaping works only partially.

Right. Is the problem specific to Mathsat? Admittedly, I haven't tried much, but for the cases I have tried Z3 managed to escape things with pipe symbols. I think we should file the cases where it doesn't work as bugs to the respective solver developers.

Proposal 1: Then we have to check/assume that every function name given to JavaSMT is escaped, if necessary, and throw an exception if not.

OK I see. In practice, things are usually escaped using pipe symbols (as it's a no-op under SMT-LIB semantics), yet SMTInterpol crashes on input pipes. In terms of code, what kind of inputs do you suggest crashing on? To me it seems that if a solver allows you to construct a formula, it should be able to serialize it. If it can't serialize a formula it constructed itself, I would say it's a bug in the solver.

I think, it is acceptable to have a difference between the dumped version of a formula and the formula itself

If I have a difficult formula I have constructed using JavaSMT, I should be able to dump it to e.g. report it to solver developers, or to store it. Execution on the dumped formula should be exactly the same as the execution on the formula as constructed inside JavaSMT.

Do you prefer the first proposal?

I prefer not doing anything. Crashing on weird formula names, albeit unpleasant, is very obvious and very easy to fix for a client. Too-smart-for-it's-own-good-behavior which tries to do some transparent escaping and results in weird edge cases (e.g. serialized formula is not the same as what user is expecting) is IMO much worse. Moreover, with escaping we get into dangerous territory of trying to fix bugs/inconsistencies within the solvers, which I think we should try to avoid.

cheshire commented 8 years ago

@kfriedberger In practical terms, do you have a solution which:

1) Does not destroy any legitimate use cases (e.g. escaping things with <> is out because someone might wish to use triangular brackets in variable names) 2) Does not require a large performance overhead 3) Maintains that invariant that any formula dumped and parsed (by any solver) should remain exactly the same 4) Maintains the invariant that taking a printout with an SMT-LIB expression, and creating it in JavaSMT using API results in exactly the same formula 5) Maintains the invariant that any dumped formula should be exactly the same as this formula created using JavaSMT API 6) Does not overly complicate the codebase by having to do escaping/unescaping all over the place.

stephanlukasczyk commented 8 years ago

Sorry for bringing this up again, but this issue is closed and I cannot figure out, whether it will be fixed at some time or is considered as "won't fix". I ask because I have the problem again with a file that has a method called select where SMTInterpol fails with the exception mentioned above.

I see that a lot of things have to be considered with this, hence I'm fine (at least for this case) if it will never be fixed, just wanted to ask out of curiosity.

cheshire commented 8 years ago

@stephanlukasczyk I think we have failed to reach a productive conclusion. Also at a time I was trying to push for a 1.0 release, and this was an obstruction. Now I guess we can revisit this.

I would not be completely against Karlheinz's 1st proposal of crashing on non-SMT-LIB compliant formulas. The problem of crashing on select will (obviously) remain open, but at least it would be consistent across all solvers, and we would have a nice identity transformation Formula -> SMT-LIB String - > Formula.
I do not see a way how to do escaping on JavaSMT level cleanly without creating more problems than we are solving, as visitors and serialization might get different strings than those which were entered by the user.

I still feel that a much simpler way to solve this is in the ctoformula package by applying escaping to all created variable names. I think that it's OK to do that inside CPAchecker because we can agree on the assumption "all variables are escaped using X", whereas such an assumption might not be natural for all possible use cases of JavaSMT.

PhilippWendler commented 7 years ago

Is there a chance this issue will result in backwards-incompatible changes? Then it should be done before 2.0.

cheshire commented 7 years ago

@PhilippWendler will it satisfy everyone if we simply crash on all non-SMT-LIB compliant names in mkVariable and friends, as @kfriedberger has suggested above?

PhilippWendler commented 7 years ago

A consistent handling across all solvers would at least be something, yes.

A problem I see with rejecting names is that the list of names to be rejected depends on the SMT theories we support. Whenever we add a new theory, there will be new operator names that need to be blacklisted, meaning that adding new theories can break existing users' code.

So at least JavaSMT needs to offer utilities for checking whether a name is forbidden and for escaping and unescaping, such that not every user needs to reimplement this themselves (and worse, keep a duplicate of the blacklist that JavaSMT uses). And if we provide such utilities, what was the reason again for not just applying them directly in JavaSMT?

cheshire commented 7 years ago

Whenever we add a new theory, there will be new operator names that need to be blacklisted, meaning that adding new theories can break existing users' code.

Would they? From my understanding, if the solver rejects a name, it always rejects it, JavaSMT support or not. It also seems a mostly theoretical concern, given that from what I remember we only have issues with store/select, and new theories are not added frequently.

So at least JavaSMT needs to offer utilities for checking whether a name is forbidden and for escaping and unescaping

Yes.

And if we provide such utilities, what was the reason again for not just applying them directly in JavaSMT?

Extra complexity and a non-intuitive behavior (the actual formula is not what the user expects). Also create a variable called "x" -> run a visitor collecting variable names -> turns out there's no "x", but "|x|" instead. Such behaviors where the library tries to be overly clever can cause a lot more problems then they solve. Another possible scenario: add a constraint on "x" via JavaSMT (with it silently being renamed), then read a constraint from SMT-LIB file concerning "x" (without renaming) => get two vars instead of one.

cheshire commented 7 years ago

Also, what exception should we throw on bad var names?

PhilippWendler commented 7 years ago

IllegalArgumentException because it is a bug in the client code? Though it is not nice that this way the compiler won't tell us that we forgot to call escapeFunctionName.

Can we add makeEscapedVariable etc. for all methods where escaping can be necessary? Maybe rename the current methods to makeRawVariable or similar to warn the user?

PhilippWendler commented 6 years ago

Note that Z3 4.7.1 behaves differently for variable names that match |.*| than previous versions: it seems to treat |foo| as equal to foo via the API (cf. #128, just run the unit test there). This means that upgrading Z3 would increase the problem that we already have, and we should really come to a conclusion for this issue (no progress since > 1 year).

kfriedberger commented 6 years ago

Currently all solvers and our API seem to be valid in the following points:

any (escaped/unescaped/quoted/unquoted) String can be used to create a variable (with some small exception, i.e., some solvers do not allow special keywords, maybe not relevant), such that the variable's name matches exactly the given String. The solver does not change the String!
when using Formula.toString() some solvers escape or quote some variables. The same seems to apply for the SMT-LIB2-dump. We should identify such cases and report them. Maybe Z3 in version 4.7.1 is such a case.
analyzing formulas should only be done via the FormulaVisitor, i.e., no user should in any way rely on Formula.toString().

In my opinion, we should not escape user-given Strings as part of JavaSMT, because we can handle them directly without further problems.

kfriedberger commented 6 years ago

With commit 0dc6a4b90ae85eb6dda771894162ea5aa7e35a15 we do no longer allow symbol names that are not SMT-LIB2-compliant.

PhilippWendler commented 6 years ago

Now we forbid problematic variable names. Will we add an (optional) utility for escaping and unescaping variable names, such that not all users need to implement this themselves, which is complex and error-prone (and problematic because such an escaping would need to be adjusted every time JavaSMT changes its rules for variable names)? I suggested this already last year.

kfriedberger commented 5 years ago

This could be done as another (optional additional) software layer around JavaSMT. To escape symbol names in JavaSMT, we would have to handle escaped names everywhere:

when creating and printing symbols,
when dumping or reading SMTlib,
when visiting and traversing formulas.

There might be more problems than solutions through this approach. While creating formulas is easy, everything related to printing/dumping is quite complex, because we would need to parse i.e. SMTlib directly, on a very low level.

PhilippWendler commented 5 years ago

A first step would be to at least provide escapeName and unescapeName utilities. Then callers could use them where they want.

For me it was never a goal to add this to the dumping or parsing of SMTLib formulas, we can use the escaped names there.

sosy-lab / java-smt

Escaping of non SMT-LIB compliant names? #26