runtimeverification / k

K Framework Tools 7.0
BSD 3-Clause "New" or "Revised" License
452 stars 149 forks source link

Kore: semantics of injections (and the attribute `sortInjection`) #3613

Open h0nzZik opened 1 year ago

h0nzZik commented 1 year ago

TL;DR: I think that In Kore, inj should be considered a built-in operator instead of a sort-overloaded symbol. This would simplify the semantics of Kore a lot, and most likely simplify the backends also a bit.

Proposal

Currently, subsorting in Kore is done by means of the sort-overloaded symbol inj{From, To} that has to be declared in the *.kore file, like this:

symbol inj{From, To}(From) : To [sortInjection{}()]

typically in an INJ module:

 module INJ 
     symbol inj{From, To}(From) : To [sortInjection{}()]
     axiom {S1, S2, S3, R} \equals{S3, R}(inj{S2, S3}(inj{S1, S2}(T:S1)), inj{S1, S3}(T:S1)) [simplification{}()]
endmodule
[]

From a language-design perspective, this has some drawbacks:

  1. The declaration depends on the sortInjection attribute, which has very tricky semantics. First, it is not clear what concrete symbols it actually defines (does it declare the symbol inj{A,B} if A,B are unrelated sorts - for example, Int and List?). Second, the interpretation of the symbols is very hard to specify, especially in the presence of “diamond sub-sorting” (e.g., if A < B,C < D, where < denotes subsorting, what is the relation between inj{A, D}(a), inj{B,D}(inj{A, B}(a)), and inj{C,D}(inj{A, C}(a)). This unnecessarily complicates the precise documentation of Kore, let alone formal semantics.
  2. It lowers the abstraction level of Kore from order-sorted to multi-sorted, by requiring that what is essentially an upcasting operator is always interpreted as a matching logic symbol. (The other option, interpreting the upcasting operator as an identity function on metalevel, would be more straightforward when documenting/formalizing Kore.)
  3. It is not clearly specified what should happen in the presence of multiple symbols with the sortInjection attribute.
  4. It requires the “user” (be it the frontend or whoever writes the test cases for backends) to always include the same piece of Kore code again and again.
  5. It is the only situation in Kore where we need sort-parametric symbols. (I am not sure if that is really the case, please correct me if I am wrong here.)

I will explain these issues in more detail below. Now, as a remedy, it would suffice if we:

  1. Didn’t emit the INJ module (containing the declaration of the inj symbol) in the frontend
  2. Formally, in the syntax of Kore, we would add upcast-pattern and simplified application-pattern as follows;
    
    <pattern>
    ::= <variable-pattern>
    | <matching-logic-pattern>
    | <upcast-pattern>                       //< NEW
    | <application-pattern>
    | <string-literal>
::= "{" "}" "(" ")" //< no sorts parameters ::= "inj" "{" "," "}" "(" ")" //< exactly two sort parameters and one pattern ``` That way, generators of Kore terms do not even need to change unless they rely on sort-parametric symbols other than "inj". ## Technical details ### The ideal, simple semantics of Kore (after applying the proposed changes) We would have order-sorted signatures and order-sorted models. In the model, there would be a bunch of carrier sets similarly as in a many-sorted case; however, unlike in the many-sorted case, the carrier sets would be ordered by set inclusion in the same way the sorts are ordered by the subsort relation. The `inj` operator would be part of the logic and would be interpreted as identity. And there would be no `sortInjection` attribute. ### Less convenient options for the semantics of the current Kore I can give semantics to the Kore even in the presence of the `inj-as-symbol`; however, it would be much more messy and I am afraid that nobody would like it and nobody would want to read it. So here are a few options. 1. Require that each *.kore starts with the INJ module verbatim: ``` module INJ symbol inj{From, To}(From) : To [sortInjection{}()] axiom {S1, S2, S3, R} \equals{S3, R}(inj{S2, S3}(inj{S1, S2}(T:S1)), inj{S1, S3}(T:S1)) [simplification{}()] endmodule [] ``` which has no influence on the interpretation of the rest of the file, and every other module must `import INJ`, and prohibit the use of the `sortInjection` after that. And then use the ideas from the "ideal, simple semantics of Kore" from the above section for the interpretation of the rest of the file. 2. Say that the `sortInjection` attribute is allowed only on symbol declarations, in which case it means that the symbol `inj` is not really a symbol but an upcasting operator - and proceed as in the ideal case. 3. Make the `symbol` declaration always declare a symbol, or a symbol family if it is parameterized, but if the `sortInjection` attribute is present on a symbol declaration with two sort parameters, then the symbol family is not indexed by all pairs of sorts, but only of such pairs which are in the subsorting relation. Also, make sure that the symbols that _are_ declared have the correct interpretation: they are used to build something-like-new-terms, with something-like-no-junk-no-confusion properties, except that if there are multiple paths between the upper and lower sort, then some sharing has to happen. (I am not precise here because being precise is too complex - which is the point I am making.) As a concrete example, consider sorts `A,B,C,D` with `A < B`, `A < C`, `B < D`, and `C < D`. That is a valid (although nonlinear) ordering. The interpretation of the symbol `inj` has to ensure that `inj{B, D}(inj{A,B}(x)) == inj{C, D}(inj{A,C}(x))` and at the same time `inj{B, D}(x) <> y` whenever `y` is _pure_ D - that is, if `y` is not the result of casting something else to `y`. And this would have to be specified generally in the documentation of Kore, which is probably doable, but _maybe_ not really readable. 4. Make the `symbol` declaration always declare a symbol (or a symbol family) with no exceptions, and use the `sortInjection` attribute to specify only the interpretation of the declared symbols. In particular, injections outside the subsorting relation would be interpreted as bottom / empty set. However, then the backends cannot complain when the user uses seemingly incorrect injections because they would actually be correct - just empty. And the interpretation of the "good" symbols would be as in the previous point. In the cases (2), (3), and (4) what would happen if (a) no such declaration is present, or (b) there are two symbols with the `sortInjection` attribute? Maybe there are some other options that I left unexplored; feel free to comment on them, too. ## Resolution If the proposal does not get accepted, I would most likely choose the option (2) from the less-than-ideal list.
dwightguth commented 1 year ago

We cannot remove sort parameters of symbols. It is used extensively in other places. So put that out of your mind entirely. With that being established, I'm not sure the change really does anything meaningful unless you're also talking about a lot of behind-the-scenes work that is not going to get prioritized because it's not really necessary.

You seem generally to have an idea that there must be a perfectly clear meaning to anything that is syntactically valid KORE. This is manifestly untrue. All you need to do is to consume the KORE generated by the frontend and generate a Coq axiomatization that accurately captures the rewrite theory described by that KORE. With that goal in mind, the overhead you are describing as bringing onto yourself doesn't really make sense to me as necessary work.

Applicative matching logic, the mathematical theory underpinning (indirectly) KORE, does not have any explicit treatment of sorts, allowing sorts to be explicitly axiomatized. If you think the best axiomatization of injections in KORE is via subsorting and an order-sorted structure, that's fine. If you think that it makes sense to use many-sorting and explicit parametric injection symbols with explicit axioms relating to injections, fine. I don't particularly care as long as the rewrite theory described by the axioms you generate is equivalent to the existing semantics of KORE.

If you want to take the work of doing what you describe (minus the part about parametric symbols which is simply unworkable) onto yourself and make the necessary modifications to 4 different parsers and pretty-printers and 7 different components written across 6 different languages entirely onto yourself and you have a plan for how to make the 6+ different PRs that would be required in a way that doesn't interrupt or break anything for anyone else at the company, I'm not fundamentally opposed to the simple syntactic change that you're talking about. But it's not a useful use of anyone else's time, and I'm not persuaded it's a useful use of yours either.

daniel-horpacsi commented 1 year ago

You seem generally to have an idea that there must be a perfectly clear meaning to anything that is syntactically valid KORE. This is manifestly untrue. All you need to do is to consume the KORE generated by the frontend and generate a Coq axiomatization that accurately captures the rewrite theory described by that KORE.

How do you argue as to whether the "axiomatization" "accurately captures the rewrite theory" without having formal semantics (i.e. "perfectly clear meaning to anything that is syntactically valid")?