kframework / k-legacy

The K tools (deprecated, see README)
http://kframework.org

Sort/operation inference and sort annotations #1587

Open grosu opened 9 years ago

grosu commented 9 years ago

Intrigued by @bmmoore 's recent issue #1586, I checked the manual to see how sort/operation inference and sort annotations are explained, and it looks like there are either some misunderstandings or the manual text is rather obsolete. Either way, @radumereuta should fix it asap, as well as the implementation, in case the manual is currently describing the actual implementation.

I added the following text at the beginning of the Manual, so that @radumereuta will take care of it asap.


Note for @radumereuta (by Grigore): Radu, please revise the sort/operation inference and sort annotations sections. The principle underlying sort/operation inference is very simple: every term, and in particular every variable, is inferred its maximal possible sort.

Of course, in the process, all user-provided sort constraints must be respected. Warning/Error messages should be reported if there is no way to achieve the above.

With the above in mind, I think this text in the Sort annotations section is obsolete (and confusing):

The dual is `<:Sort` and forces the inner production to have **exactly**
the type specified (no subsorts). This can be useful sometimes when ambiguous
productions have related sorts. For example, a common ambiguity in K is related
to overloaded lists where we have `Exps ::= List{Exp,","}` and
`Ids ::= List{Id,","}`.  Also we have the subsort `Ids < Exps`. For the
following input: `(A, B)<:Exps`, both productions will be considered, but we
can annotate the expression with the desired sort to choose the desired production.

The operation `<:Sort` should only require its argument to be parse-able to sort `Sort`, which includes any subsort of `Sort`. If one just writes `(A, B)<:Exps` then one gets the expected parsing, because A and B and (A,B) will be inferred their maximal sorts obeying the given constraints, which means `Exps` for all; if you want A and B to have sort `Exp` then you have to say so explicitly, `(A:Exp, B:Exp)<:Exps`, or have other constraints from the outside. Either way, there is no ambiguity here. If one means (A, B) to parse to `Ids`, then one just writes `(A, B)<:Ids`; this will also enforce that A and B are inferred the sort `Ids`.
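For concreteness, here is a minimal sketch of the overloaded-list setup described in that excerpt, with the annotations discussed above recalled as comments (the declarations are illustrative; Id is assumed to be the builtin identifier sort):

    syntax Exp  ::= Id                 // Id assumed to be the builtin identifier sort
    syntax Ids  ::= List{Id,","}
    syntax Exps ::= List{Exp,","}
    syntax Exps ::= Ids                // the subsort Ids < Exps

    // (A, B)<:Exps          A, B and (A, B) get their maximal sorts, i.e. Exps
    // (A:Exp, B:Exp)<:Exps  A and B are explicitly constrained to Exp
    // (A, B)<:Ids           A, B (and the whole list) are forced to sort Ids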

The principle underlying sort annotations is also very simple, and I believe the above maximal-sort conventions correctly implement it: sort annotations alone should let users obtain any parse they want, without ever having to fall back on the labeled form.

Radu, as you revise the sort/operation inference and sort annotations sections, please find a nice way to also include the two underlying principles above. I actually believe that the best way to do this is to merge the two sections into only one.

Finally, with this maximal-sort convention, I fail to see the need for both `::` and `<:`/`:>`. Only one should suffice, with the semantics `syntax Sort ::= Sort "::Sort"`. Do you disagree?

End of note

bmmoore commented 9 years ago

"maximizes the sort" is a bit unclear. We want a most general sort, but I don't see why we would want greater sorts that don't add any generality. If we have a term (A:Ids, B:Ids), we could say it has sort Exps using signature Exps*Exps->Exps for the comma (depending on the context we it could maybe even be assigned sort KItem or K), but any term that matches that pattern also has sort Ids using signature Ids*Ids->Ids, so it's no less general and a bit more specific to say the whole term has sort Ids. That's why I've talked about the remaining terms getting their least sorts once we've found the maximal sorts for the variables, but I can see that just explaining it as a "least sort" isn't enough for the manual.

grosu commented 9 years ago

The problem with the "least sort for terms" approach is that it goes against the convention for variables (which get the largest sort) and breaks our second principle, which says that people should not need the labeled form. For example, consider an overloaded + on Exp and on Int, and that you want to write a rule of the form

   rule A + B => `A * B`::Int     // note * in the RHS, instead of +

Here A and B are inferred the sort Int, but you obviously don't want the + in A + B to be the one on Int. One cannot write `A + B::Exp` either, because an `Int` is always an `Exp` (as I said in my note, I believe that our manual is wrong when saying that the semantics of `Term::Sort` is that `Term` must have _exactly_ the sort `Sort`, because any term of a subsort of `Sort` also has the sort `Sort`). So the user has no way out of this, except to use the labeled form `+:Exp*Exp->Exp(A,B)`.
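A sketch of the overloaded-+ setup being described (the productions and klabels below are illustrative, not taken from any actual definition):

    syntax Exp ::= Exp "+" Exp  [klabel(addExp)]
                 | Exp "*" Exp  [klabel(mulExp)]
                 | Int
    syntax Int ::= Int "+" Int  [klabel(addInt)]
                 | Int "*" Int  [klabel(mulInt)]

    // In  rule A + B => `A * B`::Int  both A and B are inferred of sort Int,
    // yet the intended reading is that the + on the left is addExp, not addInt.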

bmmoore commented 9 years ago

If you mean there is a single label _+_ which has two overloaded sorts, then applying it to two ints just has a single result, which doesn't require worrying about signatures. If you mean there are separate labels which both have the concrete syntax "+", then it's outside of the case I was thinking of, but I think `A::Exp + B => A * B::Int` would force it to parse as `+:Exp*Exp->Exp(A,B) => *:Int*Int->Int(A,B)`.

But, if that's what you are thinking of then I really don't like the idea that resolving a syntactic ambiguity between different productions with inequivalent labels might depend on the exact choice of greater or lesser inferred sorts. Radu, how do disambiguation and sort inference interact now?

grosu commented 9 years ago

All the discussion was for productions that look the same but have different KLabels. Sorry for the confusion. I'll update my note on the manual wiki page to clarify this.

If the two productions in question have the same K label, then in my view there is no ambiguity at all. I believe @dwightguth has already implemented this (that is, no ambiguity is reported if the KASTs of the two ambiguous parses are the same). In this case, I agree with you that it is better for the parser to consider the more concrete sort as the result sort of the term. I'm going to clarify that, too, in my note.

But, if that's what you are thinking of then I really don't like the idea that resolving that sort of syntactic ambiguity between inequivalent parses might depend on the exact choice of greater or lesser inferred sorts.

I don't understand this. Is this still relevant, in terms of the above?

grosu commented 9 years ago

OK, I hope it is clearer now: https://github.com/kframework/k/wiki/Manual-(to-be-processed).

bmmoore commented 9 years ago

I was thinking about the case with separate KLabels when I wrote that. I think the worst parse errors are silently getting an entirely different label than you had in mind, so it's more important that any disambiguation between different KAST terms be understandable and predictable rather than being able to silently disambiguate as many inputs as possible.

If + is used both for primitive integer addition and for addition expressions of a language, I'd rather `A+B` simply be reported as an ambiguity rather than resolved by considering whether Int < Exp, especially considering that it's easy to make only one interpretation legal at all with a little more annotation, either `A+B::Int` or `A::Exp+B`.

I think comparing inferred sorts is too subtle - as far as I can see, predicting the result requires the user to consider every possible parse, work out the results of sort inference on those parses, and compare those sorts.

grosu commented 9 years ago

@bmmoore I agree that these are annoying errors; we've all been through them. On the other hand, I think it is hard to draw a line regarding inference and then say that up to the line it is acceptable and beyond the line it is not. Such a line may end up being subjective. For example, one could say that even inferring the sort of a variable as the intersection sort of its various uses in a rule is too much. Or one could do like Maude, where a variable can only have one sort, either declared globally or otherwise in each occurrence in the rule. I am pretty sure that we converged, after many discussions, that picking "the largest user sort" as a principle for inference was not too bad. It is easy to remember, and it applies uniformly to all terms, including variables.

I really hope that we will soon have some IDE support for K, where you hover the mouse over a symbol and it shows you the sort of the term that symbol is a root of (like the generated PDF does for variables, but extended to entire terms), and also the K AST of it. And obviously, we want a similar textual feature, some command which takes a module name and a string, and parses that string within that module and shows you its KAST.

Additionally, the above general principle encompasses more restrictive ones, like the one you suggested. People who don't like the tool to infer things for them can specify sorts as you showed and then they take no chances and get even faster parsing. A similar story happens in functional languages, where some people prefer to explicitly mention the type of their term just in case, doesn't it?

bmmoore commented 9 years ago

At some point how much inference you want becomes a matter of judgement, but functional languages generally at least aspire to provide a simple and principled guarantee like principal types.

In K the equivalent would be something like promising that if type inference succeeds on a pattern and you add more type annotations (in a legal way), then any term matching the annotated pattern would also match the version of the pattern that was filled in by type inference.

bmmoore commented 9 years ago

(temporarily closed only because I accidentally hit "Close and comment")

grosu commented 9 years ago

@radumereuta , a reminder that we cannot close this issue until you digest its contents and update the manual accordingly. When you close this, please make sure that there is nothing important or interesting in this issue that is not already discussed in the manual.

Everybody: from here on, please make an active effort to add things to the manual, especially things that you didn't know exactly, or had or still have doubts about, but learned them from such discussions in github issues. Before closing any issue, please make sure that there is no idea, discussion or lesson learned in the issue that is not mentioned in the manual. It is OK to even just copy and paste from the issue into the manual, because we will clean the manual later.

grosu commented 9 years ago

@bmmoore Thanks for your comments! Is there anything else we discussed which is not said on this page or in the manual? If yes, can you please add it?

@radumereuta Please give us an ETA on this. If you think you cannot or would rather not do it yourself, then please let me know and I'll do it (I actually want to do this myself, but at the same time I believe it is best if you go through all these discussions, digest them, and then write it up yourself to your liking).

bmmoore commented 9 years ago

I don't have any more comments. I'll just mention for better cross-referencing that the issues referenced above are fixed in #1604. I think we are now doing exactly what we want with variables. What's not implemented is comparing sorts to disambiguate when a substring has multiple parses headed with different labels.

grosu commented 9 years ago

@radumereuta : we need your help here. At this moment I'm puzzled about the actual meaning of sort annotations: <:, :>, etc. Even without modules, I'd like to understand how these allow people to choose the exact parsing they want when they have overloaded productions (but with different labels). For example, consider the grammar:

    syntax A ::= "a"
    syntax B ::= A
    syntax C ::= f(A)  [klabel(fA)]
               | f(B)  [klabel(fB)]

I want to annotate f(a) so that I can parse it as either fA(a) or fB(a). The second seems to be doable with f(a:>B). If we choose to infer the most concrete operation as default, then we have a conflict with the maximal sort principle.

Please let us know when we should expect to hear from you. If you'd rather not worry about these things anymore (and instead focus on writing your thesis), please let us know it, so we can find a solution on our own.

grosu commented 9 years ago

Let's make the example a bit more interesting, so that neither a minimal sort nor a maximal sort inference convention can work:

    syntax A ::= "a"
    syntax B ::= A
    syntax C ::= f(A)  [klabel(fAC)]
               | f(B)  [klabel(fBC)]
    syntax D ::= C
               | f(A)  [klabel(fAD)]
               | f(B)  [klabel(fBD)]

The challenge is to manually sort-annotate f(a) so that we can choose any of the four parses. The only obvious one is f(a:>B)<:C for fBC(a). Whatever convention we come up with, such as maximal or minimal inferred sort, will give us another parse. So we'll be left with two others.

grosu commented 9 years ago

Maybe we should base the default parsing disambiguation convention on a simple "escape" mechanism that we can crisply explain to our users in two lines. For example, this can be:

If in a given context C[f(t1,...,tn)] you want to choose a parse with syntax S ::= f(S1,...,Sn), then annotate it as follows: C[f(t1:>S1,...,tn:>Sn)<:S].

The uses of :> and <: above are as local as possible, imposing constraints only on f, and not on its arguments or on the context. In practice, most likely people will write the stronger annotation C[f(t1::S1,...,tn::Sn)::S], which is even simpler to remember.
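For instance, instantiated on the four-label grammar from the comment above, selecting fBC would look as follows (both annotations also appear elsewhere in this thread):

    f(a:>B)<:C    // the local form: the argument slot is B, the result is C
    f(a::B)::C    // the stronger, simpler-to-remember :: form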

However, if we have any other f defined on supersorts of S1, ..., Sn and returning a subsort of S then we are in trouble, as we have an ambiguity. The only convention that eliminates that ambiguity is to choose the f which has the minimal sorts for the arguments t1:>S1, ..., tn:>Sn of f, that is, respectively, S1, ..., Sn, and the maximal sort of the result of f(t1:>S1,...,tn:>Sn)<:S, that is, S.

At first sight this seems to not work, because if you nest such symbols then you end up requiring both a maximal and a minimal sort for the same term. No! The above is not about picking sorts for terms, but about picking a certain production in a certain ambiguous situation. Once you pick the production with the property above, the sorts of the argument terms are determined. However, in (probably very rare) situations where ambiguities are nested, the users can use more :> and <: to parse things to their liking, for example f(t1:>S1,...,tn:>Sn)<:S:>S2, etc.

So the above convention seems to work. Let me formalize it:

Default parsing convention: For any term T of the form f(T1,...,Tn) pick the parse where f makes the sorts of T1, ..., Tn the smallest in the set of their possible corresponding sorts and the sort of T is the largest in the set of its possible sorts. If no such smallest or largest sorts exist, then ambiguity.

For example, writing I1 + I2 => `I1 + I2`::Int picks the first + as the one on Exp and the second one as the one on Int.

Now we can finally prove the following:

Theorem: Let G be any grammar and let G' be its K extension. Then for any term T that admits a parse tree P in G, there exists a term T' that admits a unique parsing P' in G' such that erasure(T') = T and erasure(P') = P.

grosu commented 9 years ago

By "K extension" I did not mean to add all the K stuff, but only the minimal extension that allows arbitrary syntax annotations, namely:

1) Adding sorts Top and Bot to G

2) For each original sort S of G, add the following:

    syntax Top ::= S
    syntax S ::= Bot
    syntax Bot ::= S "<:S"
    syntax S ::= Top ":>S"

We can also use ::S as a shorthand for <:S:>S, or define it directly as

    syntax S ::= S "::S"
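As a sketch, instantiating this scheme on a small grammar with two sorts, Exp and Int, would add the following productions (the sort names are purely illustrative):

    syntax Top ::= Exp | Int
    syntax Exp ::= Bot
    syntax Int ::= Bot
    syntax Bot ::= Exp "<:Exp"
                 | Int "<:Int"
    syntax Exp ::= Top ":>Exp"
    syntax Int ::= Top ":>Int"
    syntax Exp ::= Exp "::Exp"     // the ::S shorthand, one production per sort
    syntax Int ::= Int "::Int"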

Actually, we may want to start with just ::, and formalize and prove the result above in this simpler setting that many other tools may adopt. It is basically a simple mechanism to annotate terms in any grammar to allow people to disambiguate however they wish.

Then, introduce the more general setting with <: and :> as needed in K, and reprove the result.

This is definitely something that needs to be written up and published. Not because it is complicated, but because it is simple and practical, it seems.

cos commented 9 years ago

Using the "select a production/klabel" mechanism described on the modularization discussion issue (#1647), the four variants would be obtained by:

f(a)~fAC
f(a)~fBC
f(a)~fAD
f(a)~fBD

The obvious advantage of this approach is that it is much easier and more straightforward. There are two disadvantages though:

  1. The user has to know about the existence of klabels. But we're already talking about rather funky sort annotations, so it shouldn't be that much harder. I would argue the above is much easier to understand.
  2. As this is essentially directly selecting a specific production instead of specifying bounds, it is more likely to break when the productions are changed. Still, we have no guarantee the sort-based solution wouldn't break either (just that it will break in fewer cases), and, as the klabel solution is much easier to understand, it should also be easier to fix.
cos commented 9 years ago

(fixed typo)

grosu commented 9 years ago

I have several problems with your notation:

f(a)~fAC
f(a)~fBC
f(a)~fAD
f(a)~fBD

First of all, it is yet another notation. We have already agreed that we want <: and :> and their combination ::. Are you arguing against these? Please say Yes or No, so I know the scope of the discussion. Why have yet another notation, when we can obtain what we want with the existing notation? What I essentially proposed above is that users write the above four cases as follows:

f(a::A)::C
f(a::B)::C
f(a::A)::D
f(a::B)::D

No need for KLabels and everything is crystal clear. How can these be complicated?

The obvious advantage of this approach is that it is much easier and straightforward.

Than what? For whom? In addition to having to use KLabels, which is unacceptable for many reasons already explained, it also requires a new notation. My proposal is essentially a fix for the notation that we already have and many people use. My proposal may look complicated at first sight because I provided all the technical details, including definitions and formal theorem. If I were to summarize the disambiguation principle I proposed in one sentence, it would be "largest sort for result, smallest sort for arguments".

cos commented 9 years ago

(also answering the issues raised in the other thread)

First of all, it is yet another notation. We have already agreed that we want <: and :> and their combination ::. Are you arguing against these? Please say Yes or No, so I know the scope of the discussion.

Yes.

Why have yet another notation, when we can obtain what we want with the existing notation?

Because the klabel notation/mechanism is powerful enough to replace both the :>/::/<: and the "parse in module with visibility"/@ mechanisms. We would still need some way of saying "is a subsort of" for tokens, but that's about it.

No need for KLabels and everything is crystal clear. How can these be complicated? Than what? For whom?

For me at least it is not crystal clear. Well, it is now as I'm thinking about it specifically, but when I stumble upon a parsing ambiguity when reasoning about my own problem, I don't want to switch contexts and try to make sense of nesting of largest/smallest or module visibility. I want something that can get the ambiguity out of my way as quick as possible. klabels do that.

@cos I don't see how t ~ P @ M is any simpler than what we proposed already. Besides, it requires the use of KLabels, which is a NO. As mentioned, since K is a concrete syntax language, it should be able to deal with things without resorting to labels. If we are to ask people use K labels then there is no discussion whatsoever, because people can just write P@M(t1,...,tn), which they can already do, so no need for another fancy notation.

In the age-old debate of concrete vs labeled syntax, I am very much in favor of concrete. Not for some philosophical reason, but a stylistic one. I really like writing if A then B else C instead of if_then_else_(A, B, C). I would be willing to write if_then_else_(A, B, C)~myFunkyConditional when an ambiguity occurs, but I probably won't have the energy to fight the sort system for it when I'm in the middle of something else -- it is essentially a question of copy-pasting a klabel versus figuring out some sort relations. Furthermore, because we have the concrete syntax, we can drop the klabel altogether when we want to filter by module (like in the solution for Brandon's example: `a(b~C)~B` -- we may want `a(b~@C)~@B` or the shorter `a(b@C)@B` to avoid mixing modules and klabels, though I guess it is unlikely to get many clashes).

Again, this is my subjective perspective. I would be curious to hear from people who have written large K semantics.

Moreover, you may have overloaded KLabels in the same module (e.g. when you have lists), so you will not be able to refer to a specific production using its KLabel. You can add tags instead and make sure they are unique, but again, I do not find this any simpler.

Actually, this is the easy part. First, when adding modules, there will be very few overloaded klabels. In those cases, it is only an issue if the productions for the overloaded klabel have identical syntax. And when they do, it is enough to add an annotation to one of the arguments (no need for extra tags), ending up with something at most as complex as the sorting approach (notice that f(a::A)::C has essentially two annotations while f(a)~fAC has one, but we can construct an example where the labeling approach also requires two; we could even construct an example where the labeled approach needs more annotations, but I don't think it is something we could easily find in practice).

So, the decrease in complexity stems from two aspects:

  1. klabels are unrelated so we don't even have the possibility of talking about relations between klabels, so any disambiguation will need to be done structurally (i.e., by labeling another part of the term)
  2. There are many more klabels than sorts. So their power of selecting a specific production is greater.

Going further, a practical argument would be that implementing the labeled approach would be much easier than the "parse in module with visibility" (and whatever changes to ::/:>/<: would be needed. Is that fully functional?), but I may be wrong. @radumereuta and @dwightguth know much better.

Regarding formalism, I think the proposal is simple enough. I will copy-paste the description from the other thread:

I propose the following mechanism: t ~ P @ M means parse term t using production P from module M at the top level. If there is no ambiguity, we also allow t ~ P and t ~ M, with the latter trying to find some production in M that fits.
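For illustration, using the f(a)/fBC names from the earlier example and a hypothetical module name MYMOD, the three forms would be used as:

    f(a) ~ fBC            // parse f(a) with the production labeled fBC at the top
    f(a) ~ fBC @ MYMOD    // same, taking the production from module MYMOD
    f(a) ~ MYMOD          // any production of MYMOD that fits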

As my proposal is as powerful as labeled form (it is essentially labeled form with nicer syntax and a bit of flexibility in only specifying the klabel or the module), I think the "K extension" theorem above trivially holds.

Finally, I understand that my proposal is much less cool than reasoning about subsorts, and less pure as it involves klabels. My only argument is that it will be easier to use. If it is not accepted, labeled form will still be there for when we get tangled in sorts or for disambiguating between modules -- it will just be uglier syntactically, especially when dealing with modules. If anyone likes my proposal or thinks anything is salvageable from it, please speak up. If the proposal doesn't gather any support, I will bow to save everybody's time.

grosu commented 9 years ago

Let me repeat my proposal, from a user perspective: nothing new!

That is, use T::S to say that both T and the context (hole) have sort S, T<:S to say that T has sort S without constraining the context, and :> to say that the context has sort S without constraining T. This is already implemented and it works, although the current implementation fails the completeness property (it cannot disambiguate some terms). What I proposed was a simple fix to the implementation, which also allows us to prove the desired completeness theorem.

So I would not trade the above, a provably correct solution which served us well so far, for something new and unexpected, which we did not get a chance to test enough. We need time to test new ideas and, besides, it will look strange to people who are already using these to see changes in the design of the user-visible notation from one version of K to another.

Further, I don't think that we can avoid the use of <: and :>. For example, we need to allow people to write things of the form Term :> Sort, because Term can be of a sort different from Sort and the context requires something of sort Sort there. We have examples of this (basically where you know for sure that Term will evaluate to something of sort Sort, but that is not obvious syntactically). I don't see how your notation can help here, because the problem is not ambiguity, but relaxation of parsing. Similar arguments can be brought for Term <: Sort, that is, I need to parse Term to Sort in order to eliminate ambiguities in Term only, but I want to put that in a context that expects something of a different sort.
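A hypothetical sketch of that relaxation use case (eval, inc, doubleInc and the productions below are invented for illustration, and the builtin INT module is assumed): eval produces an Exp in general, but in this particular rule it is known to evaluate to an Int, which is what the context requires.

    syntax Exp ::= Int
                 | eval(Exp)       [function]   // in general produces an Exp
    syntax Int ::= inc(Int)        [function]
                 | doubleInc(Exp)  [function]
    rule inc(I:Int) => I +Int 1

    // eval(E) is known to produce an Int here, but that is not obvious
    // syntactically, so the parse is relaxed with :>Int:
    rule doubleInc(E:Exp) => inc(inc(eval(E):>Int))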

I don't think your proposal can allow the above without further notations and conventions. And even if it did, it cannot be any simpler than the direct operations that allow you to tell exactly what you want, at the same time requiring no changes in notation from our users.

Besides, as you said, your approach needs a special notation for "is a subsort of" for tokens, so yet another thing to be invented.

Besides, it is intimately dependent on our particular and fragile naming conventions for default KLabels. This can and will change, and we do not want people to have to change their definitions when we do that.

I cannot make sense of your paragraph starting with

Actually, this is the easy part. First, when adding modules ...

Why do you bring modules in this discussion? As said, let us first make things crystal clear without modules, and then worry about modules. Let's say, for the time being on this issue page, that we only have one module.

f(a::A)::C has essentially two annotations while f(a)~fAC has one

Come on, f(a::A)::C is crystal clear (we can explain it to a new user in 5 seconds) and does not depend on people defining KLabels, or on our own fragile KLabel name conventions, or on the fact that T ~ P is sugared syntax (for T ~ P@M).

cos commented 9 years ago

Why do you bring modules in this discussion? As said, let us first make things crystal clear without modules, and then worry about modules. Let's say, for the time being on this issue page, that we only have one module.

Because unless we put the solution ahead of the problem, modules are relevant. Both this discussion and the module thread address the exact same underlying problem: given a space of possible parses, trim it down to just one (the only extra in this discussion is relaxing the parsing, i.e., increasing the space of possible parses). Our annotations, whether they refer to sorts, klabels, or modules are mechanisms for achieving this.

I don't think your proposal can allow the above without further notations and conventions.

Yes, my proposal only filters possible parses. It has no mechanism for allowing otherwise-invalid parses.

Besides, as you said, your approach needs a special notation for "is a subsort of" for tokens, so yet another thing to be invented.

It is essentially ::.

Besides, it is intimately dependent on our particular and fragile naming conventions for default KLabels. This can and will change, and we do not want people to have to change their definitions when we do that.

There are ways around this, like raising warnings when users refer to automatically-generated klabels.

In any case, I see nobody else pitched in so I think we are converging towards the sort-based solution for this discussion. Aside from being a bit hard to reason about for the user, I have nothing against the <: and :> sort annotations.

grosu commented 9 years ago

Guys, the "ol' reliable" @bmmoore just destroyed my beautiful result, with the following counter-example that he pulled from his magic hat:

A ::= "x" "y"
    | "x" B
B ::= "y"

Now consider x y. While we can always use backticks to enforce structure, we do not have a mechanism to enforce lack of structure. For example, I can write x `y` to enforce the second production, but there is no way to enforce the first one. Writing `x y<:A` or `x y::A` does no good.

So, completely orthogonally to sorts, we need another convention: give priority to atomic productions in the absence of K brackets. That is, x y gives priority to the first production. Why? Because if one means more structure, one can always add it using nested backticks.
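Restated on the grammar above (just re-expressing the convention as comments):

    x y      // prefers the atomic production  A ::= "x" "y"
    x `y`    // the nested backticks force     A ::= "x" B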

Note that the above does not prevent us from issuing warnings in such situations nevertheless, which I believe is the right thing to do. However, on the other hand, we do fundamental research and we must put our hands on such a "unique parsing" theorem. With the above, I believe we have it. But ... the proof will be tricky!

bmmoore commented 9 years ago

The general idea there is that the :: kind of sort annotations plus @grosu's convention seem to be enough to sort out ambiguities where you know the shape of the parse tree at a node and are just deciding what production/klabel to use, while this x y example comes from looking at ambiguities where you don't even know the shape of the tree. Then you realize you can force things to parse as subterms by adding more bubbles, but you have to worry about ambiguities going the other way.

There are two more issues to finish this off.

One is tokens. Unlike for sequences of (possibly regexp) terminals, for tokens the actual string matters, so the definition of "duplicate" productions has to be adjusted. I think just saying a token sort can't have subsorts or non-token productions is enough? Then `stuff::TokenSort` would be unambiguously parsed.

Another question is how to avoid the "::Sort" syntax itself being ambiguous with the user grammar. We need to go a bit farther than just requiring that they not duplicate the Sort ::= Sort "::Sort" production, which wouldn't stop the user from writing X ::= Y "::Sort". (If we just want to get a unique parsing theorem by brute force, we could build the annotation into the bubble syntax by saying you have to write the annotation immediately after the close quote, as in `...`::Sort, and if you don't want an annotation you have to follow the close quote with some other character, say ~, which will be eaten as part of the bubble syntax, so `...`~ for an unannotated bubble.)

As a minor alternative to preferring the atomic parse but giving a warning (which gives a default some users don't want, together with an escape convention that can't be used without triggering a warning), we could invent some kind of annotation on a bubble to say that something should be parsed more atomically, maybe like `x y!`.

bmmoore commented 9 years ago

@cos, leaving out modules makes sense while working out whether it's even possible in principle to disambiguate any grammar with brackets and annotations. I think modules do not add any new kinds of ambiguity (you can just rename klabels and sorts to maintain the same distinctions namespacing gives us), and unless they do, solutions which don't rely on annotations involving modules are more general.