Open Scott-Guest opened 10 months ago
I think the check probably needs to exclude productions that are hooked (see https://github.com/runtimeverification/k/blob/master/k-distribution/include/kframework/builtin/domains.md#meta-operations):
syntax {Sort} Sort ::= #parseKORE(String) [function, hook(KREFLECTION.parseKORE)]
(although, that production seems to be the only one in domains.md
that has the sort parameter only on the LHS, so there might be something we can do here - especially given how much of a nasty escape hatch #parseKORE
is)
I see two options regarding #parseKORE
:
Allow functions to have a LHS-only parametric sort. If we allow this, I don't think we need to restrict it to hooked functions - a user may want to build up their own such functions using the hooked ones.
Change the signature to syntax K ::= #parseKORE(String)
, requiring users to manually insert projection casts to use the result.
I personally am in favor of changing the signature. This is a breaking change, but likely a small one given that #parseKORE
is not widely used.
By doing so, we ensure that
#parseKORE
returns a value that is ill-typed in the context.Yep, on reflection I think let's change the interface of parseKORE and require projections. It should require some thought if you want to break out of the type system like that!
Worth also auditing the more internal parts of the standard library to make sure we haven't missed any other examples, but then if that's all good then let's go ahead and implement the frontend restriction.
It appears that a number of the matching logic symbols also have parametric sorts only on their LHS.
I don't understand these symbols well-enough to know how to handle this. Let's discuss in the meeting on Tuesday.
An attempt at a list of symbols with LHS-only parametric sorts:
#parseKORE
KBott
#Top
#Bottom
#True
#False
#Ceil
#Floor
#Equals
Without spending too much time working through it, we might well be able to figure out a solution with the rough structure:
#parseKORE
(or even just deprecate it!); as far as I can tell it's only used in a couple of places in old semantics that predate Pyk. Those semantics use it to manually run the generated Bison parser for a definition and ingest (unsafely) a KORE term from a file; that specific use-case should now be implemented via Pyk.forall Sort . KBott <: Sort
? I presume that this production is just required to instantiate the bottom of the sort lattice as K syntax.#parseKORE
and KBott
, we can at least either do some frontend enforcement to keep the existing LHS-parametric syntax for ML productions while disallowing it elsewhere in user code.@Scott-Guest and @radumereuta discuss KBott
Otherwise special-case for ML symbols; otherwise disallow LHS polymorphism
@Baltoli @dwightguth write up the issue with #parseKORE
not working in sort K
.
If we can't add new constructors to hooked sorts, we shouldn't allow syntax {S} S ::= ...
where the RHS is a constructor at all (regardless of whether S
appears on the RHS or not).
Between kast.md
and domains.md
, it looks like we currently use parametric constructors for
=>
amb
#location
#let
#fun
#as
If I understand correctly, the reason we don't allow adding new constructors to hooked sorts is that they can cause the hooks to crash at runtime. Thus, these constructor are only okay in so far as we guarantee they will be evaluated away before being passed to a hook.
A few questions then
#amb
and #location
where users can write their own rules over these symbols?total
?syntax {S} S ::= ...
may unintentionally be adding a back door?Per @dwightguth:
#location
is relatively easy to fix because it is enabled via an attribute applied to specific sorts so you should be able to autogenerate a non parametric symbol with a specific name for each such sort and then emit the corresponding one in the Bison code
amb
will be harder because there is nothing theoretically preventing an ambiguity from occurring over a sort like Bool that would cause a crash, we just simply don't guarantee that everything will actually work correctly if such occurs
Given that this will require fairly involved fixes for #location
and #amb
, I'll implement the check but leave them as special-cases for now.
In the long run, we should try to address them though.
The current plan is to disallow syntax {S} S ::= ...
declarations except for
amb
and #location
amb
only used by the C semantics (https://github.com/search?q=org%3Aruntimeverification+amb%28&type=code)#location
only used by the C and Algorand semantics (https://github.com/search?q=org%3Aruntimeverification+%23location&type=code)As a result, every constructor's sort parameters only appear on the RHS. However, any sort parameter that only appears on the RHS can without loss of generality be realized as K
, e.g.,
syntax {S} Foo ::= foo(S)
is equivalent to syntax Foo ::= foo(K)
as both accept any sortsyntax {S} Foo ::= foo(S, S)
is equivalent to syntax Foo ::= foo(K, K)
as the sort parameter can always be widened until they are equalThere is a slight difference in that, given Sub <: Super
and subVal() : Sub
,
foo(subVal():Sub)
and foo(subVal():Super)
are equal with syntax Foo ::= foo(K)
foo(subSortVal():Sub)
and foo(subSortVal():Super)
are distinct with syntax {S} Foo ::= foo(S)
because they have different constructors (foo{Sub}
vs foo{Super}
).In my opinion, this is good and desirable! The fact that foo{SubSort}
and foo{SuperSort}
are distinct is confusing and unintuitive given K's pervasive subtyping, e.g., it's surprising that a rule like
syntax Sub ::= SubVal()
syntax Super ::= Sub
| SuperVal()
syntax {S} S ::= identity(S)
rule identity(Z:Super) => Z
does not apply to the term identity(SubVal())
as we discovered in #3833. If you do really need them to be distinct, then it's clearer to just manually implement every monomorphization as distinct constructors
syntax Super ::= identitySuper(Super)
syntax Sub ::= identitySub(Sub)
With this change then, we can restrict the syntax so that (for constructors) sort variables only occur as sort parameters, e.g.,
syntax Foo{S} ::= foo(S, S)
Coincidentally, this restriction is also a conceptual restriction shared by the haskell backend!
dwight.guth 3 years ago @tom.tuegel how complicated would it be to allow the Haskell backend to support parametric constructors? I seem to recall hearing you say something in the past that indicated you didn't want to go this route, but I wanted to bring it up because there are several cases during parsing where a parametric constructor substantially simplifies the implementation, and since the LLVM backend already supports this, I wanted to sound you out
tom.tuegel 3 years ago It should not be a significant difficulty to support parametric constructors, with some restrictions. As long as there is a definite sort constructor with the same parameters as the constructor symbol, it should be fine, that is: symbol Ctor{R, S, ...}(...) : Sort{R, S, ...} [constructor] Declarations like the following are syntactically legal, but logically problematic: symbol Any{S}(...) : S As long as we restrict ourselves to the first form, I think it should be alright.
To me, this a good indication that banning syntax {S} S ::= ...
is morally the right choice.
That is very neat, and especially pleasing that it subsumes that LLVM backend issue (I hadn't quite clocked the link there during the meeting). Nice writeup - thanks @Scott-Guest
Productions
syntax {S} S ::= ...
where the sort parameterS
only appears on the LHS are buggy, and appear to be de-facto unsupported.At a minimum, they violate restrictions that we place on hooked sorts. For example,
is disallowed because you cannot add new constructors or subsorts to hooked sorts, thus
must be disallowed as well. (We don't currently report a nice error here, and instead report unrelated errors or crash depending on the exact example).
We could potentially permit
syntax {S} S ::= "new_token" [token]
orsyntax {S} S ::= foo() [function]
as hooked sorts allow token and function productions.However, I’m unaware of any compelling use cases for this, and they are either unsupported or buggy in practice.
For example,
results in
Given these issues, I propose that we just report an error in the frontend on any such production.