Open PetarMax opened 1 month ago
@PetarMax Not familiar here - what explicitly counts as a clause / how is the numbering of clauses precisely defined?
I am thinking of simplifications as follows:
rule LHS => RHS
requires C1 andBool C2 andBool ... andBool CN
[simplification]
where the Ci
are the clauses, separated by andBool
, and numbered in order in which they appear in the simplification.
I've implemented a prototype of this feature in Booster (https://github.com/runtimeverification/haskell-backend/pull/4022). I've manually modified a definition.kore
file and added syntactic{}("1")
to the attributes of a simplification.
I have one question for @Scott-Guest: do you know if the K compiler preserves the requires
clause as-is when compiling surface K to kore? This is important, as we are relying on the ordering of conjuncts.
And another question for @PetarMax: what behavior you expecting when there are more than one syntactic clause? I.e. syntactic(1, 2)
. I'd expect we process them from left to right, i.e. if the first one binds all variables from the second, than the second will have no effect and the result will be equivalent to syntactic(1)
. Does that seem reasonable?
Left to right, yes please.
@geo2a: do you know if the K compiler preserves the requires clause as-is when compiling surface K to kore?
Hmm, that's a good point - regardless of the current situation, it's very fragile to rely on requires
clauses never being manipulated, and I think we should consider an alternative syntax which doesn't require this assumption.
At a minimum, we already do constant folding which could collapse the number of clauses.
I'm not that familiar with the symbolic backends, so I don't know if this is feasible, but what about adding a special symbol? Something like
syntax Bool ::= syntactic(Bool) [symbol(syntactic)]
to mark
rule X <=Int Y => true
requires syntactic(X <=Int Z)
andBool Z <=Int Y
[simplification, concrete(Y, Z)]
@PetarMax thoughts?
As somebody who writes rules, I would expect to be able to refer to the specific clauses of the rule, in the specific order in which I write them. Before you start processing the clauses, you should be able to tag them and then process them tagged. The rules marked for syntactic matching will not disappear if written correctly by the user, and if written incorrectly... 🤷♂️ . For example, starting from:
rule X <=Int Y => true
requires X <=Int Z andBool T <=Int W andBool W <=Int T
[simplification, concrete(Y, Z), syntactic(1, 3)]
you should be able to tag the first and third clause with an attribute of some kind indicating ordering in syntactic matching, say
{ X <=Int Z; syntactic: 1 }
{ T <=Int W }
{ W <=Int T; syntactic: 2 } <--- note that here is a 2, indicating that this clause is the second to be matched
and then work with them as such, permute, transmute, whatever you like. The backend would not have to receive them in order or think about order. It's not clear to me how your syntax would preserve the ordering in which the syntactic matching is to happen - perhaps we can work on that?
@PetarMax
The rules marked for syntactic matching will not disappear if written correctly by the user, and if written incorrectly... 🤷♂️
My concern is that, as this attribute was originally proposed, "correctly" here means "in a way which won't trigger certain compiler optimizations / manipulations". Just on principle, we shouldn't be exposing compiler implementation details to the user - it can be finicky to reason about even in the relatively simple case of constant folding, and prone to breakage if the user does refactoring or if compiler internals change.
Before you start processing the clauses, you should be able to tag them and then process them tagged.
I definitely agree. The finickiness I mention above is solved by recording the information directly onto the expression itself, rather than non-locally as external meta-data that needs to be kept in-sync if the compiler manipulates the expression.
However, KORE only supports attributes on sentences / declarations, not on sub-terms. Regardless of the difficulty of adding tagging infrastructure for sub-terms to the front-end (which AFAICT would require auditing the whole frontend pipeline), at the end of the day, our only option is to either change KORE or emit an actual symbol similar to what I proposed.
We then have a few options:
1) The user directly writes this tagging symbol in the source code
2) The user writes a syntactic(_)
attribute on the rule which refers to clauses in that rule. The compiler internally places tagging symbols on the referred-to terms, then we either
a) Use these tagging symbols to reconstruct the syntactic(_)
attribute before re-emitting to KORE
b) Just have the backends directly consume these tagging symbols
2b) seems preferable over 2a). Otherwise, the backend needlessly has to care about ordering / do work to reconstruct info we already had in the frontend.
I'm less certain about 1) versus 2b), and would be interested in your thoughts.
It's not clear to me how your syntax would preserve the ordering in which the syntactic matching is to happen - perhaps we can work on that?
Ah I missed this part. We could update the syntax to also include an Int
argument indicating the ordering (or priority) for syntactic matching:
syntax Bool ::= syntactic(Bool, Int) [symbol(syntactic)]
Alternatively, we could even more broadly add syntax to name a sub-term like
syntax Bool ::= named(Bool, String) [symbol(named)]
where relevant attributes can then refer to names
rule X <=Int Y => true
requires named(X <=Int Z, foo) andBool T <=Int W andBool named(W <=Int T, bar)
[simplification, concrete(Y, Z), syntactic(foo, bar)]
Do you agree with my assessment? What're your thoughts on 1) vs 2b)?
Yes, I agree. I would be happy with syntactic(Bool, Int)
, it's quite flexible. The String
version, I think, loses the ordering again, unless it'd be lexicographic?
The idea with the String
version is that
named(Bool, String)
just provides a name to a sub-term. Any attribute could then use this name to refer to that sub-term, and this isn't specific to anything about syntactic(_)
.syntactic(_)
would take a list of names for sub-terms. Just like with syntactic(1, 2)
, the order in the attribute specifies the ordering between the clauses - the only difference is we refer to those terms by name rather than number.Yes, ok, understood. Do we have use cases other than syntactic
? Could other current features be re-expressed like this? If not, then I'm more inclined to just go for syntactic
.
Looking through the existing attributes, it doesn't appear broadly applicable - let's go with syntactic(Bool, Int)
then.
Yes please, thank you! While we-re doing this, could we please also add another simplification attribute, no-smt
?
rule LHS => RHS requires ... ensures ... [simplification, no-smt]
Its meaning in the backnd would be that during its application we should not use the SMT solver.
Thanks for the in-depth analysis @Scott-Guest and @PetarMax!
We've discussed the proposal in the Haskell backend daily meeting with @jberthold and @goodlyrottenapple.
Our consensus is that using inner syntax to drive a backend feature does not align well with K's design.
With making a symbol like named
or syntactic
special, we are blocking semantic writers from using this symbol for their own purposes. I'd argue that such special casing is unacceptable for a semantics framework.
Instead, the information regarding special treatment of parts of inner syntax should flow to the backend via attributes. This is already done for many cases such as the symbol
attribute, the concrete
and symolic
attributes etc. The marking of clauses as syntactic should be no different.
What happens inside the compiler is of course another story: we can have any sort of special-casing there, and we likely already do. With that in mind, one of the approaches @Scott-Guest suggests:
The user writes a syntactic() attribute on the rule which refers to clauses in that rule. The compiler internally places tagging symbols on the referred-to terms, then we either a) Use these tagging symbols to reconstruct the syntactic() attribute before re-emitting to KORE
sounds suitable.
@Scott-Guest what do you think? I'm happy to set-up a meeting to discuss this in more depth.
@geo2a I don't really agree, but I'm happy to concede to the consensus.
I'll open an initial PR adding syntactic
to the attribute whitelist + easing the restriction on unbound variables so you can use it immediately. Just be careful to check the output KORE and confirm the requires
clause doesn't get changed by the compiler.
I'll add the tagging under the hood to make it more robust in a follow-up PR.
Thank you @Scott-Guest!
easing the restriction on unbound variables
I don't think there's anything to do here, as rules like this:
rule X <=Int Y => true
requires X <=Int Z
andBool Z <=Int Y
[simplification]
are currently accepted and compiled as expected by kompile --backend haskell
.
Are they accepted even if the module that contains them is not marked as [symbolic]
? I think that simplificatios are, in principle, also given to the LLVM backend, unless the module that contains them is marked like that.
I have the PR just adding syntactic
to the whitelist (#4597), but need clarity regarding what we want to do for unbound variables.
Presumably, the LLVM backend can't handle these syntactic simplifications, so wouldn't we want to only allow this in symbolic
modules as is already the case? Why would we drop that restriction?
Yes, you are right, they belong only in [symbolic]
, and that already works. It would appear that we don't need to do anything for unbound variables, right, @geo2a?
No, I do not think we need anything.
To improve the simplification mechanism of the backend, we require the addition of a
syntactic
attribute in simplifications, which takes a list of positive integers. These integers would then indicate to the backend that the corresponding clauses in therequires
should be checked only syntactically. Two examples of this would be:and
Note that, in the second example, there is a symbolic variable
Z
introduced in the first clause that is not present in the LHS. This appears to be supported by the front-end if the corresponding module is marked as[symbolic]
. Optionally, we would like such variables to be supported also if the clause in they first appear is marked as syntactic.