runtimeverification / k

K Framework Tools 7.0
BSD 3-Clause "New" or "Revised" License
444 stars 145 forks source link

Support for syntactic simplifications #4579

Open PetarMax opened 1 month ago

PetarMax commented 1 month ago

To improve the simplification mechanism of the backend, we require the addition of a syntactic attribute in simplifications, which takes a list of positive integers. These integers would then indicate to the backend that the corresponding clauses in the requires should be checked only syntactically. Two examples of this would be:

rule X <=Int Y => X <Int Y
  requires notBool (X ==Int Y)
  [simplification, syntactic(1)]

and

rule X <=Int Y => true 
  requires X <=Int Z
   andBool Z <=Int Y
  [simplification, concrete(Y, Z), syntactic(1)]

Note that, in the second example, there is a symbolic variable Z introduced in the first clause that is not present in the LHS. This appears to be supported by the front-end if the corresponding module is marked as [symbolic]. Optionally, we would like such variables to be supported also if the clause in they first appear is marked as syntactic.

Scott-Guest commented 1 month ago

@PetarMax Not familiar here - what explicitly counts as a clause / how is the numbering of clauses precisely defined?

PetarMax commented 1 month ago

I am thinking of simplifications as follows:

rule LHS => RHS
  requires C1 andBool C2 andBool ... andBool CN
  [simplification]

where the Ci are the clauses, separated by andBool, and numbered in order in which they appear in the simplification.

geo2a commented 1 month ago

I've implemented a prototype of this feature in Booster (https://github.com/runtimeverification/haskell-backend/pull/4022). I've manually modified a definition.kore file and added syntactic{}("1") to the attributes of a simplification.

I have one question for @Scott-Guest: do you know if the K compiler preserves the requires clause as-is when compiling surface K to kore? This is important, as we are relying on the ordering of conjuncts.

And another question for @PetarMax: what behavior you expecting when there are more than one syntactic clause? I.e. syntactic(1, 2). I'd expect we process them from left to right, i.e. if the first one binds all variables from the second, than the second will have no effect and the result will be equivalent to syntactic(1). Does that seem reasonable?

PetarMax commented 1 month ago

Left to right, yes please.

Scott-Guest commented 1 month ago

@geo2a: do you know if the K compiler preserves the requires clause as-is when compiling surface K to kore?

Hmm, that's a good point - regardless of the current situation, it's very fragile to rely on requires clauses never being manipulated, and I think we should consider an alternative syntax which doesn't require this assumption.

At a minimum, we already do constant folding which could collapse the number of clauses.

I'm not that familiar with the symbolic backends, so I don't know if this is feasible, but what about adding a special symbol? Something like

syntax Bool ::= syntactic(Bool) [symbol(syntactic)]

to mark

rule X <=Int Y => true 
  requires syntactic(X <=Int Z)
   andBool Z <=Int Y
  [simplification, concrete(Y, Z)]

@PetarMax thoughts?

PetarMax commented 1 month ago

As somebody who writes rules, I would expect to be able to refer to the specific clauses of the rule, in the specific order in which I write them. Before you start processing the clauses, you should be able to tag them and then process them tagged. The rules marked for syntactic matching will not disappear if written correctly by the user, and if written incorrectly... 🤷‍♂️ . For example, starting from:

rule X <=Int Y => true 
  requires X <=Int Z andBool T <=Int W andBool W <=Int T
  [simplification, concrete(Y, Z), syntactic(1, 3)]

you should be able to tag the first and third clause with an attribute of some kind indicating ordering in syntactic matching, say

{ X <=Int Z; syntactic: 1 }
{ T <=Int W }
{ W <=Int T; syntactic: 2 } <--- note that here is a 2, indicating that this clause is the second to be matched

and then work with them as such, permute, transmute, whatever you like. The backend would not have to receive them in order or think about order. It's not clear to me how your syntax would preserve the ordering in which the syntactic matching is to happen - perhaps we can work on that?

Scott-Guest commented 1 month ago

@PetarMax

The rules marked for syntactic matching will not disappear if written correctly by the user, and if written incorrectly... 🤷‍♂️

My concern is that, as this attribute was originally proposed, "correctly" here means "in a way which won't trigger certain compiler optimizations / manipulations". Just on principle, we shouldn't be exposing compiler implementation details to the user - it can be finicky to reason about even in the relatively simple case of constant folding, and prone to breakage if the user does refactoring or if compiler internals change.

Before you start processing the clauses, you should be able to tag them and then process them tagged.

I definitely agree. The finickiness I mention above is solved by recording the information directly onto the expression itself, rather than non-locally as external meta-data that needs to be kept in-sync if the compiler manipulates the expression.

However, KORE only supports attributes on sentences / declarations, not on sub-terms. Regardless of the difficulty of adding tagging infrastructure for sub-terms to the front-end (which AFAICT would require auditing the whole frontend pipeline), at the end of the day, our only option is to either change KORE or emit an actual symbol similar to what I proposed.

We then have a few options: 1) The user directly writes this tagging symbol in the source code 2) The user writes a syntactic(_) attribute on the rule which refers to clauses in that rule. The compiler internally places tagging symbols on the referred-to terms, then we either a) Use these tagging symbols to reconstruct the syntactic(_) attribute before re-emitting to KORE b) Just have the backends directly consume these tagging symbols

2b) seems preferable over 2a). Otherwise, the backend needlessly has to care about ordering / do work to reconstruct info we already had in the frontend.

I'm less certain about 1) versus 2b), and would be interested in your thoughts.

It's not clear to me how your syntax would preserve the ordering in which the syntactic matching is to happen - perhaps we can work on that?

Ah I missed this part. We could update the syntax to also include an Int argument indicating the ordering (or priority) for syntactic matching:

syntax Bool ::= syntactic(Bool, Int) [symbol(syntactic)]

Alternatively, we could even more broadly add syntax to name a sub-term like

syntax Bool ::= named(Bool, String) [symbol(named)]

where relevant attributes can then refer to names

rule X <=Int Y => true 
  requires named(X <=Int Z, foo) andBool T <=Int W andBool named(W <=Int T, bar)
  [simplification, concrete(Y, Z), syntactic(foo, bar)]

Do you agree with my assessment? What're your thoughts on 1) vs 2b)?

PetarMax commented 1 month ago

Yes, I agree. I would be happy with syntactic(Bool, Int), it's quite flexible. The String version, I think, loses the ordering again, unless it'd be lexicographic?

Scott-Guest commented 1 month ago

The idea with the String version is that

PetarMax commented 1 month ago

Yes, ok, understood. Do we have use cases other than syntactic? Could other current features be re-expressed like this? If not, then I'm more inclined to just go for syntactic.

Scott-Guest commented 1 month ago

Looking through the existing attributes, it doesn't appear broadly applicable - let's go with syntactic(Bool, Int) then.

PetarMax commented 1 month ago

Yes please, thank you! While we-re doing this, could we please also add another simplification attribute, no-smt?

rule LHS => RHS requires ... ensures ... [simplification, no-smt]

Its meaning in the backnd would be that during its application we should not use the SMT solver.

geo2a commented 1 month ago

Thanks for the in-depth analysis @Scott-Guest and @PetarMax!

We've discussed the proposal in the Haskell backend daily meeting with @jberthold and @goodlyrottenapple.

Our consensus is that using inner syntax to drive a backend feature does not align well with K's design. With making a symbol like named or syntactic special, we are blocking semantic writers from using this symbol for their own purposes. I'd argue that such special casing is unacceptable for a semantics framework.

Instead, the information regarding special treatment of parts of inner syntax should flow to the backend via attributes. This is already done for many cases such as the symbol attribute, the concrete and symolic attributes etc. The marking of clauses as syntactic should be no different.

What happens inside the compiler is of course another story: we can have any sort of special-casing there, and we likely already do. With that in mind, one of the approaches @Scott-Guest suggests:

The user writes a syntactic() attribute on the rule which refers to clauses in that rule. The compiler internally places tagging symbols on the referred-to terms, then we either a) Use these tagging symbols to reconstruct the syntactic() attribute before re-emitting to KORE

sounds suitable.

@Scott-Guest what do you think? I'm happy to set-up a meeting to discuss this in more depth.

Scott-Guest commented 1 month ago

@geo2a I don't really agree, but I'm happy to concede to the consensus.

I'll open an initial PR adding syntactic to the attribute whitelist + easing the restriction on unbound variables so you can use it immediately. Just be careful to check the output KORE and confirm the requires clause doesn't get changed by the compiler.

I'll add the tagging under the hood to make it more robust in a follow-up PR.

geo2a commented 1 month ago

Thank you @Scott-Guest!

easing the restriction on unbound variables

I don't think there's anything to do here, as rules like this:

rule X <=Int Y => true 
  requires X <=Int Z
   andBool Z <=Int Y
  [simplification]

are currently accepted and compiled as expected by kompile --backend haskell.

PetarMax commented 1 month ago

Are they accepted even if the module that contains them is not marked as [symbolic]? I think that simplificatios are, in principle, also given to the LLVM backend, unless the module that contains them is marked like that.

Scott-Guest commented 4 weeks ago

I have the PR just adding syntactic to the whitelist (#4597), but need clarity regarding what we want to do for unbound variables.

Presumably, the LLVM backend can't handle these syntactic simplifications, so wouldn't we want to only allow this in symbolic modules as is already the case? Why would we drop that restriction?

PetarMax commented 4 weeks ago

Yes, you are right, they belong only in [symbolic], and that already works. It would appear that we don't need to do anything for unbound variables, right, @geo2a?

geo2a commented 4 weeks ago

No, I do not think we need anything.