[KIP] - Lift the Binding Restriction on :=K

JKTKops commented 2 years ago

Motivation

This is the binding restriction on :=K:

However, due to current limitations, these variables are NOT bound in the rest of the term. The user is thus encouraged to use anonymous variables only, although this is not required.

This feature would make it possible to describe View Patterns in K. The link is for haskell but has motivating examples. I am unsure what a special syntax sugar for such patterns would be but it also doesn't matter too much since using :=K is already pretty quiet.

My motivating use case for this is pattern matching on bitstrings. @ehildenb tells me that the Plutus semantics share this use case with me. Consider bitpatterns in erlang. Say I want to write some binary decoding logic for a machine language. In Erlang I can write something like this:

case Encoding of
  << 0:2, Size:2, Opcode:4, RegA:3, RegB:3, 0:2 >> -> dasmRR(Opcode, Size, RegA, RegB)
  << 1:2, Size:2, Opcode:4, RegA:3, Imm:5       >> -> dasmRI(Opcode, Size, RegA, Imm)
  << 4:3, Sgn:1, Opcode:4, Disp:8               >> -> dasmJmp(Opcode, Sgn, Disp)
end.

In Haskell, with a bit of work to define the bits ViewPattern, I can write it like this:

{-# LANGUAGE ViewPatterns #-}

decode :: ByteString -> Maybe Instruction
decode (bits [2,2,4,3,3,2] -> [0,size,op,regA,regB,0]) = dasmRR op size regA regB
decode (bits [2,2,4,3,5]   -> [1,size,op,regA,imm])    = dasmRI op size regA imm
decode (bits [3,1,4,8]     -> [4, sgn, op, disp])      = dasmJmp op sgn disp
decode _ = Nothing

However to define a similar function in K, I have to figure out how to encode the match myself as a combination of &Int and ==Int so that I can "lift" it to the outer level. I can't call a bits function on the input, pattern match on the result, and keep trying other decode patterns if that match fails.

Example K Code

Suppose K allowed us to bind variables with :=K. This function could look something like this:

module DECODE
  imports INT
  imports BYTES

  syntax Ints ::= List{Int, ","}
  syntax BitPattern ::= "<<" Ints ">>"
                      | "bits" "(" "[" Ints "]" "," Bytes ")" [function]
  //omitted for brevity
  //rule bits(...) => ...

  rule decode(BS) => dasmRR(OP, SIZE, RegA, RegB)
    requires << 0, SIZE, OP, RegA, RegB, 0 >> :=K bits([2,2,4,3,3,2], BS)
  rule decode(BS) => dasmRI(OP, SIZE, RegA, RegB)
    requires << 1, SIZE, OP, RegA, Imm >> :=K bits([2,2,4,3,5], BS)
  rule decode(BS) => dasmJmp(OP, SGN, DISP)
    requires << 4, SGN, OP, DISP >> :=K bits([3,1,4,8], BS)
endmodule

It is also worth mentioning that this feature would also enable a simple syntactic sugar for where clauses ala #1685:

  rule foo => bar
    where PAT = baz

translates directly into

  rule foo => bar
    requires PAT :=K baz

This would also allow PAT to be more complicated than a single variable.

Documentation

Remove the sentences about the restriction from the documentation of the pattern matching operator. Under the same section, a new paragraph like this could be added:

Since the pattern matching operator can bind variables, it can be used to implement a generalized form of pattern matching called view patterns. Using the pattern matching operator on the result of a function symbol makes it possible to match against an arbitrary "view" of the data, binding variables out of the view, rather than on the data itself.

Potential Alternatives/Work-arounds

Work-arounds for this feature tend to be very use-specific and involve brittle utilities, code duplication, or both. It becomes necessary to write the logic associated with the view pattern manually, losing out on the benefits of built-in pattern matching in K.

I think it might be possible to emulate the continue-on-fail behavior of a regular pattern match in K today with the #STUCK strategy, by I haven't tried that as it would be far more confusing and hard to understand than the custom logic for pattern matching in my case.

radumereuta commented 2 years ago

@dwightguth you weren't in the meeting, and we could use your input on this. Do we need backend support to implement this feature?

dwightguth commented 2 years ago

Yes, you are going to need backend support in order to implement this. Right now the symbol is implemented by means of creating a new function that performs the matching and returns a boolean, but there is no way to propagate the binding out of the function and back to the original rule with this approach. You will have to actually implement native support for this symbol in the backend if you want this capability. I'm not entirely sure how best it should work if that's what you want to do. A where clause on rules might be your best bet rather than trying to implicitly extract the terms from the rule using the existing syntax. If you go that route, then the decision tree generation algorithm in the llvm backend would have to become aware of these bindings, and the code generator would have to be modified in order to correctly produce those values when the side condition is evaluated. You would also have to figure out how to pass the values from the side condition to the rhs, because currently, side condition functions in the llvm backend simply return a boolean, which is not enough information in this case.

JKTKops commented 2 years ago

By "boolean" I presume you mean an LLVM i1 and not a K Bool?

In that case, does the LLVM backend already have some matching function (generator, possibly) for regular matches that produces a structure with binding information? I'd think it would be possible to use the same method used for regular matches and treat the results the same way a regular side condition is treated in the decision tree. The new needed support would be a way to gather bindings from it. Perhaps this is exactly what you're suggesting, I'm not sure.

Also probably worth mentioning that Everett pointed out to me that you don't quite get where bindings for free with this. If the pattern in the LHS of the where binding doesn't match, the semantics here would be that the case fails to match rather than a crash.

dwightguth commented 2 years ago

I mean, the llvm backend represents the Bool type as an i1, but yes, I do mean a native boolean. You might be right that we could just modify the decision tree generator to be aware of the binding symbol and directly bind the variables during matching. That's probably simpler than what I had in mind, but it is contingent on having an explicit guard syntax on sentences rather than trying to treat them as regular functions in the rhs or side condition.

david-k commented 1 year ago

Here is an example that shows the advantage that using :=K would have over #let.

syntax Prog ::= "convert" String | "result" Int
configuration <Top>
                <k> $PGM:Prog </k>
                <sum> 0 </sum>
                <brg> 1 </brg>
                <another> 2 </another>
                <another1> 2 </another1>
              </Top>

If the binding restriction on :=K is lifted we could a rule like the following:

rule <k> convert STR => result I ... </k>
     <sum> S => S +Int B +Int I </sum>
     <brg> B </brg>
  requires I :=K String2Int(STR)

Note that I is used in multiple cells, but only in the LHS. Also, in the rule that prompted this example, the term that is bound to I is more complex, and binding it to a variable greatly increases readability of the rule.

Trying to rewrite the above rule using #let resulted in the following:

rule <Top>
       <k> convert STR ~> K </k>
       <sum> S:Int </sum> // Read and modified
       <brg> B </brg>     // Only read
       A:AnotherCell
       A1:Another1Cell
     </Top>
     =>
     #let I = String2Int(STR) #in
     <Top>
       <k> result I ~> K </k>
       <sum> S +Int B +Int I </sum>
       <brg> B </brg>
       A
       A1
     </Top>

First, it seems that I have to hoist => to the very top, because #let cannot contain rewrites.
Also, I need to have a single top cell in the rule because otherwise I get the following error: NoSuchElementException: key not found: #cells.
Because I use the top cell, I need to match all child cells. However, explicitly enumerating all cells would be a bit annoying in KEVM.
If I replace A and A1 with a single variable REST, I get the following error: [Error] Compiler: Could not find sorts: [Bag]. Importing the KCELLS module in which Bag seems to be defined results in more kompile errors.

So using #let is quite inconvenient in this case. Everett provided a quite readable alternative using a helper rule:

rule <k> convert STR => convert_2 String2Int(STR) ... </k>

rule <k> convert_2 I => result I ... </k>
     <sum> S => S +Int B +Int I </sum>
     <brg> B </brg>

The downside is that this it is somewhat less "obvious" what is going and increases the cognitive load a bit, but in the meantime this looks like a good solution.

ehildenb commented 1 year ago

For the frontend:

Add a new syntax for where WHERE_CLAUSES (and WHERE_CLAUSES can only be a bunch of _andBool_ of a bunch of _:=K_).
The WHERE_CLAUSES rhs should only contain variables which are already bound by either the LHS of the rule, or a prior clause in the WHERE_CLAUSE.
The frontend needs to sort the WHERE_CLAUSES and check for no-cycles.

For the LLVM backend:

The decision tree generation process needs to be modified in the following way:
- The DT leaf for each rule with a where-clause needs to change the leaf of the DT (where it goes to "apply the rule"), to instead jump to the the code which evaluates the RHS of the where clause, which then should jump to the DT for matching the LHS of the where clause.
- Iteratively chain all the where-clauses like this (if any fail, proceed to next rule).
Only the Scala code should need to be changed (the code which generates the DTs and chains them together). Maybe some small changes needed to instruct the LLVM backend to otherwise discard the where clauses.

For the Haskell backend:

The WHERE_CLAUSES can be compiled to normal equalities that stay on teh LHS of the rule. This will probably not be the most efficient way to do this, but will send terms to bottom that need to go to bottom.

runtimeverification / k