When using a descendant pattern, don't return the specified part of the pattern upon a match

aanastasiou commented 1 year ago

Is your feature request related to a problem? Please describe.

I was reading up on the simplification example and decided to express a stack language evaluation as simplification.

I understand that even if the feature I am highlighting worked, the "simplification" would stop prematurely. But I am giving here the whole example exactly as it led to the writing of this "issue".

Up to numeric expressions, the simplification works:

Given:

data S = push(int a, S rest)|
                push(int a)|
               add(S)|
               sub(S)|
               mul(S)|
               div(S)

And a set of trivial "simplification" rules such as:

S simp(add(push(int a, push(int b, S rest)))) = push(a+b, rest);
S simp(add(push(int a, push(int b)))) = push(a+b);
...
...
...

(including the default), the following work as expected:

simplify(add(push(1, push(1))));  // returns push(2) as the simplified expression

or

simplify(add(push(1,mul(push(2,push(3)))))); // returns push(7)

Here the "simplification" proceeds bottom-up working constantly from one end of the treelist.

Now we want to add conditional execution via re-writing. We cannot use if/then, so they will be represented as check/resume:

data S = push(int a, S rest)|
                push(int a)|
               add(S)|
               sub(S)|
               mul(S)|
               div(S) |
               check(S)|
               resume(S)

Now the "simplification" rule is:

S simp(resume(t:/check(push(int condition, S rest)))) = condition!=0?resume(t,check(t,push(condition, rest))):push(condition, rest);
S simp(resume(t:/check(push(int condition)))) = condition!=0?resume(t,check(t,push(condition))):push(condition);

Which is a great opportunity to use the descendant pattern.

The motivation for this issue here is not that this is not going to work as "simplification". The thing to note here is that the descendant pattern returns the "known part" check(push(int condition, S rest)) in t and makes it impossible to apply the kind of re-writing we are after.

The test case for this is the pseudo-forth program 10 IF DUP 1 - THEN 0 which should count-down from 10 and return 0 0 (DUP is also easy to be expressed as simplification: S dup(push(int a, S rest)) = push(a, push(a, S)) ). This is expressed as push(0,resume(sub(push(1,dup(check(push(10))))))).

I have tried a number of different Rascal patterns to express this, including Node and Typed (even if it is noted as buggy at the moment), but the "difficulty" remains in the sense that if you happen to have overlapping parts in a pattern you cannot isolate them. Consider for example here, having an if/then/else (or check/otherwise/resume): We are trying to go through 10 IF DUP 1 - ELSE 1 + THEN 0 or push(0,resume(add(push(1,otherwise(sub(push(1,dup(if(push(10)))))))))).

You start capturing at the "root" resume for an arbitrary length until otherwise (calling it f for the False branch) and then from inside otherwise for an arbitrary length all the way to check (calling it t for the True branch) . But, t is contained in f and this mangles up the re-write.

Describe the solution you'd like

This may be a suggested solution from a very limited perspective: Make the descendant pattern match from (but not including) the root (as it does now) all the way to the /somePattern but not including the somePattern because if it is required in the right hand side of a given re-write rule then it is known and it does not add anything.

Admittedly however, I don't think I have explored enough what fresh problems this idea creates.

Describe alternatives you've considered All possible pattern alternatives, when used within a descendant pattern still capture overlapping parts.

Sticking with the most simple example of check/resume, I tried to reverse the tree. This results in a less intuitive structure, because while (for example) add(push(1,push(1))) is a very natural application of add on the state of the stack, after reversal it becomes push(1, push(1, add()))...And the rules are ugly because everything needs "terminal" position as well (so, add(S)|add() and so on). This now looks much more straightforward check(t:/resume(S rest)) = ..., but suffers from the same problem, rest is contained in t.

Additional context Add any other context or screenshots about the feature request here.

A more complete listing is available here, if it helps.

jurgenvinju commented 1 year ago

Thanks for the discussion. It looks to me like an opportunity for associative matching and associative/commutative matching. In Rascal, we don't have this feature (yet), but usually, we can work around this by making list/set alternatives for those binary operators which are associative, or associative/commutative/idempotent. We don't have bags yet, so we can not model associativity with commutativity but without idempotence.

First, let's extend the data type S, with a flat list of things to push, and remove the old recursive operators there where they were only required for chaining:

data S 
   = push(int a, S rest)
   |  push(int a)
   | push(list[S] elements) // a flat list of elements to push in order left-to-right
   | add() 
   | add(S s) // old example 
   | sub()
   | mul()
   | div()
   | check(int condition) 
   | resume()
   ;

Now, we can normalize (nested) pushes:

S push(int a, S rest) = push([push(a), S]);                                                 // initial wrapper step
S push([*pre, push([*middle]), *post]) = push([*pre, *middle, *post]); // flattener
S add(S s) = push([add(), S]);                                                                    // lift continuations to list tails

Likewise, for add and mult this is possible to make associative variants, but in this example it seems they are more used as stack operations then as binary operators.

Now simplification rules which require "deep" lookaheads in a push list, are written using list matching. In particular, I believe they allow what you propose as a solution, namely to match a prefix and continue with what is matched:

Here we use simp to reverse the check and the resume as I think you proposed as an example:

S simp(push([*pre, resume(), *middle, check(int condition), *post])) = push([*pre, check(condition), *middle, resume(), *post])

Does this make sense?

I would probable have a top-level data-type:

data P = program(list[S] instructions);
data S 
   = add() | mul() | ... ;

I have some examples of this kind of design in the simplification of boolean formulas and also of JVM bytecode if you are interested.

To summarize:

the deep match abstracts from vertical structure, but perhaps it abstracts a little too much
other ways to abstract from vertical structure are associativity (A) and associative/commutative (AC) matching
in Rascal we use lists and list matching to model associative matching

Penny for your thoughts?

aanastasiou commented 1 year ago

Likewise grateful for your detailed response.

OK, that's great, if I am reading it correctly, in this particular example we still start with a tree but re-write the tree to an equivalent list of symbols, then continue working on that list of symbols instead of the more challenging (for the moment) original tree. Yes, this definitely makes sense, it is also possibly faster.

Here we use simp to reverse the check and the resume as I think you proposed as an example:

Oh no, that's on me, I did not make it clear and also just noticed that I mixed up the conditional with the iteration.

In any case, I was not trying to just reverse the branches. I was trying to "execute" the IF by re-writing during simplification. The stack expression add(push(1,push(1))) is equivalent to the much simpler push(2). Similarly, the stack expression resume(sub(push(1,check(push(10))))) is equivalent to push(9). What I wanted to do was to capture the parts of check (condition, true branch) and resolve the if/then by re-writing. The equivalent in FORTH is 10 IF 1 - THEN. 10 is non-zero, which means that the simplification of the above expression is 10 1 - and eventually 9. And that's the conditional done by re-writing.

The iteration (the equivalent of a WHILE/ REPEAT) is almost identical and was my next step. This is why I wrote "counts down to zero" (which was a mistake on my part). In that case the re-writing maintains the WHILE / REPEAT until the condition becomes zero.

The fact that I could not untangle the matched parts from a descendant pattern is what led to this issue: Given resume(sub(push(1,check(push(10))))), if you try to match resume(t:/check(push(int condition, S rest))) you get check(push(int condition, S rest) within t as well, which means that when you try to do the re-writing, you cannot get rid of the check(.) part.

Thanks for your help, as this was a feature request, feel free to close it if you don't think it has a reason to stand, as far as I am concerned I was happy to see that there is an alternative.

jurgenvinju commented 1 year ago

Thanks @aanastasiou ; I'm leaving it open for now, to see if other people latch on. It's also another motivation for binary operators to get modifiers such as associative and commutative and idempotent. Their internal structure could become a list or a set without the user knowing, and the pattern matching could become similarly powerful..

usethesource / rascal

When using a descendant pattern, don't return the specified part of the pattern upon a match #1778