Open aanastasiou opened 1 year ago
Thanks for the discussion. It looks to me like an opportunity for associative matching and associative/commutative matching. In Rascal, we don't have this feature (yet), but usually, we can work around this by making list/set alternatives for those binary operators which are associative, or associative/commutative/idempotent. We don't have bags yet, so we can not model associativity with commutativity but without idempotence.
First, let's extend the data type S
, with a flat list of things to push, and remove the old recursive operators there where they were only required for chaining:
data S
= push(int a, S rest)
| push(int a)
| push(list[S] elements) // a flat list of elements to push in order left-to-right
| add()
| add(S s) // old example
| sub()
| mul()
| div()
| check(int condition)
| resume()
;
Now, we can normalize (nested) pushes:
S push(int a, S rest) = push([push(a), S]); // initial wrapper step
S push([*pre, push([*middle]), *post]) = push([*pre, *middle, *post]); // flattener
S add(S s) = push([add(), S]); // lift continuations to list tails
Likewise, for add
and mult
this is possible to make associative variants, but in this example it seems they are more used as stack operations then as binary operators.
Now simplification rules which require "deep" lookaheads in a push list, are written using list matching. In particular, I believe they allow what you propose as a solution, namely to match a prefix and continue with what is matched:
Here we use simp
to reverse the check
and the resume
as I think you proposed as an example:
S simp(push([*pre, resume(), *middle, check(int condition), *post])) = push([*pre, check(condition), *middle, resume(), *post])
Does this make sense?
I would probable have a top-level data-type:
data P = program(list[S] instructions);
data S
= add() | mul() | ... ;
I have some examples of this kind of design in the simplification of boolean formulas and also of JVM bytecode if you are interested.
To summarize:
Penny for your thoughts?
Likewise grateful for your detailed response.
OK, that's great, if I am reading it correctly, in this particular example we still start with a tree but re-write the tree to an equivalent list of symbols, then continue working on that list of symbols instead of the more challenging (for the moment) original tree. Yes, this definitely makes sense, it is also possibly faster.
Here we use
simp
to reverse thecheck
and theresume
as I think you proposed as an example:
Oh no, that's on me, I did not make it clear and also just noticed that I mixed up the conditional with the iteration.
In any case, I was not trying to just reverse the branches. I was trying to "execute" the IF
by re-writing during simplification. The stack expression add(push(1,push(1)))
is equivalent to the much simpler push(2)
. Similarly, the stack expression resume(sub(push(1,check(push(10)))))
is equivalent to push(9)
. What I wanted to do was to capture the parts of check
(condition, true branch) and resolve the if/then
by re-writing. The equivalent in FORTH is 10 IF 1 - THEN
. 10 is non-zero, which means that the simplification of the above expression is 10 1 -
and eventually 9
.
And that's the conditional done by re-writing.
The iteration (the equivalent of a WHILE/ REPEAT
) is almost identical and was my next step. This is why I wrote "counts down to zero" (which was a mistake on my part). In that case the re-writing maintains the WHILE / REPEAT
until the condition becomes zero.
The fact that I could not untangle the matched parts from a descendant pattern is what led to this issue: Given resume(sub(push(1,check(push(10)))))
, if you try to match resume(t:/check(push(int condition, S rest)))
you get check(push(int condition, S rest)
within t
as well, which means that when you try to do the re-writing, you cannot get rid of the check(.)
part.
Thanks for your help, as this was a feature request, feel free to close it if you don't think it has a reason to stand, as far as I am concerned I was happy to see that there is an alternative.
Thanks @aanastasiou ; I'm leaving it open for now, to see if other people latch on. It's also another motivation for binary operators to get modifiers such as associative
and commutative
and idempotent
. Their internal structure could become a list or a set without the user knowing, and the pattern matching could become similarly powerful..
Is your feature request related to a problem? Please describe.
I was reading up on the simplification example and decided to express a stack language evaluation as simplification.
I understand that even if the feature I am highlighting worked, the "simplification" would stop prematurely. But I am giving here the whole example exactly as it led to the writing of this "issue".
Up to numeric expressions, the simplification works:
Given:
And a set of trivial "simplification" rules such as:
(including the default), the following work as expected:
or
Here the "simplification" proceeds bottom-up working constantly from one end of the treelist.
Now we want to add conditional execution via re-writing. We cannot use
if/then
, so they will be represented ascheck/resume
:Now the "simplification" rule is:
Which is a great opportunity to use the descendant pattern.
The motivation for this issue here is not that this is not going to work as "simplification". The thing to note here is that the descendant pattern returns the "known part"
check(push(int condition, S rest))
int
and makes it impossible to apply the kind of re-writing we are after.The test case for this is the pseudo-forth program
10 IF DUP 1 - THEN 0
which should count-down from 10 and return0 0
(DUP
is also easy to be expressed as simplification:S dup(push(int a, S rest)) = push(a, push(a, S))
). This is expressed aspush(0,resume(sub(push(1,dup(check(push(10)))))))
.I have tried a number of different Rascal patterns to express this, including Node and Typed (even if it is noted as buggy at the moment), but the "difficulty" remains in the sense that if you happen to have overlapping parts in a pattern you cannot isolate them. Consider for example here, having an
if/then/else
(orcheck/otherwise/resume
): We are trying to go through10 IF DUP 1 - ELSE 1 + THEN 0
orpush(0,resume(add(push(1,otherwise(sub(push(1,dup(if(push(10))))))))))
.You start capturing at the "root"
resume
for an arbitrary length untilotherwise
(calling itf
for the False branch) and then from insideotherwise
for an arbitrary length all the way tocheck
(calling itt
for the True branch) . But,t
is contained inf
and this mangles up the re-write.Describe the solution you'd like
This may be a suggested solution from a very limited perspective: Make the descendant pattern match from (but not including) the root (as it does now) all the way to the
/somePattern
but not including thesomePattern
because if it is required in the right hand side of a given re-write rule then it is known and it does not add anything.Admittedly however, I don't think I have explored enough what fresh problems this idea creates.
Describe alternatives you've considered All possible pattern alternatives, when used within a descendant pattern still capture overlapping parts.
Sticking with the most simple example of
check/resume
, I tried to reverse the tree. This results in a less intuitive structure, because while (for example)add(push(1,push(1)))
is a very natural application ofadd
on the state of the stack, after reversal it becomespush(1, push(1, add()))
...And the rules are ugly because everything needs "terminal" position as well (so,add(S)|add()
and so on). This now looks much more straightforwardcheck(t:/resume(S rest)) = ...
, but suffers from the same problem,rest
is contained int
.Additional context Add any other context or screenshots about the feature request here.
A more complete listing is available here, if it helps.