microsoft / Trieste

A term rewriting system for experimental programming language development.
MIT License
37 stars 20 forks source link

The `++` pattern is too greedy? #89

Open EliasC opened 9 months ago

EliasC commented 9 months ago

I was expecting the pattern Any++ * T(Bar) to match the sequence Foo Foo Foo Bar, just as the regular expression .*b matches aaab, but it seems like the Any++ matches all of Foo Foo Foo Bar, meaning the whole pattern fails to match (since there is no Bar after that sequence). I don't know if this is by design, nor what the consequences would be of making it more like the kleene star, but I thought I would bring it up for discussion.

My use case was for matching a Y inside an X, where the last child of Y is a Z (for reasonable values of X, Y, and Z):

In(X) * T(Y) << (Any++ * T(Z) * End)

If I didn't care about both the X and the Y I could just have T(Z) * End, but I now it's important that the Y is in an X and has the children specified. In this particular case I can do (!T(Z))++ * T(Z), but in general this might not be satisfactory.

mjp41 commented 9 months ago

So I would write that as

In(X) * T(Y) << ((!T(Z))++ * T(Z) * End)

The current implementation does not backtrack the | or ++ matching based on the continuation. For some of the optimisations I applied, I actually carry the continuation around, so that could be possible to do.

There is another similar case:

((T(X) * T(Y)) | (T(X)) * T(Z)

This matches XYZ or XZ. But

((T(X) | (T(X) * T(Y))) * T(Z)

only matches XZ.

EliasC commented 9 months ago

Are these optimisations of the Trieste code, or optimisations written in the client code?

mjp41 commented 9 months ago

Are these optimisations of the Trieste code, or optimisations written in the client code?

Trieste code optimisations. Before a pattern like

T(A) * T(B) * T(C)

would result in for dynamic dispatch calls before it tries to see if there is a T(A). By CPS converting the pattern it becomes one, and the continuation is a dynamic dispatch as well.

E.g. here we set_continuation https://github.com/microsoft/Trieste/blob/0b2ec3c5ea7bf52b0f6b2ed9e4f9d571aba47a8b/include/trieste/rewrite.h#L957

This could form the basis for the better backtracking, but would need care given the other optimisations.