microsoft / Trieste

A term rewriting system for experimental programming language development.
MIT License
38 stars 20 forks source link

The `++` pattern is too greedy? #89

Open EliasC opened 11 months ago

EliasC commented 11 months ago

I was expecting the pattern Any++ * T(Bar) to match the sequence Foo Foo Foo Bar, just as the regular expression .*b matches aaab, but it seems like the Any++ matches all of Foo Foo Foo Bar, meaning the whole pattern fails to match (since there is no Bar after that sequence). I don't know if this is by design, nor what the consequences would be of making it more like the kleene star, but I thought I would bring it up for discussion.

My use case was for matching a Y inside an X, where the last child of Y is a Z (for reasonable values of X, Y, and Z):

In(X) * T(Y) << (Any++ * T(Z) * End)

If I didn't care about both the X and the Y I could just have T(Z) * End, but I now it's important that the Y is in an X and has the children specified. In this particular case I can do (!T(Z))++ * T(Z), but in general this might not be satisfactory.

mjp41 commented 11 months ago

So I would write that as

In(X) * T(Y) << ((!T(Z))++ * T(Z) * End)

The current implementation does not backtrack the | or ++ matching based on the continuation. For some of the optimisations I applied, I actually carry the continuation around, so that could be possible to do.

There is another similar case:

((T(X) * T(Y)) | (T(X)) * T(Z)

This matches XYZ or XZ. But

((T(X) | (T(X) * T(Y))) * T(Z)

only matches XZ.

EliasC commented 11 months ago

Are these optimisations of the Trieste code, or optimisations written in the client code?

mjp41 commented 11 months ago

Are these optimisations of the Trieste code, or optimisations written in the client code?

Trieste code optimisations. Before a pattern like

T(A) * T(B) * T(C)

would result in for dynamic dispatch calls before it tries to see if there is a T(A). By CPS converting the pattern it becomes one, and the continuation is a dynamic dispatch as well.

E.g. here we set_continuation https://github.com/microsoft/Trieste/blob/0b2ec3c5ea7bf52b0f6b2ed9e4f9d571aba47a8b/include/trieste/rewrite.h#L957

This could form the basis for the better backtracking, but would need care given the other optimisations.