usethesource / rascal

The implementation of the Rascal meta-programming language (including interpreter, type checker, parser generator, compiler and JVM based run-time system)
http://www.rascal-mpl.org

need `memo` strategy modifier for `visit` to avoid exponential running times. #2076

Open jurgenvinju opened 1 week ago

jurgenvinju commented 1 week ago

Is your feature request related to a problem? Please describe.

If I write a recursive function over Tree, with case distinction over appl, amb, char and cycle, then I must use @memo to avoid exponential behavior in the presence of nested ambiguity. E.g. for syntax E = E + E, an input like 1 + 1 + 1 + 1 + 1 takes a factor of $2^3$ more time without @memo, and in general the running time is in $O(2^{n - 1})$ with $n$ the number of operators. With @memo we can remain in $O(n^3)$; a dramatic improvement.
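To make the blow-up concrete, here is a minimal Java sketch (Java because the Rascal run-time is JVM based; `Node` is a hypothetical stand-in for a nested amb cluster whose alternatives share the same subtree by reference, not the real parse tree classes). A naive traversal doubles its work at every level, while a reference-identity memo visits each shared node once:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical stand-in for nested amb clusters: every level has two
// alternatives that share the *same* child node by reference, like the
// shared forest a parser produces for 1 + 1 + ... + 1.
final class Node {
    final Node child; // null at the leaf
    Node(Node child) { this.child = child; }
}

public class MemoDemo {
    // Naive traversal: revisits the shared child once per alternative,
    // so the number of calls doubles at every level: O(2^depth).
    static long countNoMemo(Node n) {
        if (n == null) return 0;
        return 1 + countNoMemo(n.child) + countNoMemo(n.child);
    }

    // Reference-based memoization: each shared node is entered once: O(depth).
    static long countMemo(Node n, Set<Node> seen) {
        if (n == null || !seen.add(n)) return 0;
        return 1 + countMemo(n.child, seen) + countMemo(n.child, seen);
    }

    public static void main(String[] args) {
        Node chain = null;
        for (int i = 0; i < 20; i++) chain = new Node(chain);
        System.out.println(countNoMemo(chain)); // 2^20 - 1 = 1048575
        System.out.println(countMemo(chain,
            Collections.newSetFromMap(new IdentityHashMap<>()))); // 20
    }
}
```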

However, using a visit saves more than half of the code for these kinds of functions. The recursion is very useful and very repetitive, exactly what visit is for. Since visit does not have an @memo modifier, it will behave exponentially.

Describe the solution you'd like

Extend the strategy modifiers for visit with an option to add memo to every currently existing strategy. For example:

memo bottom-up visit(x) {
...
}

Describe alternatives you've considered

Could also use tags instead:

@memo bottom-up visit(x) {
...
}

But that would be the first expression/statement kind that may have tags, and I don't see a general applicability there yet.

Additional context

After this, it becomes weird that functions are tagged with @memo while visit is modified with memo. Since memo is a built-in function tag, we may choose to upgrade it from a tag to a modifier, just like java and public.

@memo has configuration bells and whistles for cache eviction policies; if we consider adding memo to visit we have to consider mapping these policies as well. Perhaps it is unnecessary, but it's best to remain consistent and predictable.

jurgenvinju commented 3 days ago

@PaulKlint @tvdstorm @DavyLandman an alternative fix which is much much simpler:

Let all strategies of the current visit semantics memoize on visiting each individual (reference to an) amb cluster, and nowhere else:

  1. prevents the combinatorial explosion always caused by nested amb clusters
  2. uses deep internal knowledge about reference equality of nested clusters (which is unreachable for Rascal users anyway), and for that reason has optimally low overhead.
  3. breaks algorithms that need to visit amb clusters again and again for side-effects or something. However, if you want that you can write your own recursion over Tree and do whatever you need.
  4. out of the box no more exponential running times for accidental amb traversals.
  5. we still visit each individual amb cluster at least once (if not broken out of the visit), just not again and again and again.

This is implementable almost entirely in the run-time code for tree traversal. No complex changes to be made. WDYT?
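A heavily hedged sketch of what that run-time change could look like: `Tree`, `Appl` and `Amb` below are hypothetical stand-ins for the real parse tree classes, and the real location would be inside `TraversalOperator.traverseOnce`. The key idea is an identity-based seen-set that is consulted only for amb clusters, never for appl nodes:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the real parse tree node classes.
abstract class Tree {}
final class Appl extends Tree {
    final List<Tree> args;
    Appl(Tree... args) { this.args = Arrays.asList(args); }
}
final class Amb extends Tree {
    final List<Tree> alternatives;
    Amb(Tree... alts) { this.alternatives = Arrays.asList(alts); }
}

public final class AmbMemoTraversal {
    // Reference-identity set: structurally equal clusters at different
    // addresses are still visited; only the *same* cluster is skipped.
    private final Set<Tree> visitedAmbs =
        Collections.newSetFromMap(new IdentityHashMap<>());
    public int visits = 0;

    public void traverse(Tree t) {
        if (t instanceof Amb) {
            if (!visitedAmbs.add(t)) return; // already entered this cluster
            visits++;
            for (Tree alt : ((Amb) t).alternatives) traverse(alt);
        } else {
            visits++; // appl nodes are never memoized
            for (Tree arg : ((Appl) t).args) traverse(arg);
        }
    }

    public static void main(String[] args) {
        // One amb cluster referenced by both alternatives of another:
        Amb inner = new Amb(new Appl(), new Appl());
        Amb outer = new Amb(new Appl(inner), new Appl(inner));
        AmbMemoTraversal t = new AmbMemoTraversal();
        t.traverse(outer);
        // inner is entered once, so 6 visits; without the identity
        // check it would be 9, and exponential for deeper nesting.
        System.out.println(t.visits); // 6
    }
}
```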

It would allow a number of the error recovery algorithms to be written easily in short Rascal code, instead of complex Java code with if-then-elses or the BottomUpTreeVisitor or whatever we have in Java. This is what Rascal was made for after all. I don't want to introduce all this Java code now for error recovery heuristic algorithms, only because of the memoization aspect.

jurgenvinju commented 3 days ago

The location in the interpreter is the traverseOnce method of TraversalOperator, for all strategies.

This will be slightly different for every traversal strategy, because the check is inherently a top-down memo strategy that must also be applied to bottom-up, even though that seemingly contradicts the strategy: bottom-up should not descend into an already visited cluster either.

DavyLandman commented 3 days ago

I like the idea; my only concern is whether it would surprise people who use visit actions with side-effects (some kind of counter, for example). I agree that for parse trees it would be hard to hit cases where the same tree is not the same thing, but what about other data types, like some kind of AST?

Currently we also have configuration options for @memo, since we found that a user might want to tune the caching to reduce memory pressure. Or do we only want memo in the scope of the current visit?

DavyLandman commented 3 days ago

And, although I originally added the @memo feature, I've since come to dislike the name. It's not full memoization; it's more of a cache/LRU that, when you're lucky (have enough memory), behaves like memoization. So if we go for some kind of keyword, maybe we should reconsider the memo term.

jurgenvinju commented 3 days ago

Or do we only want to have memo in the scope of the current visit?

Yes, definitely only for the scope of the visit if we are talking about visit strategies.

I like the idea, my only concern if it would surprise people that use a visit actions with side-effects (some kind of counter for example).

The point is that such algorithms would always be highly exponential if they do not memoize, so people won't get an answer for that count within a day or so anyway. Something like 1 + 1 + 1 + 1 + 1 + 1 + 1 in a non-disambiguated expression language would already take a visit hours to complete.

So I claim those people do not exist; otherwise we would have had them in our inboxes. If they have allowAmbiguity=false, then this code will never be triggered, because we only memoize the amb clusters and not the intermediate appl nodes. And on top of that, most of the time we call the parsers we annotate parse trees with their source locations, which makes them both reference-unique and value-unique, so a visit would never meet the same tree twice (unless it uses the innermost or outermost strategy).

I agree for parse trees, it would be harder to hit cases where the same tree is not the same thing, but how about different data types? Like some kind of AST?

For parse tree amb nodes we can use reference equality. If we want to memoize on other data types, just like functions do, all we have is hashCode and equals; anything else breaks the value semantics of Rascal. I still believe this would be a useful feature for visit to have, including the configuration options of memo functions, but it is a much bigger intervention in the language design. We can postpone that if we only deal with amb clusters by reference now, implicitly.
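The difference between the two memo keys can be shown with plain Java collections: HashMap keys on equals/hashCode (structural, matching Rascal's value semantics), while IdentityHashMap keys on reference identity, which is what the amb-cluster proposal relies on. A small illustration:

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class MemoKeyDemo {
    public static void main(String[] args) {
        // Two structurally equal but physically distinct objects:
        String a = new String("amb");
        String b = new String("amb");

        // Structural keys (equals/hashCode): both collapse to one entry,
        // consistent with value semantics.
        Map<String, Integer> structural = new HashMap<>();
        structural.put(a, 1);
        structural.put(b, 2);
        System.out.println(structural.size()); // 1

        // Reference keys: one entry per object identity, so a memo table
        // like this only skips the *same* shared cluster, never a copy.
        Map<String, Integer> byReference = new IdentityHashMap<>();
        byReference.put(a, 1);
        byReference.put(b, 2);
        System.out.println(byReference.size()); // 2
    }
}
```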

jurgenvinju commented 3 days ago

It's very important that we only memoize amb clusters and not all the appl nodes in between. The reason is that only amb clusters are reference-shared in reality, and almost never appl nodes. When appl nodes come fresh out of the parser, they are unique by their .src origin, their non-terminal (their production rule, even) and, by induction, their constituent children.

After rewriting and matching/substitution it can of course become a feast of reference sharing inside a tree, by duplication, but also when trees become accidentally equal by reduction. In that case memoization makes less and less sense. For starters, we do not have a fast enough implementation of equality modulo the src origin fields; and not ignoring the arbitrary locations of parse trees makes little sense semantically. So there are a dozen questions to answer about what memo should do in a visit, all tied to the semantics of structural equality. I'd rather not think about that now.

DavyLandman commented 3 days ago

Ok, I was confused by reading the original proposal and then the subsequent one. I missed the important detail that the second proposal is limited to amb nodes. Sorry for the confusion; most of my comments were caused by it.

jurgenvinju commented 3 days ago

Ah right, sorry; I missed that too. It is a subtle detail indeed, especially since we also talk about reference equality rather than structural equality for the memo map.