Poor performance on flow analysis with very large grammars

Initial profiling suggests we're doing an enormous quantity of calls to compareFlowVertex. I know the algorithm isn't the best at the moment, for both constants and asymptotic complexity.

The most important optimization to do is probably some means of "hash consing" vertexes. Right now we have to do expensive comparisons in compareFlowVertex when it should be a simple integer value or something like that. This would be a major, major constant factor improvement. (Probably a good factor of 100 or so.)

Complexity improvement TBD.

I'm still investigating. Just taking notes in an issue here.

The current grammar exhibiting the awful performance is the ablec feature/closure branch. 52 minutes. Ouch.

Profiling revealed two majors things:

Most time was spent in repairClosure
Most time was spent in calls to compareFlowVertex

I've managed a pretty significant speed-up by changing the implementation of repairClosure to avoid unnecessary calls to its comparison function.

There are still two major performance problems:

Comparison functions shouldn't be so slow.
Doing this purely-functionally hurts a lot. Imperative code would be a lot faster.

One possibility is that we could write a hackyCompare function that is able to compare any two values, and use that in our graphs instead of compareFlowVertex. This would probably be the simplest improvement to comparison performance, despite its ugliness.

Another possibility would be to do something like so:

type MemoizationContext<a> foreign;

function newMemo
MemoizationContext<a> ::= compare::(Integer ::= a a)

function memoize
Integer ::= value::a  context::MemoizationContext<a>

function lookupMemo
a ::= Integer  context::MemoizationContext<a>

Then we could instead implement things like Graph purely on integers, and use something like this to handle the conversions to/from.

An API like this would be quite cheatingly stateful, like genInt. This is possible primarily because the only way a call to lookup should ever happen is with a value that came from memoize, so the weak ordering guarantee the API needs works out. It'd be wildly non-deterministic to try to find all values in a memo context, or to try to query specific integer values, though. But we don't really need those capabilities...

melt-umn / silver

Poor performance on flow analysis with very large grammars #79