oridb / mc

Myrddin Compiler
MIT License
387 stars 34 forks source link

Add a new match compiler implementation #193

Closed typeless closed 4 years ago

typeless commented 4 years ago

Fixes #183

References: "When Do Match-Compilation Heuristics Matter?" by Kevin Scott and Norman Ramsey

Stats: Sample count: 506 Dtree Size avg: 5.38 95th percentile: 3.00 maximum: 100 Dtree Height avg: 1.39 95th percentile: 1.00 maximum: 12

Sample generation: $ MATCH_STATS=1 make bootstrap && mbld -R support/matchstats.myr ./match.csv

typeless commented 4 years ago

@oridb no problem. Just take your time. 😉

typeless commented 4 years ago

TODO:

  1. rename _XXX variables/functions/macros in match_test.c
  2. figure out how to implement the exhaustiveness checking.
oridb commented 4 years ago

It looks like the existing verifymatch() code will work for exhaustiveness checking as long as you modify the code to record up the number of constructors the type of a decision tree node can have in dt->nconstructors . The algorithm we have is dead simple -- a match is exhaustive if all constructors have a decision edge or there's a wild card.

We can just as easily record the type, and then call nconstructors() on that type.

There are algorithms that provide an example of a value that would not be matched in any of the cases -- that would be nice to have, but we don't need to be that fancy.

typeless commented 4 years ago

@oridb It does not seem to be easy as I thought because the code separates the pattern matrix construction phase from the dtree construction phase. It is a bit different than the original algorithm in the paper which produces the dtree while traversing the pattern tree. I decided to have two phases since it does not seem to be trivial to model our patterns (in particular tuples and friends) after the ‘c(p1,p2,...pn)’ form cleanly and it would simplify the following steps.

I am looking at “ A Term Pattern-Match Compiler Inspired by Finite Automata Theory”. It has some hints about the problem. I am going to experiment the approach. (And it reminds me that I have to expand the wildcard patterns. That is another todo)

I’ll look into it tomorrow (Taiwan time).

oridb commented 4 years ago

There are two different errors that the current pattern match compiler provides. I think the first -- hide checking (don't know if there's a better term) will not transfer, but the second (exhaustiveness checking) will.

The hide checking currently works by simply checking as the tree is built whether adding a pattern to the decision tree actually changes the state of the decision tree under construction. If there's no change made to the tree when a new pattern is wired in, then clearly the pattern was already covered by the previous matches added in. This part will need to be rethought.

The exhaustiveness check, though, is run on a fully constructed dtree, and does not need any cooperation from the intermediate steps. I think we can just use it as is.

typeless commented 4 years ago

@oridb You make a good point that verifymatch can work with the new match compiler. I gave it a second try and it is actually feasible. The key is that we can retrieve the type information of the patterns from the load field indirectly, which I thought it can only be added when building the intermediate data structure. The match compiler has veritymatch now.

typeless commented 4 years ago

What is the recommended style for naming C macros in the codebase? like https://github.com/oridb/mc/blob/3704a945087ebf308f457a56753f30e0c07a4b3b/mi/match_test.c#L398 I am thinking about INIT_T, INIT_P0 and so forth.

oridb commented 4 years ago

T, P0, etc are fine. I'm not really namespacing anything here. I use a slightly unconventional style, so if you wanted init in there, it'd be InitT, InitP0,

If a longer name makes things significantly more readable, that's fine too.

oridb commented 4 years ago

(oops, didn't mean to close this, wrong button)

typeless commented 4 years ago

The hide-checking has been implemented.

typeless commented 4 years ago

By the way, I don't intend to keep the commits after the PR is merged. So, feel free to squash them when merging.

typeless commented 4 years ago

I am working on implementing some heuristics, which I plan to submit in another PR.

typeless commented 4 years ago

I squash all commits into one. That makes the history cleaner. You would need to rewrite the history though.

typeless commented 4 years ago

Fixed a bug in the parsing code for the stats file.