softdevteam / lrpar

Rust LR parser
Other
1 stars 0 forks source link

Simplify and improve the distance function #47

Closed ltratt closed 6 years ago

ltratt commented 6 years ago

The original distance calculation used a fiddly mechanism to calculate goto_edges. This version simplifies that calculation, instead producing goto_states. This highlighted a surprising (to me...) flaw in the original calculation: it underunderapproximates the distance.

The new distance algorithm is able, in a few cases, to produce a better underapproximation of the true distance (i.e. it occasionally produces higher values). You can see that in the "tricky" test case that's also part of this PR: from the first to the second commit, the new distance calculation is able to change one distance of 0 into 1.

For example on the Java grammar here is the change in the distribution of distances:

     old      new
  0  22,091   18,885
  1  46,646   42,762
  2  24,614   28,745
  3   6,075    8,575
  4   1,482    1,936
  5     270    322
  6      10    26

In both cases, there are 1297 infinite distances.

As can be fairly clearly seen, this means that the average distance in this new calculation is higher. Although this leads to roughly 10% fewer states being explored in the search, it doesn't seem to make a huge difference to performance. That's something to look into next.

ptersilie commented 6 years ago

Merged!