Open lucash-dev opened 6 years ago
Oh, I see there's a reference to unambiguous in the README I missed before. Sorry about that. Still, maybe it's worth adding a doc clarifying that the ambiguous grammar will be accepted by the compiler, but produce wrong results.
In some cases ambiguous grammars will be rejected, actually, but indeed - it's not guaranteed.
The entropy estimation algorithm assumes a grammar is unambiguous, that is, there's only one way to produce a string.
For example the following grammar, that produces strings of bits, results in a length of 51 for an entropy of 64 bits:
While the unambiguous version results in the length 64, as expected:
There are myriad other ways that an ambiguous grammar can be created on purpose or by accident.
There's likely no good way of solving this, since, in the general case, the only way of testing a given CFG is unambiguous is brute force AFAIK. So, perhaps all that can be done is updating the documentation with the info that it's only safe to use grammars that are provably unambiguous.