sipa / gramtropy

Grammar-based passphrase generator
GNU General Public License v3.0
86 stars 24 forks source link

wrong entropy for ambiguous grammars #1

Open lucash-dev opened 6 years ago

lucash-dev commented 6 years ago

The entropy estimation algorithm assumes a grammar is unambiguous, that is, there's only one way to produce a string.

For example the following grammar, that produces strings of bits, results in a length of 51 for an entropy of 64 bits:

main = ( "1" | "0" | "10")+;

While the unambiguous version results in the length 64, as expected:

main = ("1" | "0")+;

There are myriad other ways that an ambiguous grammar can be created on purpose or by accident.

There's likely no good way of solving this, since, in the general case, the only way of testing a given CFG is unambiguous is brute force AFAIK. So, perhaps all that can be done is updating the documentation with the info that it's only safe to use grammars that are provably unambiguous.

lucash-dev commented 6 years ago

Oh, I see there's a reference to unambiguous in the README I missed before. Sorry about that. Still, maybe it's worth adding a doc clarifying that the ambiguous grammar will be accepted by the compiler, but produce wrong results.

sipa commented 4 years ago

In some cases ambiguous grammars will be rejected, actually, but indeed - it's not guaranteed.