redacted / XKCD-password-generator

Generate secure multiword passwords/passphrases, inspired by XKCD
BSD 3-Clause "New" or "Revised" License
1.32k stars 185 forks source link

Entropy calculation with acrostic is wrong #127

Open ilyagr opened 3 years ago

ilyagr commented 3 years ago

If there are 382 words that start with 'a', then the entropy of the acrostic 'aaaaa' should be ln_2(382)*5. However, this is not what xkcdpass reports:

$  xkcdpass -V --acrostic 'a'
With the current options, your word list contains 382 words.
A 1 word password from this list will have roughly 8 (8.58 * 1) bits of entropy,
assuming truly random word selection.

anyone

$  xkcdpass -V --acrostic 'aaaaa'
With the current options, your word list contains 1910 words.
A 5 word password from this list will have roughly 54 (10.90 * 5) bits of entropy,
assuming truly random word selection.

amiable activism arise aspire ageless

(The correct answer would be 8.58 * 5 in the second example)

I believe the problem is here: https://github.com/redacted/XKCD-password-generator/blob/33c27be0879bc749fe16b39bf091a5f2cc2ce5dd/xkcdpass/xkcd_password.py#L163-L168

Currently, it computes the entropy as num_words * ln_2(sum of lengths of word lists for each letter in the acrostic).

The correct formula for the entropy is ln_2(# of words for letter 1) + ln_2(#of words for letter 2) + .... Separating length and num_words doesn't make sense in this setting.

ilyagr commented 3 years ago

So, it should be something like:

 if options.acrostic: 
     worddict = wordlist_to_worddict(wordlist) 
     entropy = 0.0 
     for char in options.acrostic: 
         if char not in worddict:
            # Less confusing error message than 'math domain error'
            raise ValueError('No words in list start with letter `{}`.'.format(char))
         entropy += math.log(len(worddict[char]), 2)
     print('The entropy with the acrostic is approximately', int(entropy), 'bits.')
  else:
    # Rest of the function