Change wordlist container from qvector to qset

keepassxreboot / keepassxc

KeePassXC is a cross-platform community-driven port of the Windows application “Keepass Password Safe”.

https://keepassxc.org/

Other

20.01k stars 1.42k forks source link

Change wordlist container from qvector to qset #10995

Open christianwengert opened 4 days ago

christianwengert commented 4 days ago

Currently it is possible to use wordlists with duplicate entries. This results in lower entropy for the generated passphrases, because less unique words exist.

This solution is very simple:

Replace the QVector representing currently the wordlist with a QSet. This removes all duplicate entries in a given wordlist.

As a result, this avoids that a malicious wordlist that has the proper length (>4000 entries) but with repetitions (effectively << 4000 entries) can be used and potentially create weaker passphrases than estimated.

Example: List with 4000 items but only 64 unique words would lead to only 48 bit of Entropy instead of ~95 bit!

Testing strategy

There is a new test which loads a wordlist with duplicate entries and will detect improper use. Had to add a bad wordlist to the test data

Type of change

✅ New feature (change that adds functionality)

droidmonkey commented 4 days ago

Can you include a much smaller test wordlist?

christianwengert commented 17 hours ago

hello @droidmonkey . Thanks for your feedback. I was using a wordlist of more than 4000 items in order because this is the minimum size enforced inside KeePassXC. But you are right, it is not absolutely required for the test itself. I shortened the file to 100 entries, does this work for you?

droidmonkey commented 15 hours ago

Actually yes and no. While looking at the code for passphrase generator I noticed we do in fact print a warning when the wordlist is less than 4000 lines, but isValid checks for a wordlist size of more than 1000 lines. So this needs to be rectified. Additionally we should not assert if the passphrase generator isn't valid, that is silly.

In short, to properly validate your code change you would need to provide a valid wordlist of 999 words then repeat one of the words.

christianwengert commented 12 hours ago

This makes sense. Shouldn't we also adapt the assert to 4000 items? I find 4000 better than 1000, but foremost I'd rather have the same limits for both.

edit: I can add this to my PR if you want

droidmonkey commented 12 hours ago

Yes, the minimum should be 4,000 distinct words.

christianwengert commented 12 hours ago

As the bad wordlist I now used the 3999 first entries of the eff_large and duplicated the last entry to reach 4000 items. I also set the limit to 4000 everywhere (and made a #define for it, so we dont have magic numbers in the code)

droidmonkey commented 9 hours ago

Agreed, good idea. Can make it a static member variable that can be manipulated by the testing code. Make the test a friend class for example.

christianwengert commented 9 hours ago

Making the test a friend class would lead to dependencies of the KeePassXC source code of the testing code:

class PassphraseGenerator {
public:
    ...

    friend class TestPassphraseGenerator;
private:
    int m_minimum_wordlist_length = 4000;
}

Wouldn't it be preferable to rather use a setter/getter for this?

michaelk83 commented 8 hours ago

This sort of stuff is usually done with mocking - https://stackoverflow.com/a/23473057

christianwengert commented 8 hours ago

I added it with the friend method. However, I would still advocate to use the real setWordList method, otherwise we are not at all testing the real code if we mock everything. Unless I am missing something

michaelk83 commented 8 hours ago

I don't usually work with c++, so I don't know the limitations there. But depending on the framework, you can make what's called a partial mock. Where you override only the methods that you need to, and the rest is delegated to the original class. So it behaves like the original, except for what you modified for the test.

Making the real code depend on test code is not a good practice, so a friend class is not the best solution here. If you don't want to use a mocking framework just for this, you could manually derive a subclass, maybe. It's a question of how much complexity you're willing to accept for this. But a proper mocking framework could be useful for other tests too.

That's why I phrased it as a question, "Would it be possible ...?"

droidmonkey commented 6 hours ago

Mocking is not correct, that would be for an interface like a hardware key.

Friend class doesn't create a dependency it's just a compiler flag that overrides normal private namespace restrictions. Once code is compiled there is no such thing as public/protected/private.

You don't want a setter/getter because this is not a property you want to modify outside of testing.