ryanries / PassFiltEx

PassFiltEx. An Active Directory Password Filter.
GNU General Public License v3.0

Memory consumption and performance #10

Closed allquixotic closed 4 years ago

allquixotic commented 5 years ago

We are using kwprocessor to generate key walk password combinations to blacklist. One of the higher level “routes” generates all QWERTY keyboard password combinations from 2 to 16 characters with up to four “direction changes” while walking. This creates a 187 MB password database.

This 187 MB file ends up consuming about 3.5 GB when loaded into PassFiltEx, or 18 times the raw file size. But for all that memory usage, it doesn’t appear to vastly improve search performance when doing a password reset. An allowable password takes on the order of 60-90 seconds to validate; blacklisted passwords take a bit less.

The problem is that validation takes so long that threads in lsass and other processes get blocked and queues start filling up behind PassFiltEx. This is not a great experience for a middling password database, considering how much memory it uses.

The priority for me would be to improve password validation time. After that, memory consumption, but we can deal with 3.5GB heap in lsass if it’s performant.

For now we are using a much smaller key walk file generated from a simpler "route", but it isn't as comprehensive as we'd like. Also, if we decide to append other types of password blacklists to our key walk database (the smaller one is only 3.5 MB), the file size could grow again and performance could well be poor again.

What exactly is the algorithm used that takes up so much RAM while apparently still requiring a linear search? Any ideas for a suitable improvement without completely changing the design goals or scope of PassFiltEx?


ForumSchlampe commented 5 years ago

Sorry to butt in here; we had a similar issue with "openpasswordfilter", which is a bit different by design. In any case, we maintain a fork of openpasswordfilter.

Our workarounds for this situation are

So no, we have no other idea besides changing the design of PassFiltEx.

allquixotic commented 5 years ago

Thanks for the feedback ForumSchlampe. Before we started using PassFiltEx, we looked at OpenPasswordFilter and its many forks, including yours.

My main concerns with OpenPasswordFilter:

We're not presently targeting HIBP scanning as part of our blacklist solution. If we need it in the future, we may look into adding it to PFEx instead of OPF, because the core codebase is so much better designed. Here are some of my concerns, based mainly on my experience with upstream OPF, along with some conceptual concerns about your database enhancement (you may have addressed some of these; I didn't look at your code all that closely):

Lastly, we don't actually need a database to accomplish our current goal. Our most important goal is to prevent keyboard walk passwords, like "asdfgh123". We figured out how to implement this very efficiently with a slight tweak to PassFiltEx.

We generated a keywalk file using kwprocessor. Then, we deleted all the combinations of length != 3. This left us with a ~10 KB password database with about 2100 lines of ASCII characters, 3 characters per line.

Then I added a new registry value to PassFiltEx called "Contains". If it is set to the DWORD value 1 (0x01), PassFiltEx ignores TokenPercentageOfPassword and rejects the candidate password whenever it contains any line from the blacklist (checked with wcsstr()). That way, any keywalk of length >= 3 appearing anywhere inside a password causes it to be rejected.
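Roughly, the check looks like this (a simplified sketch of the idea, not the exact code going into the PR; the node below is a cut-down stand-in for the real list node):

/* Sketch only: "Contains" mode rejects a password if any blacklist entry
 * appears anywhere inside it. Assumes the password and every entry have
 * already been lowercased the way PassFiltEx normally does. */

#include <wchar.h>

#define MAX_BLACKLIST_STRING_SIZE 128   /* project default at the time; see the next paragraph about lowering it */

typedef struct BADSTRING {
    struct BADSTRING *Next;                     /* singly-linked list of entries */
    wchar_t String[MAX_BLACKLIST_STRING_SIZE];  /* one blacklist line            */
} BADSTRING;

/* Returns nonzero if any blacklist entry is a substring of Password. */
static int PasswordContainsBlacklistedString(const wchar_t *Password, const BADSTRING *Head)
{
    const BADSTRING *Node;

    for (Node = Head; Node != NULL; Node = Node->Next)
    {
        if (Node->String[0] != L'\0' && wcsstr(Password, Node->String) != NULL)
        {
            return 1;   /* reject: password contains a blacklisted substring */
        }
    }

    return 0;
}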

This handily solves our memory and CPU issues with the linear search of the linked list in PassFiltEx. Passwords are searched in around 15 milliseconds in the absolute worst case, and most passwords are scanned in under 1 millisecond. PassFiltEx's memory footprint in lsass is about 2 MB.

We also dramatically reduced our memory usage by lowering MAX_BLACKLIST_STRING_SIZE from 128 to a much smaller value that is sufficient for our needs. Allocating 256 bytes per line in the password file was just too much for us, considering that our password DB could contain thousands of lines (and it used to be much bigger); it was wasting a great deal of memory.
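For a rough sense of scale (back-of-envelope numbers, assuming two-byte wide characters): a 187 MB keywalk file with lines averaging around 14 bytes is on the order of 13 million entries, and at 128 wide characters (256 bytes) of fixed buffer per entry, the buffers alone come to roughly 13,000,000 x 256 bytes, or about 3.3 GB, before any linked-list and heap overhead. That lines up with the ~3.5 GB we observed.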

I'll be submitting a PR to ryanries with my Contains feature soon.

ryanries commented 5 years ago

Thanks for the discussion and for the help in making this software better.

I have never tested with such a large blacklist file before. Can you point me to where I can download the >100 MB file that you are using? I'll take a look at it when I have time and see if I can do something about the memory usage. I hadn't given any thought to performance tuning before now.

allquixotic commented 5 years ago

@ryanries Based on my latest comment, the blacklist we're now using is no longer that large. However, if you're on a system with gcc, make, and bash (like WSL, Cygwin, MinGW64, macOS, or Linux), you can run the following to generate the file:

git clone https://github.com/hashcat/kwprocessor
cd kwprocessor
make
./kwp --keywalk-all basechars/full.base keymaps/en-us.keymap routes/2-to-16-max-4-direction-changes.route -o /dev/stdout | gzip > keywalk.txt.gz

Basically, as far as I'm aware, the types of password "lists" that are out there in common (or at least niche) use consist of the following:

I still think PFEx will have severe memory (and maybe lookup performance / CPU time) difficulties with an English dictionary as an input list, but for a sufficiently small keywalk database and with the BADSTRING buffer size set to 16, the memory usage and search time are extremely suitable for a large-scale production AD cluster, even accounting for unanticipated growth and daily password resets by every user.
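(Back-of-envelope again: with about 2,100 entries and a 16-wide-character buffer each, the blacklist strings themselves only need around 2,100 x 16 x 2 bytes, or roughly 66 KB, so the ~2 MB lsass footprint I quoted above is presumably dominated by overhead other than the strings.)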

It would be great to see PFEx expand the types and sizes of databases it can support, given that the project currently has extremely rigorous and well-tested code with actual coding standards. However, before this code could get any more complex, I think we should expand PassFiltExTest into a proper unit test suite.

This is sort of a hobby for me as well. In a bit of a panic, I managed to hack together a working, stable private fork of PassFiltEx that passes muster for my day job and earns me my paycheck; contributing to upstream PassFiltEx is something I would enjoy doing slowly over time, in the hopes that eventually I can throw out my private fork and just use upstream.

The problem is that, to contribute upstream, my commits need to be much cleaner, my formatting needs to comply with your coding standards, and I need to consider whether my changes will help a general audience instead of just my company. I'll hem and haw over these factors for a while and maybe cook up some more patches, but I'm definitely not planning to implement HIBP or a new way of storing the passwords in memory just yet. I think the linked list + linear search approach is simple enough, and works well enough, that moving to something new (either as an option or as the new way forward) should be done carefully.

Thinking about the future (definitely not in the next week or two), any objections to bringing in a vcpkg dependency on GLib for data structures, string functions, and possibly the infrastructure to use a smarter dictionary or search algorithm down the road (without having to reinvent the wheel)?

ForumSchlampe commented 5 years ago

@ryanries You can grab big password lists from the following links:

https://github.com/berzerk0/Probable-Wordlists
https://weakpass.com/wordlist
https://github.com/danielmiessler/SecLists

BobVul commented 5 years ago

I've got a PoC of a trie-based blacklist: https://github.com/BobVul/PassFiltEx/tree/trie

Some comparisons using a 180 MB, 13 million line keywalk blacklist:

                                 Linked list    Trie
Search time (128-char password)  3000-4000 ms   1 ms
Memory usage                     4500 MB        430 MB
Initialisation/loading time      500 s          500 s

It's currently rather unpolished.
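For anyone curious about the general shape of the approach, here is a rough sketch (illustrative only; the branch linked above does not look exactly like this, and a real implementation wants a compact child representation rather than the dense per-node child table used here for brevity): insert every blacklist line into a character trie, then walk the trie from every starting offset of the candidate password.

#include <stdlib.h>
#include <wchar.h>

#define TRIE_ALPHABET 128   /* assume 7-bit ASCII blacklist entries */

typedef struct TrieNode {
    struct TrieNode *Child[TRIE_ALPHABET];   /* dense child table: simple but memory-hungry */
    int IsTerminal;                          /* a blacklist entry ends at this node          */
} TrieNode;

static TrieNode *TrieNewNode(void)
{
    /* allocation-failure handling omitted for brevity */
    return (TrieNode *)calloc(1, sizeof(TrieNode));
}

static void TrieInsert(TrieNode *Root, const wchar_t *Entry)
{
    TrieNode *Node = Root;

    for (; *Entry != L'\0'; Entry++)
    {
        unsigned int c = (unsigned int)*Entry;
        if (c >= TRIE_ALPHABET) { return; }   /* drop entries with characters outside the alphabet */
        if (Node->Child[c] == NULL) { Node->Child[c] = TrieNewNode(); }
        Node = Node->Child[c];
    }

    Node->IsTerminal = 1;
}

/* Returns 1 if some blacklist entry is a prefix of Text. */
static int TrieMatchesPrefix(const TrieNode *Root, const wchar_t *Text)
{
    const TrieNode *Node = Root;

    for (; *Text != L'\0'; Text++)
    {
        unsigned int c = (unsigned int)*Text;
        if (c >= TRIE_ALPHABET || Node->Child[c] == NULL) { return 0; }
        Node = Node->Child[c];
        if (Node->IsTerminal) { return 1; }
    }

    return 0;
}

/* "Contains" semantics: try every starting offset of the password. */
static int TrieContains(const TrieNode *Root, const wchar_t *Password)
{
    for (; *Password != L'\0'; Password++)
    {
        if (TrieMatchesPrefix(Root, Password)) { return 1; }
    }

    return 0;
}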

So I'm wondering:

ryanries commented 5 years ago

I think that sounds amazing @BobVul .

allquixotic commented 5 years ago

I think making it pluggable is a good idea, since if you set MAX_BLACKLIST_STRING_SIZE to a very small number, the linked list is reasonably memory-efficient (granted, using an array of variable-sized strings is probably the most memory-efficient way to store the dictionary in memory...)

The trie is only comparably memory-efficient when MAX_BLACKLIST_STRING_SIZE is at its default and the blacklisted passwords are fairly short. But the lookup time is amazing for any significant dictionary length.
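To make the "array of variable-sized strings" point concrete, a packed layout along these lines (just a sketch, not code I'm proposing) costs each entry only its actual length plus a terminator and an offset, instead of a fixed MAX_BLACKLIST_STRING_SIZE slot:

#include <stddef.h>
#include <wchar.h>

/* Illustrative only: all blacklist entries packed into one contiguous pool. */
typedef struct STRING_POOL {
    wchar_t *Pool;      /* every entry back to back, each NUL-terminated */
    size_t  *Offsets;   /* Offsets[i] = start of entry i within Pool     */
    size_t   Count;     /* number of entries                             */
} STRING_POOL;

static const wchar_t *PoolEntry(const STRING_POOL *P, size_t Index)
{
    return P->Pool + P->Offsets[Index];
}

/* Same "contains" check as the linked list, just a denser layout. */
static int PoolContainsToken(const STRING_POOL *P, const wchar_t *Password)
{
    size_t i;

    for (i = 0; i < P->Count; i++)
    {
        if (wcsstr(Password, PoolEntry(P, i)) != NULL)
        {
            return 1;
        }
    }

    return 0;
}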

+1 for making it pluggable...

ryanries commented 5 years ago

I chose MAX_BLACKLIST_STRING_SIZE totally arbitrarily, and it was unreasonably large; it doesn't make any sense to blacklist a character sequence that is 128 characters long. Dynamic sizing would be optimal for memory efficiency, but for now I have decreased MAX_BLACKLIST_STRING_SIZE to 32 as a compromise. That should help save memory.
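Back-of-envelope, assuming two-byte wide characters: going from 128 to 32 takes each entry's fixed buffer from 256 bytes down to 64, so a 13-million-line list like the one discussed above would need on the order of 830 MB for the string buffers instead of roughly 3.3 GB, before linked-list overhead. Dynamic sizing would do better still, since most keywalk entries are well under 32 characters.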