pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Consider the future of `passlib` and password hashing/upgrading #15454

Open miketheman opened 6 months ago

miketheman commented 6 months ago

We use passlib in warehouse as part of our user account management service:

https://github.com/pypi/warehouse/blob/26a3446ada6c2db27e6e608d81508ca25018f389/warehouse/accounts/services.py#L79-L93

The TL;DR is that this allows user password hash algorithms to evolve over time: as users log in, their passwords are verified and then re-hashed with the newer (presumably more secure) algorithm, so nobody needs to reset a password just to get the latest and greatest algorithm.
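
To illustrate the idea, here is a minimal standard-library sketch of upgrade-on-login. It is not passlib's implementation and not warehouse's code: the two PBKDF2 configurations merely stand in for an older and a newer scheme, and the `scheme$salt$digest` storage format and the names `SCHEMES`, `PREFERRED`, and `hash_password` are invented for the example. Only the `(verified, new_hash)` return shape mirrors passlib's `CryptContext.verify_and_update()`.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Stand-in schemes for illustration only: (digest name, iteration count).
SCHEMES = {
    "pbkdf2_sha1": ("sha1", 1_000),        # pretend legacy scheme
    "pbkdf2_sha256": ("sha256", 100_000),  # pretend preferred scheme
}
PREFERRED = "pbkdf2_sha256"


def _derive(scheme: str, password: str, salt: bytes) -> str:
    """Derive a hex digest for the password under the named scheme."""
    digest, rounds = SCHEMES[scheme]
    return hashlib.pbkdf2_hmac(digest, password.encode("utf-8"), salt, rounds).hex()


def hash_password(password: str, scheme: str = PREFERRED) -> str:
    """Hash with a fresh random salt, recording the scheme in the stored string."""
    salt = os.urandom(16)
    return f"{scheme}${salt.hex()}${_derive(scheme, password, salt)}"


def verify_and_update(password: str, stored: str) -> Tuple[bool, Optional[str]]:
    """Return (verified, new_hash); new_hash is None unless an upgrade is due."""
    scheme, salt_hex, expected = stored.split("$")
    actual = _derive(scheme, password, bytes.fromhex(salt_hex))
    if not hmac.compare_digest(actual, expected):
        return False, None
    if scheme != PREFERRED:
        # Re-hashing is only possible here, while the plaintext is in hand.
        return True, hash_password(password)
    return True, None
```

A caller would store `new_hash` in place of the old value whenever it is not `None`; a user who never logs in keeps the old hash, which is why support for the older schemes has to stay around.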

passlib hasher docs can be found here: https://passlib.readthedocs.io/en/stable/lib/passlib.hash.html

The most recent release of passlib was in 2020, and it raises warnings for using crypt, which will turn into breakages under Python 3.13. This is not yet a blocker, but it's something we should consider long before it becomes one.

Here's an issue for maintenance status that has yet to be resolved, either by nominating new maintainers, or some other resolution.

In the interim another contender has emerged - pwdlib (author launch blog post), which appears to have argon2 and bcrypt support.

So in theory, we could leverage pwdlib and keep the upgrade-on-login behavior. However, we'd still need to account for folks who have not logged in recently (and so still have older hashes), and for the lack of older algo support in pwdlib. pwdlib also does not yet support disable() or is_enabled(), which we use today, but those could be replaced by a boolean flag such as User.is_active.
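
A rough sketch of that flag-based alternative, assuming a hypothetical `User` record with an `is_active` boolean. Neither the dataclass nor `can_log_in` is warehouse code, and `verify` is any callable taking `(password, stored_hash)` and returning a bool. The point is that account state lives on the user, so the hasher itself only needs to verify.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class User:
    """Hypothetical user record; the field names are assumptions."""
    username: str
    password_hash: str
    is_active: bool = True


def can_log_in(user: User, password: str, verify: Callable[[str, str], bool]) -> bool:
    """Reject disabled accounts before the password hash is consulted."""
    if not user.is_active:
        return False
    return verify(password, user.password_hash)
```

With this shape, disabling an account means flipping `is_active` rather than overwriting the stored hash with a sentinel value.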

Alternately, since it's not urgent yet, we can continue to observe the evolving space around passlib and hope that a new maintenance team arises before it becomes a severe issue.


Some SQL counting:

```sql
warehouse=> SELECT
    CASE
        WHEN password LIKE '$argon2%' THEN 'argon2'
        WHEN password LIKE '$bcrypt-sha256%' THEN 'bcrypt_sha256'
        WHEN password LIKE '$2b$%' THEN 'bcrypt'
        WHEN password LIKE 'bcrypt$%' THEN 'django_bcrypt'
        WHEN password LIKE 'spammer' THEN 'disabled'
        WHEN password LIKE '!' THEN 'disabled'
        ELSE 'other'
    END AS hash_type,
    COUNT(*)
FROM users
GROUP BY hash_type
ORDER BY COUNT(*) DESC;
   hash_type   | count
---------------+--------
 argon2        | 671847
 bcrypt_sha256 |  76452
 disabled      |  28087
 django_bcrypt |  10930
(4 rows)
```

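
The same bucketing can be expressed in Python, for example to classify a single stored value outside the database. This is only a sketch of the query's logic; `classify_hash` is a made-up helper, not part of warehouse or passlib.

```python
def classify_hash(stored: str) -> str:
    """Mirror the CASE expression above: the first matching branch wins."""
    prefixes = [
        ("$argon2", "argon2"),
        ("$bcrypt-sha256", "bcrypt_sha256"),
        ("$2b$", "bcrypt"),
        ("bcrypt$", "django_bcrypt"),
    ]
    for prefix, name in prefixes:
        if stored.startswith(prefix):
            return name
    # LIKE without a wildcard is an exact match.
    if stored in ("spammer", "!"):
        return "disabled"
    return "other"
```
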
JacobCoffee commented 2 months ago

Some relevant things: https://github.com/canonical/cloud-init/issues/4791

frankie567 commented 4 weeks ago

Hey, pwdlib maintainer here 👋

I understand this is not a critical priority for warehouse at the moment, but if you need specific features and/or algorithms so you can replace passlib, I would be glad to discuss it 🙂

miketheman commented 3 weeks ago

Hi @frankie567 ! Thanks for asking.

We currently use passlib "lightly" in warehouse. I linked above to the CryptContext in use; the algos still in use are listed there. I don't think pwdlib has support yet for the algos we have in use today.

We also use the verify_and_update() function, which does most of the heavy lifting:

https://github.com/pypi/warehouse/blob/c59d8bbe3918f586a9e37959451882be742a1849/warehouse/accounts/services.py#L225-L230

Other functions we use are hash() and verify(), which I think pwdlib supports, and disable() and is_enabled(), which I don't think are supported yet.

frankie567 commented 2 weeks ago

Thank you for those details, Mike 👍

I confirm pwdlib supports verify_and_update, hash and verify.

I'll have a look at the algorithms you use. Regarding the enable/disable feature, I'll consider it, even though I currently believe this is something that should be handled at the user level (with a flag like you suggest) rather than at the password level.