rdmorganiser / rdmo

A tool to support the planning, implementation, and organization of research data management.
https://rdmorganiser.github.io
Apache License 2.0
105 stars 49 forks source link

Switch to even faster password hasher in testing config #1072

Closed afuetterer closed 4 months ago

afuetterer commented 4 months ago

One part of the huge "dev setup refactoring of 2023" was switching to a faster password hashing in the testing config: https://github.com/rdmorganiser/rdmo/pull/664/commits/56c6e0ea1c04749dac85e9102be14a0b806c163b

There are so many individual tests in the test suite (at the moment 23824) and most of them need an authenticated user. I propose changing the password hasher again.

There is UnsaltedMD5PasswordHasher in the django.contrib.auth.hashers, which is of course very insecure, but in testing that does not matter. It is much faster then the already fast MD5PasswordHasher. Sadly the UnsaltedMD5PasswordHasher is deprecated and will be removed in Django 5.1. So it would only be a temporary solution, until we reach Django 5.1.

@jochenklar already mentioned django-plaintext-password last time. Which sounds like a joke, but it is the fastest password hasher there is in my opinion. This would mean another dependency in the testing group, which I am fine with.

Long term I want to replace most of the client.login(username, password) with client.force_login(user), which does not use the authentication system at all. But that is a bit of work.

What do you think? The time difference in a whole test suite run will be significant.

# benchmark.py
import time

from django.contrib.auth.hashers import (
    MD5PasswordHasher,
    SHA1PasswordHasher,
    UnsaltedMD5PasswordHasher,
    UnsaltedSHA1PasswordHasher,
)

from plaintext_password import PlaintextPasswordHasher

password = "password-for-testing-the-password-hashers"

sha1 = SHA1PasswordHasher()
md5 = MD5PasswordHasher()
unsalted_sha1 = UnsaltedSHA1PasswordHasher()
unsalted_md5 = UnsaltedMD5PasswordHasher()
plaintext = PlaintextPasswordHasher()

hashers = [sha1, md5, unsalted_sha1, unsalted_md5, plaintext]
hashing_times = {}

for hasher in hashers:
    start_time = time.time()
    for _ in range(100_000):
        hasher.encode(password, hasher.salt())
    end_time = time.time()
    hashing_times[hasher] = end_time - start_time

print("Hashing times (seconds):")
for hasher, duration in hashing_times.items():
    print(f"{hasher.algorithm}: {duration:.5f}")
$ python benchmark.py
Hashing times (seconds):
sha1: 3.23304
md5: 3.26118
unsalted_sha1: 0.09653
unsalted_md5: 0.07505
plaintext: 0.02371
afuetterer commented 4 months ago

Alright, I see that the actual benefit is far from significant. Let's close this.

jochenklar commented 4 months ago

Hey, sorry I did not answer. I think I tested that back then, when you gave us the tip with the md5 hashes. I think with md5 or sha1 or plain the overhead of just logging in is the dominant part. With the default algorithm its probably not so much the hashing algorithm, but the 1000 rotations or so which make the difference.

afuetterer commented 4 months ago

Yes, no problem. I thought it might make a difference.

Another approach would to .force_login() users instead of .login(username, password). But md5 might be fast enough already.