Authentication optimization in Dovecot

patschi commented 1 day ago

Summary

It might be beneficial to enable the auth_cache_* settings below. They are suggested in the Performance tuning guide of Dovecot's documentation.

I think the following might be beneficial: https://doc.dovecot.org/2.3/settings/core/#core_setting-auth_cache_verify_password_with_worker

auth_cache_verify_password_with_worker
Default: no
Values: Boolean

The auth master process by default is responsible for the hash verifications. Setting this to yes moves the verification to auth-worker processes. This allows distributing the hash calculations to multiple CPU cores, which could make sense if strong hashes are used.

My comment: I'm not sure whether the SQL-backed login used in mailcow can take advantage of this. But I suppose it can't hurt, since some hashes are still calculated nonetheless.
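To make the idea concrete, here is a minimal sketch of handing password-hash verifications to a pool of workers instead of verifying everything in one process. It is illustrative only: a thread pool stands in for Dovecot's auth-worker processes, PBKDF2-SHA256 is a stand-in scheme (not necessarily what mailcow stores), and `make_hash`, `verify` and `verify_many` are hypothetical names.

```python
import hashlib
import hmac
import os
from concurrent.futures import ThreadPoolExecutor

ITERATIONS = 100_000  # hash "strength": more iterations = more CPU per login


def make_hash(password, salt=None, iterations=ITERATIONS):
    """Store a password as (salt, iterations, digest) using PBKDF2-SHA256."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest


def verify(password, stored):
    """Recompute the hash for a login attempt and compare in constant time."""
    salt, iterations, digest = stored
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)


def verify_many(attempts, workers=4):
    """Dispatch (password, stored_hash) pairs to a worker pool.

    The pool stands in for auth-worker processes; results keep input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda attempt: verify(*attempt), attempts))
```

The `iterations` argument plays the role of hash strength: the higher it is, the more CPU each verification costs, which is the situation in which the Dovecot docs say spreading the work over several cores could make sense.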

https://doc.dovecot.org/2.3/settings/core/#core_setting-auth_cache_size

auth_cache_size
Default: 0
Values: Size

The authentication cache size (e.g., 10M).
The setting auth_cache_size = 0 disables use of the authentication cache.

auth_cache_ttl
Default: 1hour
Values: Time

This determines the time to live for cached data. After the TTL expires, the cached record is no longer used, unless the main database look-up returns internal failure.
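The following sketch models how the three cache settings discussed in this issue interact, as a hypothetical `AuthCache` class: successful results are reused for `ttl` seconds, failed ones for `negative_ttl` seconds, and an expired entry is still returned when the backend lookup reports an internal failure. It is a simplification under stated assumptions: the size limit counts entries rather than bytes (auth_cache_size is a byte size), eviction here is least-recently-used, and `lookup` and `InternalFailure` are hypothetical stand-ins for the passdb query.

```python
import time
from collections import OrderedDict


class InternalFailure(Exception):
    """Hypothetical: raised by the backend when the database is unreachable."""


class AuthCache:
    def __init__(self, lookup, max_entries=1000, ttl=300.0, negative_ttl=60.0,
                 clock=time.monotonic):
        self.lookup = lookup            # hypothetical backend: key -> bool (login OK?)
        self.max_entries = max_entries  # entries, not bytes (simplification)
        self.ttl = ttl                  # like auth_cache_ttl
        self.negative_ttl = negative_ttl  # like auth_cache_negative_ttl
        self.clock = clock
        self.entries = OrderedDict()    # key -> (result, stored_at)
        self.backend_calls = 0

    def check(self, key):
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None:
            result, stored_at = cached
            ttl = self.ttl if result else self.negative_ttl
            if now - stored_at < ttl:
                self.entries.move_to_end(key)  # fresh hit, no backend call
                return result
        try:
            self.backend_calls += 1
            result = self.lookup(key)
        except InternalFailure:
            if cached is not None:
                return cached[0]  # expired entry is still used on internal failure
            raise
        self.entries[key] = (result, now)
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
        return result
```

Passing a fake `clock` makes the TTL behaviour easy to explore without waiting in real time.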

My comment: In mailcow, the authentication cache is completely disabled (auth_cache_size = 0). To save some resources on fast, periodic logins and avoid hitting MySQL every time, we could enable it. This might be very beneficial for services that periodically poll for emails, such as GitLab, ticket systems, etc.

The risk of a disabled account still being able to log in for some time (until its cache entry expires) is, I think, worth the overall benefit. We could use 15 minutes as the default and update the docs accordingly.
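The trade-off can be put into numbers with a small simulation, assuming a hypothetical client that logs in every 60 seconds and a cache entry that expires a fixed TTL after it was stored (hits do not extend it). A disabled account can keep logging in for at most one TTL after its entry was cached, while backend lookups for such a client drop roughly by a factor of TTL / poll interval. `backend_lookups` is a hypothetical helper.

```python
def backend_lookups(poll_interval_s, ttl_s, duration_s):
    """Count backend (SQL/Lua) lookups for one client that logs in every
    poll_interval_s seconds for duration_s seconds, given a cache TTL of
    ttl_s seconds (0 means no cache). Integer seconds keep it exact."""
    lookups = 0
    expires_at = None
    t = 0
    while t < duration_s:
        if expires_at is None or t >= expires_at:
            lookups += 1            # cache miss or expired entry: ask the backend
            expires_at = t + ttl_s  # entry lives ttl_s seconds from insertion
        t += poll_interval_s
    return lookups


if __name__ == "__main__":
    for ttl in (0, 300, 900):  # no cache, 5 minutes, the proposed 15 minutes
        print(f"TTL {ttl:>3}s: {backend_lookups(60, ttl, 3600)} lookups/hour, "
              f"disabled account usable for up to {ttl}s")
```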

Motivation

Save resources, speed up authentication.

Additional context

No response

dragoangel commented 1 day ago

Has anyone reported performance issues? Adding a cache just to optimize something that has no issue isn't always the best choice; it can lead to negative results, especially with something as critical as auth. I would recommend not over-optimizing things that aren't causing issues. Also, as a reminder: we use custom Lua for auth.

FreddleSpl0it commented 1 day ago

I ran tests on the nightly branch because I believe it could provide significant benefits. I created a Python script to test heavy IMAP and SMTP loads.

import smtplib
import imaplib
import ssl
from concurrent.futures import ThreadPoolExecutor
import time

# Configuration
SMTP_SERVER = ''
SMTP_PORT = 587
IMAP_SERVER = ''
IMAP_PORT = 993
USERNAME = ''
PASSWORD = ''
NUM_REQUESTS = 2000  # Total number of login attempts
NUM_THREADS = 70     # Number of concurrent threads

def smtp_login():
  """Return True if the SMTP login succeeds, False otherwise."""
  try:
    context = ssl._create_unverified_context()  # Disable SSL verification (e.g. self-signed test certs)
    with smtplib.SMTP(SMTP_SERVER, SMTP_PORT, timeout=10) as server:
      server.ehlo()
      server.starttls(context=context)
      server.ehlo()
      server.login(USERNAME, PASSWORD)
    return True
  except Exception as e:
    print(f"SMTP login failed: {e}")
    return False

def imap_login():
  """Return True if the IMAP login succeeds, False otherwise."""
  try:
    context = ssl._create_unverified_context()  # Same verification policy as for SMTP
    with imaplib.IMAP4_SSL(IMAP_SERVER, IMAP_PORT, ssl_context=context) as imap_server:
      imap_server.login(USERNAME, PASSWORD)
    return True
  except Exception as e:
    print(f"IMAP login failed: {e}")
    return False

def simulate_load():
  start_time = time.perf_counter()
  print("Starting to create load")
  with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
    # Perform half SMTP and half IMAP logins simultaneously
    futures = [executor.submit(smtp_login if i % 2 == 0 else imap_login) for i in range(NUM_REQUESTS)]

    # Wait for all futures to complete and count the successful logins
    successes = sum(1 for future in futures if future.result())

  print(f"Completed {NUM_REQUESTS} login attempts in {time.perf_counter() - start_time:.2f} seconds")
  print(f"{successes} of {NUM_REQUESTS} logins succeeded")

if __name__ == '__main__':
  simulate_load()

Each login triggers an HTTP request to a PHP script, which can be observed in the PHP-FPM container logs. After configuring the following in data/conf/dovecot/extra.conf, there were only 3 HTTP requests for 2000 login attempts.

auth_cache_size = 10M
auth_cache_ttl = 300s
auth_cache_negative_ttl = 60s

The result with caching:

Completed 2000 login attempts in 22.42 seconds

Without caching:

Completed 2000 login attempts in 37.43 seconds
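For a quick read of these two runs, the figures can be turned into throughput and relative savings. This only re-derives numbers from the measurements above (single account, mixed SMTP/IMAP load, wall-clock time including TLS handshakes), so it says nothing beyond that particular setup.

```python
ATTEMPTS = 2000
SECONDS_WITH_CACHE = 22.42        # measured above, auth cache enabled
SECONDS_WITHOUT_CACHE = 37.43     # measured above, auth cache disabled
BACKEND_REQUESTS_WITH_CACHE = 3   # PHP requests observed with the cache

throughput_with = ATTEMPTS / SECONDS_WITH_CACHE        # logins per second
throughput_without = ATTEMPTS / SECONDS_WITHOUT_CACHE  # logins per second
speedup = SECONDS_WITHOUT_CACHE / SECONDS_WITH_CACHE
time_saved = 1 - SECONDS_WITH_CACHE / SECONDS_WITHOUT_CACHE
# Without the cache, every login triggered one HTTP request to the PHP script.
backend_reduction = 1 - BACKEND_REQUESTS_WITH_CACHE / ATTEMPTS

if __name__ == "__main__":
    print(f"{throughput_without:.1f} -> {throughput_with:.1f} logins/s")
    print(f"speedup {speedup:.2f}x, {time_saved:.1%} less wall-clock time")
    print(f"{backend_reduction:.2%} fewer backend requests")
```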

dragoangel commented 1 day ago

Well, with tests it looks much better :)

patschi commented 1 day ago

That's what this issue is for: evaluating the benefits. I think big setups in particular can profit a lot from this. Tools like ticketing systems, which watch for new emails via IMAP, should also benefit due to their frequent email pulls.

> After configuring the following in data/conf/dovecot/extra.conf, there were only 3 HTTP requests for 2000 login attempts.

That sounds great. I suppose increasing the cache size and TTL probably won't change much in a small-scale test with only one or a few users?

awsumco commented 1 day ago

I would be happy to test this in a "big" setup if someone provides some kind of metrics I can graph?

patschi commented 1 day ago

> I would be happy to test this in a "big" setup if someone provides some kind of metrics I can graph?

I'm not sure whether this can be tracked. Possibly it's only visible to users/automation as login speed.
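One way to get something graphable without server-side support is to time logins from the client side. The sketch below, with hypothetical helpers `imap_login_ok`, `measure`, `to_csv` and `summary`, records per-attempt latency and writes CSV that any plotting tool can read. It measures the whole client-side login (TCP, TLS, IMAP LOGIN), not the auth cache in isolation, and the p95 is a simple nearest-rank approximation.

```python
import csv
import imaplib
import io
import statistics
import time


def imap_login_ok(host, port, user, password):
    """Return True if an IMAPS login succeeds, False on auth or network errors."""
    try:
        with imaplib.IMAP4_SSL(host, port) as conn:
            conn.login(user, password)
        return True
    except (imaplib.IMAP4.error, OSError):
        return False


def measure(login, attempts):
    """Call login() `attempts` times; return rows of (attempt, ok, seconds)."""
    rows = []
    for i in range(attempts):
        start = time.perf_counter()
        ok = login()
        rows.append((i, ok, time.perf_counter() - start))
    return rows


def to_csv(rows):
    """Render rows as CSV text with a header line, ready for graphing."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["attempt", "ok", "seconds"])
    writer.writerows(rows)
    return buf.getvalue()


def summary(rows):
    """Median and approximate p95 latency of the successful attempts."""
    times = sorted(seconds for _, ok, seconds in rows if ok)
    return {
        "attempts": len(rows),
        "ok": len(times),
        "median_s": statistics.median(times) if times else None,
        "p95_s": times[int(0.95 * (len(times) - 1))] if times else None,
    }
```

A run against a real server could look like `rows = measure(lambda: imap_login_ok("mail.example.org", 993, "user@example.org", "secret"), 100)`, where host and credentials are placeholders; comparing `summary(rows)` with the cache on and off gives before/after numbers.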

Looking further through the Dovecot documentation, there is also a docs page on login process optimization. (We should certainly stick with the "high-security mode" approach.)

In the current setup, we do use: https://github.com/mailcow/mailcow-dockerized/blob/bd9f4ba0a57a159939760ecd319f2d44abf6b27a/data/conf/dovecot/dovecot.conf#L126-L153

The docs state:

Since one login process can handle only one connection, the service’s process_limit setting limits the number of users that can be logging in at the same time (defaults to default_process_limit=100).

I find the current value of process_limit = 10000 quite extreme in comparison. It would allow 10,000 concurrent logins for IMAP alone, which would almost surely cause an out-of-memory condition. The same goes for POP3, doubling the amount. I would think even 1000 is high enough.
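To ground the out-of-memory concern in real numbers, one could sample the resident memory of the running login processes and extrapolate to process_limit. A minimal sketch, assuming Linux `/proc` and that the login processes show up with the command name `imap-login` (run it inside the Dovecot container or on the host); `vmrss_kb`, `process_rss_kb` and `worst_case_mb` are hypothetical helpers. Summing RSS counts shared pages once per process, so the extrapolation is an upper bound rather than an exact figure.

```python
import os


def vmrss_kb(status_text):
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0


def process_rss_kb(name, proc="/proc"):
    """Return the VmRSS of every process whose comm equals `name` (Linux only)."""
    sizes = []
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, "comm")) as f:
                if f.read().strip() != name:
                    continue
            with open(os.path.join(proc, pid, "status")) as f:
                sizes.append(vmrss_kb(f.read()))
        except OSError:
            continue  # process exited meanwhile or is not readable
    return sizes


def worst_case_mb(sizes_kb, process_limit):
    """Average sampled RSS times process_limit, in MiB (upper-bound estimate)."""
    if not sizes_kb:
        return 0.0
    return sum(sizes_kb) / len(sizes_kb) * process_limit / 1024
```

For example, `worst_case_mb(process_rss_kb("imap-login"), 10000)` versus the same call with 1000 shows how much the limit matters on a given machine.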

But that aside, the most interesting bit in the docs is this:

To avoid startup latency for new client connections, set process_min_avail to higher than zero. That many idling processes are always kept around waiting for new connections.

We currently don't use process_min_avail at all (for either IMAP or POP3). I think login speed could benefit even more, at minimal memory cost. I think process_min_avail = 2 would be fine for IMAP in smaller setups, and maybe process_min_avail = 1 for POP3 (for whoever still uses it). Bigger setups can increase these values via data/conf/dovecot/extra.conf.
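The mechanism can be pictured with a toy model: each worker serves exactly one connection and then exits (like service_count = 1), and the pool immediately starts a replacement so that `min_avail` workers are always idle and waiting (like process_min_avail). This is only an illustration: threads stand in for login processes, and `PreforkPool` and its handler are hypothetical. With `min_avail = 0`, nothing waits in advance and each connection gets a freshly started worker, which is the startup latency the docs mention.

```python
import queue
import threading


class PreforkPool:
    """Toy model of a login service with service_count = 1 and
    process_min_avail = min_avail. Threads stand in for processes."""

    def __init__(self, min_avail, handler):
        self.min_avail = min_avail
        self.handler = handler          # hypothetical: handles one "connection"
        self.jobs = queue.Queue()
        self.workers = []
        self.lock = threading.Lock()
        for _ in range(min_avail):      # pre-spawn idle workers at startup
            self._spawn()

    def _spawn(self):
        worker = threading.Thread(target=self._serve_one, daemon=True)
        worker.start()
        self.workers.append(worker)

    def _serve_one(self):
        # Like service_count = 1: serve exactly one connection, then exit.
        item = self.jobs.get()
        if item is None:                # shutdown sentinel
            return
        conn, reply = item
        reply.put(self.handler(conn))

    def submit(self, conn):
        reply = queue.Queue(maxsize=1)
        with self.lock:
            self.jobs.put((conn, reply))  # an already idle worker can take it
            self._spawn()                 # replacement keeps min_avail idle
        return reply

    def shutdown(self):
        with self.lock:
            for _ in range(self.min_avail):  # wake the remaining idle workers
                self.jobs.put(None)
        for worker in self.workers:
            worker.join()
```

The cost of keeping workers idle is their memory, which is why a small value such as 1 or 2 is a reasonable default that larger setups can raise.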

What do you think, @FreddleSpl0it? Would you mind repeating your tests on the same setup with the settings above, so the results are comparable to your previous ones?

patschi commented 18 hours ago

> I ran tests on the nightly branch because I believe it could provide significant benefits. I created a Python script to test heavy IMAP and SMTP loads.

Noteworthy: the script "only" opens 1000 IMAP connections (it splits the 2000 attempts 50/50 between IMAP and SMTP), and only those benefit from the changes, while SMTP does not. So with 2000 IMAP connections, it might benefit even more.

I'd also be curious if setting auth_cache_verify_password_with_worker improves it even further.

The final configuration I'd suggest:

auth_cache_size = 10M
auth_cache_ttl = 300s
auth_cache_negative_ttl = 60s
auth_cache_verify_password_with_worker = yes

service imap-login {
  service_count = 1
  process_min_avail = 2
  process_limit = 10000
  vsz_limit = 1G
  user = dovenull
  inet_listener imap_haproxy {
    port = 10143
    haproxy = yes
  }
  inet_listener imaps_haproxy {
    port = 10993
    ssl = yes
    haproxy = yes
  }
}
service pop3-login {
  service_count = 1
  process_min_avail = 1
  vsz_limit = 1G
  inet_listener pop3_haproxy {
    port = 10110
    haproxy = yes
  }
  inet_listener pop3s_haproxy {
    port = 10995
    ssl = yes
    haproxy = yes
  }
}

mailcow / mailcow-dockerized