Open awsumco opened 1 year ago
@MAGICCC @mkuron Any feedback on this approach?
It doesn't really solve the problem. Now you might need 3000+ domains to cross the limit, but you will still cross it at some point. I'm sure there are configuration options that can be changed to raise the limit, though unfortunately @awsumco's log messages are very unspecific, so it's hard to know which ones. On https://doc.dovecot.org/settings/core/, I found config_cache_size (Default: 1M. The maximum size of the in-memory configuration cache. The cache should be large enough to allow keeping the full, parsed Dovecot configuration in memory. The default is almost always large enough, unless your system has numerous large TLS certificates in the configuration.) and default_vsz_limit (Default: 256M. The default virtual memory size limit for service processes.). I am pretty sure both of these should be tuned on large Mailcow installs, perhaps dynamically depending on the number of SNI domains.
I will look into config_cache_size (this makes sense to me) and get back with more information here. I also agree the log was not helpful at all, and increasing verbosity did not reveal any more info. I only stumbled across sni.conf while looking at another Dovecot server setup with a huge number of hosted domains, made changes, and to my surprise the service started.
As for hitting the same error again at some stage: yes, I agree it will happen. However, I have picked up other scaling problems with MailCow, which I am happy to share if anyone is interested. That said, I will set a maximum domain limit on the setup.
I suppose you could look at this contribution as a neater way to set up the sni.conf file.
I replicated the issue with 2000 domains, and Dovecot says:
Nov 20 12:03:01 cae2cdf0b28b dovecot: config: Fatal: master: service(config): child 2358 returned error 83 (Out of memory (service config { vsz_limit=1024 MB }, you may need to increase it) - set CORE_OUTOFMEM=1 environment to get core dump)
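Since the earlier logs did not show which limit was being hit, a small filter can pull the service name and its current vsz_limit out of messages like the one above. This is a minimal sketch: the regular expression is derived only from the single log line quoted in this thread, and the function name find_oom is hypothetical. Other Dovecot versions may word the message differently.

```python
import re

# Pattern based solely on the log line quoted in this thread (assumption:
# other Dovecot versions may format this message differently).
OOM_RE = re.compile(
    r"service\((?P<service>[^)]+)\): child \d+ returned error 83 "
    r"\(Out of memory \(service \S+ \{ vsz_limit=(?P<limit>\d+) MB \}"
)


def find_oom(lines):
    """Return (service, vsz_limit_mb) for every out-of-memory line found."""
    hits = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            hits.append((match.group("service"), int(match.group("limit"))))
    return hits


if __name__ == "__main__":
    sample = (
        "Nov 20 12:03:01 cae2cdf0b28b dovecot: config: Fatal: master: "
        "service(config): child 2358 returned error 83 (Out of memory "
        "(service config { vsz_limit=1024 MB }, you may need to increase it) "
        "- set CORE_OUTOFMEM=1 environment to get core dump)"
    )
    for service, limit in find_oom([sample]):
        print(f"service {service} ran out of memory at vsz_limit={limit} MB")
```

Feeding it the Dovecot log line by line would at least show which service block needs a higher vsz_limit.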
In dovecot.conf, I've added vsz_limit = 2048 MB to service config:
service config {
  vsz_limit = 2048 MB
  unix_listener config {
    user = root
    group = vmail
    mode = 0660
  }
}
The out-of-memory log is gone, but the config service now takes too long to return the Dovecot configuration, which results in:
Nov 20 12:14:47 cae2cdf0b28b dovecot: managesieve-login: Error: conn unix:stats-writer (pid=2162,uid=0): Timeout waiting for handshake response
Nov 20 12:14:47 cae2cdf0b28b dovecot: managesieve-login: Error: conn unix:stats-writer (pid=2162,uid=0): Timeout waiting for handshake response
Nov 20 12:14:47 cae2cdf0b28b dovecot: replicator: Error: conn unix:/run/dovecot/stats-writer (pid=2162,uid=0): Timeout waiting for handshake response
Nov 20 12:14:52 cae2cdf0b28b dovecot: anvil: Fatal: Error reading configuration: read(/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: /run/dovecot/config is a UNIX socket (path is from CONFIG_FILE environment)
Nov 20 12:14:52 cae2cdf0b28b dovecot: master: Error: service(anvil): command startup failed, throttling for 2.000 secs
Nov 20 12:14:52 cae2cdf0b28b dovecot: stats: Fatal: Error reading configuration: read(/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: /run/dovecot/config is a UNIX socket (path is from CONFIG_FILE environment)
Nov 20 12:14:52 cae2cdf0b28b dovecot: master: Error: service(stats): command startup failed, throttling for 2.000 secs
Nov 20 12:14:57 cae2cdf0b28b dovecot: managesieve-login: Fatal: Error reading configuration: read(/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: /run/dovecot/config is a UNIX socket (path is from CONFIG_FILE environment)
Nov 20 12:14:57 cae2cdf0b28b dovecot: master: Error: service(managesieve-login): command startup failed, throttling for 2.000 secs
Nov 20 12:14:57 cae2cdf0b28b dovecot: managesieve-login: Fatal: Error reading configuration: read(/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: /run/dovecot/config is a UNIX socket (path is from CONFIG_FILE environment)
Nov 20 12:14:57 cae2cdf0b28b dovecot: replicator: Fatal: Error reading configuration: read(/run/dovecot/config) failed: read(size=8192) failed: Interrupted system call - Also failed to read config by executing doveconf: /run/dovecot/config is a UNIX socket (path is from CONFIG_FILE environment)
@mkuron I think in such big setups we should offload SSL termination to nginx.
> I think in such big setups we should offload SSL termination to nginx.
Good idea.
Should we use nginx as a TCP or IMAP proxy to offload SSL termination? I'm not quite sure if there are any downsides to using nginx as an IMAP proxy.
> Should we use nginx as a TCP or IMAP proxy to offload SSL termination?
It would have to be an IMAP proxy so the remote IP address can be passed through.
> I'm not quite sure if there are any downsides to using nginx as an IMAP proxy.
I could imagine it having load issues with large numbers of long-lived connections, e.g. for IMAP IDLE. But we would have to test it to find out if that is even relevant when compared to the load on Dovecot.
Contribution guidelines
I've found a bug and checked that ...
Description
On a Mailcow setup with a large number of hosted domains (1500+), Dovecot fails to start, either because sni.conf has too many local_name entries or because the file is too large to load into memory.
The "fix" was to rewrite the local_name to include all domains in one line from the domains file that is created by ACME.
For example, sni.conf currently outputs the following lines of config:
I suggest that all domains are placed on one line, so that the certificate is only loaded once for all of its hosted domains, like below.
Doing the above got Dovecot started and working.
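The grouping idea can be sketched as a small generator that emits one local_name block per certificate instead of one block per domain. This is an illustration only, not the script mailcow actually uses: the function name render_sni_conf, the /etc/ssl/mail/... directory, and the cert.pem / key.pem file names are assumptions. It also assumes Dovecot accepts several space-separated names inside a quoted local_name.

```python
def render_sni_conf(certs: dict[str, list[str]]) -> str:
    """Render one local_name block per certificate directory.

    `certs` maps a certificate directory (hypothetical layout) to the host
    names covered by that certificate, e.g. the contents of its domains file.
    """
    blocks = []
    for cert_dir, names in sorted(certs.items()):
        # Drop blanks and duplicates so each name appears once per block.
        unique = sorted({name.strip() for name in names if name.strip()})
        if not unique:
            continue
        joined = " ".join(unique)
        blocks.append(
            f'local_name "{joined}" {{\n'
            f"  ssl_cert = <{cert_dir}/cert.pem\n"  # assumed file name
            f"  ssl_key = <{cert_dir}/key.pem\n"    # assumed file name
            "}\n"
        )
    return "".join(blocks)


if __name__ == "__main__":
    example = {
        "/etc/ssl/mail/example.org": [
            "mail.example.org",
            "imap.example.org",
            "mail.example.org",
        ],
    }
    print(render_sni_conf(example), end="")
```

With this shape, the number of blocks in sni.conf grows with the number of certificates rather than the number of domains, which is the effect described above.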
The pros being:
Logs:
Steps to reproduce:
Which branch are you using?
master
Operating System:
Debian 10.13
Server/VM specifications:
16 GB / 4 CPU cores
Is Apparmor, SELinux or similar active?
no
Virtualization technology:
Hyper-V
Docker version:
24.0.6
docker-compose version or docker compose version:
v2.21.0
mailcow version:
2023-10a
Reverse proxy:
NA
Logs of git diff:
Logs of iptables -L -vn:
Logs of ip6tables -L -vn:
Logs of iptables -L -vn -t nat:
Logs of ip6tables -L -vn -t nat:
DNS check: