hardware / mailserver

:warning: UNMAINTAINED - Simple and full-featured mail server using Docker
https://store.docker.com/community/images/hardware/mailserver
MIT License
1.29k stars 322 forks source link

Unexpected supervisor crash, mailserver down. #372

Closed bkraul closed 4 years ago

bkraul commented 5 years ago

Classification

Please delete options that are not relevant.

Reproducibility

Random stoppage of the mailserver with the following significant log entries:

mailserver      | 2019-04-23T04:30:16.943918+00:00 mail dovecot: master: Warning: Killed with signal 15 (by pid=521 uid=0 code=kill)
mailserver      | s6-svc: fatal: unable to control /services/rsyslogd: supervisor not listening
mailserver      | s6-svc: fatal: unable to control /services/unbound: supervisor not listening

It would seem as if supervisor is crashing and not coming back.

Docker information

docker info
----
Containers: 39
 Running: 39
 Paused: 0
 Stopped: 0
Images: 73
Server Version: 18.09.5
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-47-generic
Operating System: Ubuntu 18.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.852GiB
Name: ********
ID: ******************************
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support
----
docker images hardware/mailserver --digests --filter "dangling=false"
REPOSITORY            TAG                 DIGEST                                                                    IMAGE ID            CREATED             SIZE
hardware/mailserver   1.1-latest          sha256:3a5f3349cef7babb07ad51fea1f4b66bce2fd1bc7c1aca69017cf81647a80994   62394a202d82        9 days ago          380MB

Description

Briefly describe the problem you are having in a few lines.

Steps to reproduce

  1. Normal operation of mailserver.

Expected results

No unexpected supervisor crash.

Actual results

Supervisor crashes, docker container restart is needed.

Debugging information

docker logs mailserver
(relevant)
mailserver      | 2019-04-23T04:21:15.640941+00:00 mail postfix/anvil[2351]: statistics: max connection rate 2/60s for (smtps:***.**.**.**) at Apr 23 04:17:47
mailserver      | 2019-04-23T04:21:15.640986+00:00 mail postfix/anvil[2351]: statistics: max connection count 1 for (smtp:***.***.***.***) at Apr 23 04:15:22
mailserver      | 2019-04-23T04:21:15.641024+00:00 mail postfix/anvil[2351]: statistics: max cache size 2 at Apr 23 04:17:44
mailserver      | 2019-04-23T04:23:21.741196+00:00 mail postfix/smtpd[2363]: warning: hostname hosting-by.directwebhost.org does not resolve to address **.***.***.**
mailserver      | 2019-04-23T04:23:21.741322+00:00 mail postfix/smtpd[2363]: connect from unknown[**.***.***.**]
mailserver      | 2019-04-23T04:23:22.626798+00:00 mail postfix/smtpd[2363]: lost connection after EHLO from unknown[**.***.***.**]
mailserver      | 2019-04-23T04:23:22.627014+00:00 mail postfix/smtpd[2363]: disconnect from unknown[**.***.***.**] ehlo=1 commands=1
mailserver      | 2019-04-23T04:23:36.835359+00:00 mail postfix/smtpd[2363]: connect from unknown[185.234.219.64]
mailserver      | 2019-04-23T04:23:37.243754+00:00 mail postfix/smtpd[2363]: lost connection after AUTH from unknown[185.234.219.64]
mailserver      | 2019-04-23T04:23:37.244038+00:00 mail postfix/smtpd[2363]: disconnect from unknown[185.234.219.64] ehlo=1 auth=0/1 commands=1/2
mailserver      | 2019-04-23T04:25:32.469899+00:00 mail postfix/smtpd[2366]: connect from lputeaux-656-1-115-122.w193-251.abo.wanadoo.fr[193.251.18.122]
mailserver      | 2019-04-23T04:25:32.766128+00:00 mail postfix/smtpd[2366]: lost connection after AUTH from lputeaux-656-1-115-122.w193-251.abo.wanadoo.fr[193.251.18.122]
mailserver      | 2019-04-23T04:25:32.766537+00:00 mail postfix/smtpd[2366]: disconnect from lputeaux-656-1-115-122.w193-251.abo.wanadoo.fr[193.251.18.122] helo=1 auth=0/1 commands=1/2
mailserver      | 2019-04-23T04:28:52.964062+00:00 mail postfix/anvil[2365]: statistics: max connection rate 1/60s for (smtp:**.***.***.**) at Apr 23 04:23:21
mailserver      | 2019-04-23T04:28:52.964102+00:00 mail postfix/anvil[2365]: statistics: max connection count 1 for (smtp:**.***.***.**) at Apr 23 04:23:21
mailserver      | 2019-04-23T04:28:52.964126+00:00 mail postfix/anvil[2365]: statistics: max cache size 2 at Apr 23 04:23:36
mailserver      | 2019-04-23T04:29:38.877835+00:00 mail postfix/smtpd[2368]: connect from unknown[185.234.219.64]
mailserver      | 2019-04-23T04:29:39.236061+00:00 mail postfix/smtpd[2368]: lost connection after AUTH from unknown[185.234.219.64]
mailserver      | 2019-04-23T04:29:39.236707+00:00 mail postfix/smtpd[2368]: disconnect from unknown[185.234.219.64] ehlo=1 auth=0/1 commands=1/2
mailserver      | 2019-04-23T04:30:16.943918+00:00 mail dovecot: master: Warning: Killed with signal 15 (by pid=521 uid=0 code=kill)
mailserver      | [INFO] Traefik SSL certificates not used
mailserver      | [INFO] Let's encrypt live directory found
mailserver      | [INFO] Using /etc/letsencrypt/live/mail.mydomain.com folder
mailserver      | [INFO] Found DKIM key pair for domain *******.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain *******.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain ******.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain ********.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain ******.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain ******.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain ******.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain *****.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain *****.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain *****.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain *****.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain *****.com - skip creation
mailserver      | [INFO] Found DKIM key pair for domain *****.com - skip creation
mailserver      | [INFO] Override parameter in main.cf : smtpd_sender_restrictions=reject_non_fqdn_sender,reject_unknown_sender_domain,permit_mynetworks,reject_sender_login_mismatch,reject_rhsbl_sender dbl.spamhaus.org
mailserver      | [INFO] Override parameter in main.cf : relayhost = [**.********.com]:587
mailserver      | [INFO] Override parameter in main.cf : smtp_sasl_auth_enable = yes
mailserver      | [INFO] Override parameter in main.cf : smtp_sasl_security_options = noanonymous
mailserver      | [INFO] Override parameter in main.cf : smtp_sasl_password_maps = hash:/etc/postfix/custom/sasl_passwd
mailserver      | [INFO] Override service parameter in master.cf : submission/inet/smtpd_client_restrictions=permit_mynetworks,permit_sasl_authenticated,reject
mailserver      | [INFO] Override service parameter in master.cf : smtps/inet/smtpd_client_restrictions=permit_mynetworks,permit_sasl_authenticated,reject
mailserver      | [INFO] Custom Postfix configuration file loaded
mailserver      | [INFO] MariaDB/PostgreSQL hostname not found in /etc/hosts
mailserver      | [INFO] Container IP found, adding a new record in /etc/hosts
mailserver      | [INFO] Redis hostname not found in /etc/hosts
mailserver      | [INFO] Container IP found, adding a new record in /etc/hosts
mailserver      | [INFO] Debug mode is disabled
mailserver      | [INFO] ClamAV is disabled, service will not start
mailserver      | [INFO] Fetchmail forwarding is disabled
mailserver      | [INFO] Automatic GPG encryption is disabled
mailserver      | [INFO] ManageSieve protocol is enabled
mailserver      | [INFO] DKIM/ARC signing is enabled
mailserver      | [INFO] Greylisting policy is enabled
mailserver      | [INFO] Ratelimiting policy is disabled
mailserver      | [INFO] POP3 protocol is enabled
mailserver      | [INFO] Unbound DNS resolver is enabled
mailserver      | [INFO] clamav-unofficial-sigs is disabled (user configuration not found)
mailserver      | s6-svc: fatal: unable to control /services/rsyslogd: supervisor not listening
mailserver      | s6-svc: fatal: unable to control /services/unbound: supervisor not listening

Configuration (docker-compose.yml, traefik.toml...etc)

version: '3'

services:
  mailserver:
    image: hardware/mailserver:1.1-latest
    container_name: mailserver
    restart: always
    domainname: mydomain.com
    hostname: mail
    ports:
      - '25:25'
      - '465:465'
      - '587:587'
      - '993:993'
      - '995:995'
      - '4190:4190'
    environment:
      - DBHOST=vps.mydomain.com
      - DBUSER=*********
      - DBPASS=********
      - RSPAMD_PASSWORD=**********
      - VIRTUAL_HOST=spam.mydomain.com
      - VIRTUAL_PORT=11334
      - DISABLE_CLAMAV=true
      - RELAY_NETWORKS=192.168.0.0/16 172.16.0.0/12 10.0.0.0/8
      - ADD_DOMAINS=domain1.com,domain2.com,domain3.com
      - ENABLE_POP3=true
    volumes:
      - /root/docker-persist/mailserver/mail:/var/mail
      - /root/docker-persist/mailserver/mail/postfix/custom:/etc/postfix/custom
      - /root/docker-persist/mailserver/mail/ssl/letsencrypt:/etc/letsencrypt
    depends_on:
      - redis
    networks:
      - reverse-proxy
      - default
  postfixadmin:
    image: hardware/postfixadmin
    container_name: postfixadmin
    restart: always
    domainname: mydomain.com
    hostname: mail
    environment:
      - DBHOST=vps.mydomain.com
      - DBUSER=*******
      - DBPASS=*******
      - PASSVAL_MIN_DIGIT=1
      - VIRTUAL_HOST=postfixadmin.mydomain.com
      - VIRTUAL_PORT=8888
    networks:
      - default
      - reverse-proxy
  rainloop:
    image: hardware/rainloop
    container_name: rainloop
    restart: always
    volumes:
      - /root/docker-persist/mailserver/rainloop:/rainloop/data
    environment:
      - VIRTUAL_HOST=domain1,com,domain2.com,domain3.com
      - VIRTUAL_PORT=8888
      - LOG_TO_STDOUT=true
    depends_on:
      - mailserver
    networks:
      - default
      - reverse-proxy
  redis:
    image: redis:4.0-alpine
    container_name: mailserver_redis
    restart: always
    command: redis-server --appendonly yes
    volumes:
      - /root/docker-persist/mailserver/redis/db:/data
    sysctls:
      - net.core.somaxconn=511
    networks:
      - default

networks:
  reverse-proxy:
    external:
      name: reverse-proxy
bkraul commented 5 years ago

This happened again, with the exact same situation. This time, I tried to use a docker restart mailserver command to restart the mailserver container. It failed twice with the exact same error, but on the third time, it worked OK and it resumed operation. I am pretty sure this is supervisor crashing and not being able to be restarted. I just wish I knew how to bring it back automatically when it crashes.

sknight80 commented 5 years ago

Hi @bkraul ,

I usually, run the following command: docker stop mailserver && docker start mailserver . if I made changes in the docker file then docker stop mailserver && docker rm mailserver && docker-compose up -d mailserver

I hope this helps you for restarting your mailserver container safely. Regarding the problem with the crash, are you seeing something in your host machine logs (like in the message.log, systlog)?

bkraul commented 5 years ago

@sknight80 Yes, it is the same number of steps I follow. Checked the host logs. I could not find anything remotely related.

hardware commented 5 years ago

This issue has been marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

bkraul commented 5 years ago

Well, not sure what more activity it can have beyond being reported. This continues to happen. I am suspecting it is load-based, as I am seeing multiple denied relay attempts when it crashes. The problem is not that any service crashes, though, the problem is that supervisor crashes and it does not listen anymore so nothing can be restarted.

hardware commented 5 years ago

Need further investigation on what could cause instabilities. Upload all information about processes and logs here before and after crashes occur. Try with another hardware & host computer too.

hardware commented 5 years ago

@bkraul can you test again with the new image under Debian Buster (use 1.1-latest tag) ? I have updated s6 (skarnet's small supervision suite) as well.

bkraul commented 5 years ago

Just updated. Will monitor and let you know. I wish this was easily reproducible but as I said, it seems load-based.

budimanjojo commented 5 years ago

I just had a similar problem and a container restart fix it. Here's my log with real domain name replaced:

Attaching to rainloop, postfixadmin, mailserver
postfixadmin    | PHP 7.3.9 Development Server started at Sun Oct 20 07:02:38 2019
postfixadmin    | [Sun Oct 20 07:02:52 2019] 172.20.0.2:46092 [302]: /
postfixadmin    | [Sun Oct 20 07:02:52 2019] 172.20.0.2:46118 [200]: /login.php
postfixadmin    | [Sun Oct 20 07:02:52 2019] 172.20.0.2:46156 [200]: /login.php
postfixadmin    | [Sun Oct 20 07:02:52 2019] 172.20.0.2:46162 [200]: /images/logo-default.png
postfixadmin    | [Sun Oct 20 07:02:52 2019] 172.20.0.2:46160 [200]: /css/default.css
postfixadmin    | [Sun Oct 20 07:02:52 2019] 172.20.0.2:46202 [200]: /css/calendar.css
postfixadmin    | [Sun Oct 20 07:02:53 2019] 172.20.0.2:46226 [200]: /images/mail_bg.gif
postfixadmin    | [Sun Oct 20 07:02:53 2019] 172.20.0.2:46252 [200]: /images/favicon.ico
postfixadmin    | [Sun Oct 20 07:02:55 2019] 172.20.0.2:46496 [302]: /login.php
postfixadmin    | [Sun Oct 20 07:02:55 2019] 172.20.0.2:46550 [200]: /main.php
postfixadmin    | [Sun Oct 20 07:02:55 2019] 172.20.0.2:46588 [200]: /css/default.css
postfixadmin    | [Sun Oct 20 07:02:55 2019] 172.20.0.2:46590 [200]: /images/logo-default.png
postfixadmin    | [Sun Oct 20 07:02:56 2019] 172.20.0.2:46608 [200]: /css/calendar.css
postfixadmin    | [Sun Oct 20 08:26:12 2019] 172.20.0.2:54316 [302]: /
postfixadmin    | [Sun Oct 20 08:26:12 2019] 172.20.0.2:54322 [200]: /login.php
postfixadmin    | [Sun Oct 20 08:26:12 2019] 172.20.0.2:54332 [200]: /images/favicon.ico
postfixadmin    | [Sun Oct 20 09:59:51 2019] 172.20.0.2:40636 [302]: /
postfixadmin    | [Sun Oct 20 09:59:52 2019] 172.20.0.2:40652 [200]: /login.php
postfixadmin    | [Sun Oct 20 09:59:52 2019] 172.20.0.2:40670 [200]: /images/logo-default.png
postfixadmin    | [Sun Oct 20 09:59:52 2019] 172.20.0.2:40672 [200]: /css/default.css
postfixadmin    | [Sun Oct 20 09:59:52 2019] 172.20.0.2:40700 [200]: /login.php
postfixadmin    | [Sun Oct 20 09:59:52 2019] 172.20.0.2:40728 [200]: /css/calendar.css
postfixadmin    | [Sun Oct 20 09:59:52 2019] 172.20.0.2:40738 [200]: /images/mail_bg.gif
postfixadmin    | [Sun Oct 20 10:00:03 2019] 172.20.0.2:41746 [302]: /login.php
postfixadmin    | [Sun Oct 20 10:00:04 2019] 172.20.0.2:42024 [200]: /main.php
postfixadmin    | [Sun Oct 20 10:00:04 2019] 172.20.0.2:42064 [200]: /css/default.css
postfixadmin    | [Sun Oct 20 10:00:04 2019] 172.20.0.2:42066 [200]: /images/logo-default.png
postfixadmin    | [Sun Oct 20 10:00:04 2019] 172.20.0.2:42080 [200]: /css/calendar.css
postfixadmin    | [Sun Oct 20 10:00:09 2019] 172.20.0.2:42570 [200]: /list.php?table=admin
postfixadmin    | [Sun Oct 20 10:00:09 2019] 172.20.0.2:42598 [200]: /css/default.css
postfixadmin    | [Sun Oct 20 10:00:09 2019] 172.20.0.2:42600 [200]: /images/logo-default.png
postfixadmin    | [Sun Oct 20 10:00:09 2019] 172.20.0.2:42620 [200]: /css/calendar.css
postfixadmin    | [Sun Oct 20 10:00:17 2019] 172.20.0.2:43392 [200]: /list-virtual.php
postfixadmin    | [Sun Oct 20 10:00:17 2019] 172.20.0.2:43426 [200]: /css/default.css
postfixadmin    | [Sun Oct 20 10:00:17 2019] 172.20.0.2:43428 [200]: /images/logo-default.png
postfixadmin    | [Sun Oct 20 10:00:17 2019] 172.20.0.2:43448 [200]: /css/calendar.css
postfixadmin    | [Sun Oct 20 10:00:24 2019] 172.20.0.2:44088 [200]: /edit.php?table=mailbox&edit=myname%40mydomain.com
postfixadmin    | [Sun Oct 20 10:00:24 2019] 172.20.0.2:44104 [200]: /css/default.css
postfixadmin    | [Sun Oct 20 10:00:24 2019] 172.20.0.2:44106 [200]: /images/logo-default.png
postfixadmin    | [Sun Oct 20 10:00:24 2019] 172.20.0.2:44118 [200]: /css/calendar.css
postfixadmin    | [Sun Oct 20 10:00:28 2019] 172.20.0.2:44454 [200]: /list-virtual.php
postfixadmin    | [Sun Oct 20 10:24:43 2019] 172.20.0.2:50262 [302]: /
postfixadmin    | [Sun Oct 20 10:24:44 2019] 172.20.0.2:50276 [200]: /login.php
postfixadmin    | [Sun Oct 20 10:24:44 2019] 172.20.0.2:50292 [200]: /css/default.css
postfixadmin    | [Sun Oct 20 10:24:44 2019] 172.20.0.2:50294 [200]: /images/logo-default.png
postfixadmin    | [Sun Oct 20 10:46:43 2019] 172.20.0.2:45142 [302]: /
postfixadmin    | [Sun Oct 20 10:46:43 2019] 172.20.0.2:45158 [200]: /login.php
postfixadmin    | [Sun Oct 20 10:46:43 2019] 172.20.0.2:45184 [200]: /images/logo-default.png
postfixadmin    | [Sun Oct 20 10:46:43 2019] 172.20.0.2:45186 [200]: /css/default.css
mailserver      | [INFO] MariaDB/PostgreSQL hostname not found in /etc/hosts
mailserver      | [INFO] Container IP found, adding a new record in /etc/hosts
mailserver      | [INFO] Redis hostname not found in /etc/hosts
mailserver      | [INFO] Container IP found, adding a new record in /etc/hosts
mailserver      | [INFO] Search for SSL certificates generated by Traefik
mailserver      | [INFO] acme.json found with ACME v2 format, dumping into pem files
mailserver      | [INFO] Let's encrypt wildcard certificate found
mailserver      | [INFO] Setting up container
mailserver      | [INFO] Found DKIM key pair for domain mydomain.com - skip creation
mailserver      | [INFO] Debug mode is disabled
mailserver      | [INFO] ClamAV is disabled, service will not start
mailserver      | [INFO] Fetchmail forwarding is disabled
mailserver      | [INFO] Automatic GPG encryption is disabled
mailserver      | [INFO] ManageSieve protocol is enabled
mailserver      | [INFO] DKIM/ARC signing is enabled
mailserver      | [INFO] Greylisting policy is enabled
mailserver      | [INFO] Ratelimiting policy is disabled
mailserver      | [INFO] POP3 protocol is disabled
mailserver      | [INFO] Unbound DNS resolver is enabled
mailserver      | [INFO] clamav-unofficial-sigs is disabled (user configuration not found)
mailserver      | [INFO] Finished container setup
mailserver      | [INFO] Starting services
mailserver      | s6-svc: fatal: unable to control /services/rsyslogd: supervisor not listening
mailserver      | [INFO] Checking for watchable SSL certificates
mailserver      | [INFO] Watching /etc/letsencrypt/acme
^CTraceback (most recent call last):
PeerD commented 5 years ago

This issue is reproducible at my test system, but only with about 15% probability. So if I run the test suite 10 times, it normally happens once or twice.

I am not seeing anything in the logs, but that is probably because the parts responsible have logging disabled or as rsyslogd itself is not running, is not logged?

But I can see some processes not running that normally should be running in that container. These are the processes running in the normal container, but not the crashed one:

root      1385  0.0  0.0 156096  3124 ?        Ssl  15:20   0:00 rsyslogd -n -f /etc/rsyslog/rsyslog.conf
postfix   1541  0.0  0.1  53092 13888 ?        S    15:22   0:00 smtpd -n smtp -t inet -u -o stress= -s 2
postfix   1542  0.0  0.0  43496  7252 ?        S    15:22   0:00 proxymap -t unix -u
postfix   1544  0.0  0.0  52300  9984 ?        S    15:22   0:00 trivial-rewrite -n rewrite -t unix -u
postfix   1545  0.0  0.1  52536 10024 ?        S    15:22   0:00 cleanup -z -t unix -u
postfix   1546  0.0  0.0  43784  7380 ?        S    15:22   0:00 lmtp -t unix -u
postfix   1572  0.0  0.0  43492  7128 ?        S    15:22   0:00 scache -l -t unix -u
postfix   1586  0.0  0.1  53228 13792 ?        S    15:22   0:00 smtpd -n submission -t inet -u -o stress= -s 2 -o syslog_name=postfix/submission-custom -o smtpd_tls_security_level=encrypt -o smtpd_tls_ciphers=high -o smtpd_client_restrictions=permit_sasl_authenticated,reject -o cleanup_
postfix   1587  0.0  0.1  52992 10732 ?        S    15:22   0:00 cleanup -z -n authclean -t unix -u -o syslog_name=postfix/authclean -o header_checks=pcre:/etc/postfix/header_checks

Maybe this helps to narrow the problem in. If someone can tell me how to enable more verbous logs, I can test that aswell.

sknight80 commented 5 years ago

Sidenote: I am working on a base image update to the latest version of s6 for example.

Related to your question: I don't know yet, but I can Google it

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.