mailcow / mailcow-dockerized

mailcow: dockerized - 🐮 + 🐋 = 💕
https://mailcow.email
GNU General Public License v3.0

Suddenly uses SAN domain for CN #4031

Closed strarsis closed 2 years ago

strarsis commented 3 years ago


Summary

Today my mail clients and web browser all showed a certificate warning when connecting to the mailcow server. The certificate common name (CN) no longer matched the FQDN of the mail server (here called mail.primary.tld, set as MAILCOW_HOSTNAME); instead it was one of the additional names (here called mail.additional.tld, set using ADDITIONAL_SAN=mail.*). I updated to the latest mailcow-dockerized and rebooted the server. The acme-mailcow container then performed some automatic tasks, and the certificate common name was correct again:

Found orphaned certificate: mail.additional.tld - archiving it at /var/lib/acme/backups/mail.additional.tld/

So for whatever reason mailcow suddenly started using some additional domain instead of the primary domain.
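To confirm which name a certificate actually carries as its CN, the subject line printed by `openssl x509 -noout -subject` can be compared against the expected hostname. The following is a minimal sketch; `cn_from_subject` and `cn_matches` are hypothetical helper names, not part of mailcow, and the parsing only covers the two common subject layouts.

```shell
# Extract the CN from a subject line read on stdin, as printed by
# `openssl x509 -noout -subject`. Handles "subject=CN = name" and the
# older "subject= /CN=name" layout. (Hypothetical helper, not mailcow code.)
cn_from_subject() {
  sed -n 's/.*CN *= *\([^,/]*\).*/\1/p'
}

# Succeeds if the CN in subject line $1 equals the expected hostname $2
# (for example the MAILCOW_HOSTNAME value).
cn_matches() {
  [ "$(printf '%s\n' "$1" | cn_from_subject)" = "$2" ]
}

if cn_matches 'subject=CN = mail.additional.tld' 'mail.primary.tld'; then
  echo "CN matches the primary hostname"
else
  echo "CN does NOT match the primary hostname"
fi
```

On a live server the subject line could be obtained with something like `openssl s_client -connect mail.primary.tld:443 -servername mail.primary.tld </dev/null 2>/dev/null | openssl x509 -noout -subject`, using the placeholder hostname from this report.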

I also got a short network issue/downtime warning (about five minutes) from my cloud provider for the mail server on that same day and hour, so maybe a short network issue triggered this bug. But a network outage shouldn't cause this behaviour (issuing the certificate with the wrong common name).

Logs

Found orphaned certificate: mail.additional.tld - archiving it at /var/lib/acme/backups/mail.additional.tld/

Reproduction

Probably by disrupting the mail server's network/internet connection while it renews the certificate? This seems to be the most likely cause of the bug.

System information

| Question | Answer |
| --- | --- |
| My operating system | Ubuntu 18.04.5 LTS |
| Is Apparmor, SELinux or similar active? | yes (apparmor module is loaded) |
| Virtualization technology (KVM, VMware, Xen, etc - LXC and OpenVZ are not supported) | Cloud provider using KVM |
| Server/VM specifications (Memory, CPU Cores) | 8 GB; 2 vCPU cores |
| Docker Version (docker version) | Docker version 20.10.5, build 55c4c88 |
| Docker-Compose Version (docker-compose version) | 1.28.5, build c4eb3a1f |
| Reverse proxy (custom solution) | No |

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

strarsis commented 3 years ago

Not stale!

andryyy commented 3 years ago

Can you please post the acme-mailcow logs showing why the primary name was skipped? Full logs, please.
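Full logs are what is requested here, but for a first look it can help to pull out only the lines that concern domain selection. A rough sketch, assuming the log text is piped in on stdin; `acme_domain_lines` is a hypothetical helper, and the patterns are simply the message fragments quoted in this thread.

```shell
# Keep only log lines that look relevant to which domains were selected.
# Patterns are taken from messages quoted in this thread; other relevant
# messages may exist that this filter does not catch.
acme_domain_lines() {
  grep -E 'Found domains:|Found orphaned certificate|validation failed'
}

printf '%s\n' \
  'Found domains: mail.extra-domain1.com' \
  'some unrelated line' \
  'Confirmed AAAA record with IP <FQDN IPv6>, but HTTP validation failed' \
  | acme_domain_lines
```

With the docker-compose 1.x setup described above, the input could come from `docker-compose logs acme-mailcow | acme_domain_lines`.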

strarsis commented 3 years ago

@andryyy: The problem is that this happened about two months ago. I will have to retrieve these logs.

andryyy commented 3 years ago

Did it not happen again?

There was an issue with Docker 20.x and a non-updated mailcow. Updating mailcow fixed it.

strarsis commented 3 years ago

@andryyy: No, it hasn't happened again since, and I updated mailcow afterwards. OK, so this issue can be closed for now. If it happens again, I will re-open it with more log details.

strarsis commented 2 years ago

@andryyy: I have this issue again after the mailcow June update :(

Forced a renewal, found this:

Found domains: mail.extra-domain1.com, mail.extra-domain2.com, mail.extra-domain3.com, mail.extra-domain4.com

The FQDN is not listed there! One of these additional domains is used as the certificate name instead.

Also this is logged:

Found orphaned certificate: <FQDN> - archiving it at /var/lib/acme/backups/<FQDN>/

My guess is that the condition "if check_domain ${MAILCOW_HOSTNAME}; then" in acme.sh didn't pass for whatever reason: https://github.com/mailcow/mailcow-dockerized/blob/master/data/Dockerfiles/acme/acme.sh#L253-L254

It lists

Found AAAA record for <FQDN>: <FQDN IPv6> - skipping A record check
Confirmed AAAA record with IP <FQDN IPv6>, but HTTP validation failed

So it is confirmed, but not added?
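The behaviour guessed at above can be illustrated with a simplified sketch: if the primary name fails its check, it never enters the list of validated names, and whichever additional name comes first ends up in front. This is NOT the actual mailcow acme.sh code; `check_domain` here is a stub that fails for names listed in the hypothetical `FAILING` variable, and `validated_domains` is an invented helper.

```shell
# Simplified illustration only; not the real acme.sh logic.
# FAILING is a hypothetical, space-separated list of names whose check fails.
FAILING="mail.primary.tld"

# Stub standing in for the real validation: fails for names in FAILING.
check_domain() {
  case " ${FAILING} " in
    *" $1 "*) return 1 ;;
    *) return 0 ;;
  esac
}

# Print the names that pass the check, in the order given. The first
# argument plays the role of the primary hostname; in this sketch the
# first name printed stands for the certificate name.
# (Uses global variables because POSIX sh has no "local".)
validated_domains() {
  out=""
  for name in "$@"; do
    if check_domain "$name"; then
      out="${out:+$out }$name"
    fi
  done
  printf '%s\n' "$out"
}

validated_domains mail.primary.tld mail.extra-domain1.com mail.extra-domain2.com
# prints: mail.extra-domain1.com mail.extra-domain2.com
```

In this sketch a failed check on the primary name silently promotes the first additional name, which matches the symptom reported here; whether the real script behaves this way would need to be confirmed against acme.sh itself.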

After commenting out the ADDITIONAL_SAN line in mailcow.conf and force-renewing again, the next SAN domain is used instead of the FQDN.

strarsis commented 2 years ago

Although I use a set of domains for ADDITIONAL_SAN (no spaces, no quotes, just commas), autodiscover and autoconfig subdomains are still used.
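The format described here (commas only, no spaces, no quotes) can be sanity-checked before restarting anything. A minimal sketch; `san_format_ok` and `split_san` are hypothetical helpers, and the check only covers the three rules named above, not wildcard entries such as mail.* or DNS validity.

```shell
# Succeeds if the value is non-empty and contains no spaces and no quote
# characters. (Hypothetical helper; checks only the rules named above.)
san_format_ok() {
  case "$1" in
    ''|*' '*|*\"*|*\'*) return 1 ;;
    *) return 0 ;;
  esac
}

# Print one name per line from a comma-separated value.
split_san() {
  printf '%s\n' "$1" | tr ',' '\n'
}

value='mail.extra-domain1.com,mail.extra-domain2.com'
if san_format_ok "$value"; then
  split_san "$value"
else
  echo "unexpected characters in ADDITIONAL_SAN value" >&2
fi
```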

strarsis commented 2 years ago

I have to run ./update.sh in order to make mailcow actually use the new mailcow.conf settings.

And acme-tiny skips verification of some domains, although they should be verified because they have changed.

Oh no, this is quite nightmarish for me 😫

More domains than those listed in ADDITIONAL_SAN are used.

strarsis commented 2 years ago

It works now, after lots of attempts with the Let's Encrypt staging environment, rebooting, and restarting. I was really afraid that the Let's Encrypt quota might run out. I really need to find out what this issue was and how it can be prevented in the future.

The underlying issues are that ADDITIONAL_SAN wasn't honored (all domains were used instead), and that I had to set SKIP_HTTP_VERIFICATION=y to get the FQDN used as the primary domain of the certificate instead of some other one (one from ADDITIONAL_SAN).

milkmaker commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.