mail-in-a-box / mailinabox

Mail-in-a-Box helps individuals take back control of their email by defining a one-click, easy-to-deploy SMTP+everything else server: a mail server in a box.
https://mailinabox.email/
Creative Commons Zero v1.0 Universal

Invalid custom DNS causes unresolvable domain #1242

Open caspermeijn opened 7 years ago

caspermeijn commented 7 years ago

It appears that an invalid custom DNS entry can cause the DNS server to return empty responses, which leads to unresolvable domains.

This issue came to my attention because I received the following status change:

✖ Nameserver glue records are incorrect. The ns1.rivet.meijn.net and ns2.rivet.meijn.net nameservers must be configured at your domain name registrar as having the IP address 149.210.242.211. They currently report addresses of [Not Set]/[Not Set]. It may take several hours for public DNS to update after a change.
✖ This domain must resolve to your box's IP address (149.210.242.211 / 2a01:7c8:aabb:335::1) in public DNS but it currently resolves to [Not Set] / [Not Set]. It may take several hours for public DNS to update after a change. This problem may result from other issues listed above.

I first blamed my DNS provider for incorrect glue records, but in hindsight these were correct. I found that the DNS server itself didn't answer properly. Running from my mail-in-a-box server:

casper@rivet:~$ dig meijn.net @localhost

; <<>> DiG 9.9.5-3ubuntu0.15-Ubuntu <<>> meijn.net @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 37477
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;meijn.net. IN A

;; Query time: 3 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Sep 13 09:00:44 CEST 2017
;; MSG SIZE rcvd: 38
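
For anyone who wants to script this check, here is a minimal Python sketch that pulls the status and answer count out of dig output like the above. The function names are made up for illustration; this only parses text and is not part of Mail-in-a-Box.

```python
import re


def dig_status(output):
    """Return the status field from dig output (e.g. 'SERVFAIL'), or None."""
    match = re.search(r"status:\s*([A-Z]+)", output)
    return match.group(1) if match else None


def dig_answer_count(output):
    """Return the ANSWER count from the dig header flags line, or None."""
    match = re.search(r"ANSWER:\s*([0-9]+)", output)
    return int(match.group(1)) if match else None


def looks_broken(output):
    """Treat anything other than NOERROR with at least one answer as broken."""
    return dig_status(output) != "NOERROR" or not dig_answer_count(output)
```

Fed the output above, looks_broken reports a problem because the status is SERVFAIL and the answer count is 0.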

When I ran sudo mailinabox it showed the following error message:

Command '['/usr/bin/ldns-signzone', '-e', '20171013', '-n', '/etc/nsd/zones/meijn.net.txt', '/tmp/Kmeijn.net.+007+59703', '/tmp/Kmeijn.net.+007+03578']' returned non-zero exit status 1

This made me think that a DNS entry must be invalid.

After removing all custom DNS entries the same dig command returned addresses as expected.

Reproducing the problem

To reproduce the problem you need to add an invalid custom domain entry and wait for a while. For me, the DNS server problem only appeared after a week.

I added a custom domain entry with type SSHFP and value SSHFP 1 1 e731638dfdbd6a50755e6390fffd7883f892d313. This causes an error message to appear, but the entry is added anyway. This gave me the impression that all was well, but I think an internal process failed due to the invalid custom domain entry. The error message is:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request.  Either the server is overloaded or there is an error in the application.</p>

I hope that you can use this information to improve this wonderful software.

JoshData commented 7 years ago

Agreed - we need better validation of custom DNS records before they get saved. We validate for A records that valid IP addresses are given, for instance, but we don't have validation for all record types. And we don't check that nsd accepts the records without dying.
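
As an illustration of the kind of check that could run before a record is saved, here is a minimal sketch for SSHFP values. It assumes the stored value should be just the record data in the form "ALGORITHM FPTYPE FINGERPRINT"; the thread does not say exactly what nsd or ldns-signzone rejected, so the guess that the leading "SSHFP" in the reported value was the problem is an assumption. The function name is hypothetical and is not Mail-in-a-Box code.

```python
import re

# Fingerprint type -> expected number of hex digits
# (1 = SHA-1, 20 bytes; 2 = SHA-256, 32 bytes).
FP_HEX_LENGTH = {1: 40, 2: 64}

# Key algorithm numbers: 1 = RSA, 2 = DSA, 3 = ECDSA, 4 = Ed25519, 6 = Ed448.
KNOWN_ALGORITHMS = {1, 2, 3, 4, 6}

SSHFP_RE = re.compile(r"([0-9]+)\s+([0-9]+)\s+([0-9A-Fa-f]+)")


def validate_sshfp(value):
    """Return None if value looks like valid SSHFP record data, else an error string."""
    match = SSHFP_RE.fullmatch(value.strip())
    if match is None:
        return "expected 'ALGORITHM FPTYPE FINGERPRINT' with a hex fingerprint"
    algorithm, fptype = int(match.group(1)), int(match.group(2))
    fingerprint = match.group(3)
    if algorithm not in KNOWN_ALGORITHMS:
        return "unknown key algorithm %d" % algorithm
    if fptype not in FP_HEX_LENGTH:
        return "unknown fingerprint type %d" % fptype
    if len(fingerprint) != FP_HEX_LENGTH[fptype]:
        return "fingerprint type %d needs %d hex digits, got %d" % (
            fptype, FP_HEX_LENGTH[fptype], len(fingerprint))
    return None
```

With this check, "1 1 e731638dfdbd6a50755e6390fffd7883f892d313" passes, while the value from the report, which starts with the literal text SSHFP, is rejected before it reaches the zone file.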

scottnzuk commented 6 years ago

I also have this issue. How do I fix it?

After removing all custom DNS entries the same dig command returned addresses as expected

caspermeijn commented 6 years ago

@scottnzuk For me the problem was an SSHFP record that was incorrectly formatted. So you need to find the custom DNS entry in your config that is malformed and remove that entry. I believe that it will self-recover after the incorrect record has been removed.

ringe commented 3 years ago

I have this issue and I cannot locate the incorrect record. I am hosting too many domains to spot it manually.

So how can I tell the loop in /root/mailinabox/management/daemon.py on line 317 to spit out which domain it's failing on?

As in log it to syslog?
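
One generic way to get that information is to wrap the per-domain work in a try/except and log the domain before moving on. The sketch below does not reproduce the actual loop in daemon.py, which is not shown in this thread; build_all_zones and the build_zone callback are hypothetical stand-ins for whatever that loop does per domain.

```python
import logging

log = logging.getLogger("dns_update_sketch")


def build_all_zones(domains, build_zone):
    """Call build_zone(domain) for every domain.

    Failures are logged with the domain name and collected in a dict
    instead of aborting the whole run on the first bad domain.
    """
    failures = {}
    for domain in domains:
        try:
            build_zone(domain)
        except Exception as exc:  # broad on purpose: we want to keep going
            log.error("DNS update failed for %s: %s", domain, exc)
            failures[domain] = exc
    return failures
```

To send these messages to syslog on a typical Linux system, a logging.handlers.SysLogHandler(address="/dev/log") can be attached to the logger with log.addHandler.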

ringe commented 3 years ago

@JoshData app.logger didn't fly

ringe commented 3 years ago

@JoshData This problem comes from having lost users. Their custom DNS entries lost the "zone" key since there's no zone they belong to anymore.

So to fix this, I have to add the missing domain (add a user at that domain), delete the custom DNS entry, then remove the user again.
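
With many domains, orphaned entries like these can be found mechanically rather than by eye. The sketch below assumes the custom records are already loaded as (qname, rtype, value) tuples and that the hosted zones are available as a list of names; both shapes and the function name are assumptions for illustration, not the format Mail-in-a-Box uses internally.

```python
def find_orphaned_records(custom_records, zones):
    """Return the records whose qname is neither a hosted zone nor a subdomain of one.

    custom_records: iterable of (qname, rtype, value) tuples (assumed shape).
    zones: iterable of zone names currently hosted on the box.
    """
    zone_set = {zone.lower().rstrip(".") for zone in zones}
    orphans = []
    for record in custom_records:
        qname = record[0].lower().rstrip(".")
        labels = qname.split(".")
        # Candidate zones: the name itself and each of its parent domains.
        candidates = {".".join(labels[i:]) for i in range(len(labels))}
        if not candidates & zone_set:
            orphans.append(record)
    return orphans
```

Any record this returns belongs to a domain that no longer has a zone, and is a candidate for the add user, delete entry, remove user procedure described above.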