chris001 opened this issue 4 years ago
Interesting, that's a case we hadn't considered - currently Virtualmin does indeed assume that it manages certs and thus knows when TLSA records need to be updated.
What if we provided a script that would re-sync TLSA records which you could call after manual updates, or even set up a cron job to run?
> What if we provided a script that would re-sync TLSA records which you could call after manual updates, or even set up a cron job to run?
A script to update the TLSA resource records would be great. It'd need to generate the new TLSA resource records and insert them into the relevant BIND9 DNS zone 2×TTL (of the TLSA resource records) before the old (LetsEncrypt) certs expire.
Reference: slide 25 of the June 2019 ICANN65 presentation on DANE/TLSA:
Rolling Your TLS Keys (pre-publish the new record at least 2×TTL ahead)
_25._tcp.mx.example.com. IN TLSA 3 1 1 **_curr-pubkey-sha256_**
_25._tcp.mx.example.com. IN TLSA 3 1 1 **_next-pubkey-sha256_**
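A minimal sketch of the core such a script could use to compute the hash, assuming a "3 1 1" record (DANE-EE usage 3, selector 1 = SubjectPublicKeyInfo, matching type 1 = SHA-256); the cert path, hostname, and port are placeholders:

```sh
#!/bin/sh
# Sketch only: derive a "3 1 1" TLSA hash from a newly issued cert.
# The path, hostname, and port are hypothetical examples.
CERT=/etc/letsencrypt/live/mx.example.com/cert.pem
NAME=mx.example.com
PORT=25

# SHA-256 over the DER-encoded SubjectPublicKeyInfo (selector 1, matching type 1).
HASH=$(openssl x509 -in "$CERT" -noout -pubkey \
  | openssl pkey -pubin -outform DER \
  | openssl dgst -sha256 \
  | awk '{print $NF}')

# The record the script would pre-publish alongside the current one:
echo "_${PORT}._tcp.${NAME}. 3600 IN TLSA 3 1 1 ${HASH}"
```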
Most of that we can do easily, except for the rolling update to the Let's Encrypt cert. This is kind of complex for any cert actually, as Virtualmin would need to publish the new cert in DNS for long enough to allow caches to expire before actually switching to it in Apache.
I wonder, can this problem instead be minimized by having a really short TTL on the TLSA records?
> I wonder, can this problem instead be minimized by having a really short TTL on the TLSA records?
3600 seconds (1 hour) is a decent short TTL to have on the TLSA records. Then you only need to insert the new TLSA records, for the new (LetsEncrypt) TLS certs, 2 hours before the old (LetsEncrypt) TLS certs expire. When the old (LetsEncrypt) TLS certs do expire, you can remove/prune the old TLSA records from DNS.
What if the TTL was only 60 seconds?
> What if the TTL was only 60 seconds?
Good question. I think you'd experience rather heavy DNS server load, because every time any client wanted to use a secure web service running on the Virtualmin server, its validation of the TLS cert served by Virtualmin would only last 60 seconds, meaning it could pretty much never cache it. For example, a web or email client would have to re-validate that the TLSA hash matches the hash of the cert for every single web page a user browsed (assuming the user spends more than a minute reading each page), or every time the mail app checked for new email (assuming the user's mail client checks for mail every 2+ minutes).
https://tools.ietf.org/html/rfc6698
A.4. Handling Certificate Rollover
Certificate rollover is handled in much the same way as for rolling DNSSEC zone signing keys using the pre-publish key rollover method [RFC4641]. Suppose example.com has a single TLSA record for a TLS service on TCP port 990:
_990._tcp.example.com IN TLSA 1 1 1 1CFC98A706BCF3683015...
To start the rollover process, obtain or generate the new certificate or SubjectPublicKeyInfo to be used after the rollover and generate the new TLSA record. Add that record alongside the old one:
_990._tcp.example.com IN TLSA 1 1 1 1CFC98A706BCF3683015...
_990._tcp.example.com IN TLSA 1 1 1 62D5414CD1CC657E3D30...
After the new records have propagated to the authoritative nameservers and the TTL of the old record has expired, switch to the new certificate on the TLS server. Once this has occurred, the old TLSA record can be removed:
_990._tcp.example.com IN TLSA 1 1 1 62D5414CD1CC657E3D30...
This completes the certificate rollover.
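For a zone served by BIND with dynamic updates enabled, those pre-publish and prune steps could be scripted with nsupdate; a rough sketch using the (truncated) example hashes from the RFC, with the server address and TSIG key file as assumptions:

```sh
# Step 1: pre-publish the new TLSA record alongside the old one.
nsupdate -k /etc/bind/ddns.key <<'EOF'
server 127.0.0.1
zone example.com
update add _990._tcp.example.com. 3600 IN TLSA 1 1 1 62D5414CD1CC657E3D30...
send
EOF

# Step 2: only after the old record's TTL has expired from caches, switch the
# TLS service over to the new certificate, then remove the old TLSA record.
nsupdate -k /etc/bind/ddns.key <<'EOF'
server 127.0.0.1
zone example.com
update delete _990._tcp.example.com. TLSA 1 1 1 1CFC98A706BCF3683015...
send
EOF
```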
Got it - I see what needs to be done, it's just complex to implement given the way SSL cert renewals work currently.
> Got it - I see what needs to be done, it's just complex to implement given the way SSL cert renewals work currently.
Yes, it seems extra metadata, such as creation date/time and/or expiration date/time, might need to be kept so Virtualmin will know with certainty which records are pruning candidates. Maybe that extra metadata could be stored cleverly in the comment of the relevant DNS resource record? The old TLSA records don't need to be pruned exactly on time; it doesn't hurt anything if they linger around longer than strictly needed. However, it's good to have the system management software be smart enough to know IF a TLSA record is old/eligible for pruning, so that WHEN the pruning script runs again, it can intelligently prune all the old records and leave the DNS zone clean of old garbage and perfectly valid, with a 100% score on the DNS zone tester tool sites.
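One way that comment-metadata idea could look, sketched against a plain BIND zone file (the file path, comment marker, and 90-day cutoff are all made up for illustration):

```sh
# Hypothetical zone-file line: the comment records when the TLSA RR was published,
# so a later prune run knows whether it is old enough to be a removal candidate:
#   _25._tcp.mx.example.com. 3600 IN TLSA 3 1 1 <old-hash> ; tlsa-published=2020-06-01

ZONE=/var/lib/bind/example.com.hosts   # assumed zone file location
CUTOFF=$(date -d '90 days ago' +%s)    # GNU date assumed

# List records whose publish date is older than the cutoff.
grep 'tlsa-published=' "$ZONE" | while read -r line; do
  published=${line##*tlsa-published=}
  if [ "$(date -d "$published" +%s)" -lt "$CUTOFF" ]; then
    echo "prune candidate: $line"
  fi
done
```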
The complexity comes from the SSL cert replacement process - instead of just immediately starting to use it, Virtualmin would need to keep it separately until all cached records have expired, and then apply the cert in the background.
> The complexity comes from the SSL cert replacement process - instead of just immediately starting to use it, Virtualmin would need to keep it separately until all cached records have expired, and then apply the cert in the background.
Couldn't you request the new TLS certs (from LetsEncrypt), then, as soon as certbot successfully obtains them, immediately make the call to ldns-dane to generate the new TLSA record, install it into the DNS zone, then reload or restart bind9 and the relevant secure web service? There'd be a second or two when the new TLSA record doesn't exist in the zone yet and new secure connections fail to validate the new TLS cert against the old TLSA hash of the secure web service, but it'd be the best we could do, wouldn't it?
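Roughly what that could look like as a certbot deploy hook (certbot exports RENEWED_LINEAGE and RENEWED_DOMAINS to deploy hooks; the port, TSIG key file, and reload targets below are assumptions):

```sh
#!/bin/sh
# Sketch of a certbot deploy hook: as soon as a renewed cert lands, regenerate
# the TLSA record from it, push it into the zone, then reload BIND and the service.
set -e

DOMAIN=${RENEWED_DOMAINS%% *}      # first renewed name
CERT="$RENEWED_LINEAGE/cert.pem"

# ldns-dane can build the record from a cert file (-c) instead of connecting.
# It prints a full RR line; if your version omits the TTL, add one before nsupdate.
RECORD=$(ldns-dane -c "$CERT" create "$DOMAIN" 443 3 1 1)

nsupdate -k /etc/bind/ddns.key <<EOF
server 127.0.0.1
update add $RECORD
send
EOF

rndc reload
systemctl reload nginx
```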
But what about clients using the old cached DNS records who connect to the server and see the new cert, which doesn't match them?
> But what about clients using the old cached DNS records who connect to the server and see the new cert, which doesn't match them?
WEB: If the https web browser client is a strict DANE-validating client, one that refuses to connect to secure web services with invalid or possibly forged TLSA hashes, then it may (and probably should) display an error to the user and refuse to connect. Retries will succeed after the TTL of the old expired TLSA records has passed, after which time they'll have aged out of the client's local DNS cache and the DNS client must request the new TLSA records from the zone's authoritative DNS server.
In practice, none of the mainstream secure web browsers - Apple Safari, Firefox, Chrome, ChrEdge, Opera, Linux IceWeasel, etc., strictly validate and enforce the fact that DANE TLSA record hashes must match the hash of the TLS certificate presented to the client by the web server. The browsers just do "classic" Root CA TLS cert verification.
MAIL: Mail servers have more tools, in the form of plugins and built-in settings, for enforcing the rule that TLSA records must match the hash of the TLS cert being served live on the secure mail server port. Mail app users are also more tolerant of waiting one TTL (as long as it's short, say 15 minutes) until the next secure mail-checking TLS session, because those sessions are usually 10x shorter than a 15-minute TTL and occur in the background, so the user experience can't be, and doesn't have to be, instant.
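Postfix, for instance, can enforce DANE for outbound SMTP natively; turning it on looks roughly like this (it requires a local validating, DNSSEC-aware resolver):

```sh
# Enable DANE verification for outbound SMTP in Postfix (sketch).
postconf -e "smtp_dns_support_level = dnssec"
postconf -e "smtp_tls_security_level = dane"
systemctl reload postfix
```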
@jcameron @swelljoe
This Firefox browser addon (it might work on Chrome also) should help you more quickly troubleshoot whether the code is creating valid DNSSEC and DANE/TLSA DNS resource records. It lets you refresh the web page, click its orange padlock icon, and instantly see whether the DNSSEC and DANE/TLSA statuses are valid (green) or invalid (red).
https://addons.mozilla.org/en-US/firefox/addon/httpspluschecker/
So in that case, is there any benefit in keeping the DNS records for old cert around? It seems like it's only useful to have multiple if they are created before the new cert is installed.
> So in that case, is there any benefit in keeping the DNS records for old cert around? It seems like it's only useful to have multiple if they are created before the new cert is installed.
It'd seem to be the case. The instant the web services (dovecot, postfix, apache, nginx, openldap, etc.) start serving the new TLS cert, you can safely delete the DANE TLSA resource records containing the hash of the old TLS cert (and re-sign the zone), and nobody should encounter an error, because nobody should be relying on the old version of those DANE TLSA records anyway: the hash they contain would fail to match the hash of the new TLS cert.
Update. Virtualmin is still failing to update/insert the new valid TLSA (aka TLS Authentication) records into DNS. Note: Virtualmin absolutely must be robust enough to not assume that the only time a Let's Encrypt cert is renewed is when Virtualmin renews it. This server admin, and probably many others, is renewing his own Let's Encrypt certs by cron job, because the Virtualmin software refuses to renew them, because they came into being on the server before Virtualmin added the LetsEncrypt feature...
@chris001 if you are updating cert files outside of Virtualmin, you can force a re-sync of the TLSA records by running:
virtualmin modify-web --domain example.com --sync-tlsa
this can be run for all domains with :
virtualmin modify-web --all-domains --sync-tlsa
However, I'd recommend running it only for the renewed domain if you can.
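If the out-of-band renewals are done with certbot, that per-domain re-sync can be wired into a deploy hook so it runs automatically only for whatever just renewed (the hook path is an assumption; RENEWED_DOMAINS is set by certbot for deploy hooks):

```sh
#!/bin/sh
# Sketch: e.g. /etc/letsencrypt/renewal-hooks/deploy/virtualmin-tlsa.sh (path assumed).
# After certbot renews a cert, re-sync TLSA records for just the renewed domain(s).
for dom in $RENEWED_DOMAINS; do
  virtualmin modify-web --domain "$dom" --sync-tlsa
done
```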
Having a hard time testing it! After running the command line, the TLSA record test site https://www.huque.com/bin/danecheck gives a White Screen Of Death on the virtualmin-hosted domains! Haven't checked deeper into it yet, just thought I'd share this result!
Do the updated TLSA records look OK?
The command line gives an error:
Virtual server myvirtualdomain.com does not have a web site enabled
When checking under Virtualmin, myvirtualserver.com, Edit Virtual Server, Enabled Features, both Nginx website enabled and Nginx SSL website are OFF (unchecked). Had to disable them in order to manually get PHP-FPM working on Nginx, because the config generator had been breaking the Nginx PHP-FPM config for all the virtual server nginx websites.
Oh ... so this domain has TLSA records, but not a website (according to Virtualmin)? That's not a setup we support currently, sorry.
Received a nice automated email today from Viktor Dukhovni (ietf-dane@dukhovni.org). He runs a tool that monitors domains which use TLSA records, and it sends a notification email when it detects a botched key rotation causing invalid, broken TLSA records, and therefore blocked web services, because the published key/hash fails to match what is actually being served by the server! More info: https://www.isi.edu/~hardaker/presentations/2019-06-DANE-hardaker-dukhovni.pdf
Basically, Virtualmin/Webmin is assuming (wrongly) that it and only it is allowed to update/renew LetsEncrypt certs, and the TLSA hash generation is (wrongly) dependent on that assumption. What we have is outdated TLSA resource record hashes that Webmin/Virtualmin (wrongly) allows to sit there in BIND9 DNS and serve wrong hashes to the world, which blocks access to the various web services running on the Webmin/Virtualmin server!
The TLSA hash generation (and insertion/update into BIND9 DNS) must be made independent. As soon as Webmin/Virtualmin detects that a new, valid cert is encrypting traffic on any of the server's ports, it needs to update the TLSA DNS record so that it contains the correct hash of that new cert, so that BIND9 can then serve the correct TLSA record to the world, regardless of whether it was Webmin/Virtualmin that invoked the cert creation/renewal or an external script.
See also similar issues: #115 and #108!
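For reference, a quick manual spot-check of whether the published "3 1 1" TLSA hash still matches what a port is actually serving, sketched here for SMTP on port 25 with a placeholder hostname:

```sh
# Hash of the SubjectPublicKeyInfo actually served on port 25 (via STARTTLS).
openssl s_client -connect mx.example.com:25 -starttls smtp </dev/null 2>/dev/null \
  | openssl x509 -noout -pubkey \
  | openssl pkey -pubin -outform DER \
  | openssl dgst -sha256 | awk '{print $NF}'

# Hash currently published in DNS; the two must match (ignoring case/whitespace).
dig +short _25._tcp.mx.example.com. TLSA
```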