Open KellerFuchs opened 7 years ago
PS: That would be way easier if IRC had SRV support, but if wishes were fishes, ...
Could we respond with both irc servers sorted by geoDNS? Dunno if it's available or not.
@mayli Not sure what you mean by “sorted” in that case.
You don't "sort" DNS anything. You just respond with what is needed. With DNS it's easy, because the geo-part is built in. Servers from EU only respond with European servers, US DNS only responds with US servers, simple as that. Using Route53 I'm sure allows for this?
What I mean is, why not just ask for irc.hashbang.sh, and let that entry be different depending on the geographical DNS locations. This way, a server from EU will respond faster in EU than a US server will in EU, hence only the EU entries are used by people located there.
> What I mean is, why not just ask for irc.hashbang.sh, and let that entry be different depending on the geographical DNS locations. This way, a server from EU will respond faster in EU than a US server will in EU, hence only the EU entries are used by people located there.
That is what we currently do; however, we've come up with a few issues because of this, as was pointed out above. If a TLS certificate is invalid, the server must be removed from the DNS ~~queries~~ replies. If the server isn't actually alive, it also must be removed from the DNS ~~queries~~ replies.
@KellerFuchs By "sorted" I mean that the entries in the answer section of a DNS response come in some order. In most cases DNS servers are implemented to return them in arbitrary order, to get some kind of DNS-level load balancing.
On the client side, it will usually try to connect to each entry in the order given in the response. With those two combined, we could have DNS-level HA: the faster server is the primary and the slower server is the backup.
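The client-side fallback described here can be sketched roughly as follows. This is a minimal illustration, not how any particular IRC client behaves: the helper name `connect_first_working` and the timeout value are made up for the example. Python's own `socket.create_connection` already does something similar for every address a hostname resolves to.

```python
import socket


def connect_first_working(addresses, timeout=3.0):
    """Try (host, port) pairs in the given order; return the first connected socket.

    Hypothetical helper for illustration only.
    """
    last_error = None
    for host, port in addresses:
        try:
            # The first address that accepts the connection wins.
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            # Remember the failure and fall through to the next address.
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

Note that this only gives "primary then backup" behaviour if the order of the list is meaningful, which is exactly what is in question for a DNS answer.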
> why not just ask for irc.hashbang.sh, and let that entry be different depending on the geographical DNS locations
That was exactly what was in place.
The issue was that the healthcheck, which was there to avoid sending users to a broken server, only checked that a TCP connection could be established; of course, when TLS broke, `irc.hashbang.sh` was suddenly broken for all of Europe...
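To make the difference concrete, here is a rough sketch of a TCP-only check next to one that also completes a TLS handshake. It is not the healthcheck that was actually deployed; the function names and the timeout are assumptions for illustration, and the port you pass in (6697 is the conventional IRC-over-TLS port) depends on your setup.

```python
import socket
import ssl


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def tls_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True only if the TLS handshake succeeds with a valid certificate."""
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # Covers refused connections, timeouts, and ssl.SSLError
        # (including certificate verification failures).
        return False
```

A server whose TLS setup is broken can still pass `tcp_check` while failing `tls_check`, which is the failure mode described above.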
@mayli Except that resource records are not ordered, or rather, quoting RFC 1034, 3.6, “the order of RRs in a set is not significant, and need not be preserved by name servers, resolvers, or other parts of the DNS”. In practice, many DNS resolvers randomize the order in an RRset, to prevent broken clients (cough Windows cough) from always hitting the “first” server.
The correct way to implement that would be SRV records (RFC 2782), but of course that's not a thing for IRC...
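For comparison, this is roughly what SRV-style selection would look like on the client side if IRC clients supported it: lowest priority first, then a weighted random pick among records of equal priority. It is a simplified sketch in the spirit of RFC 2782, not the exact running-sum procedure from the RFC, and the record type, function name, and hostnames are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value is tried first
    weight: int    # relative share among records of equal priority
    port: int
    target: str


def order_srv(records, rng=random):
    """Return records in the order a client should try them."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        pool = [r for r in records if r.priority == priority]
        while pool:
            weights = [r.weight for r in pool]
            if sum(weights) > 0:
                # Weighted random pick among the remaining records.
                pick = rng.choices(pool, weights=weights, k=1)[0]
            else:
                # All weights are zero: pick uniformly.
                pick = rng.choice(pool)
            pool.remove(pick)
            ordered.append(pick)
    return ordered


# Hypothetical records: two equal primaries and one backup.
records = [
    SrvRecord(20, 0, 6697, "backup.irc.example.net"),
    SrvRecord(10, 5, 6697, "a.irc.example.net"),
    SrvRecord(10, 5, 6697, "b.irc.example.net"),
]
```

With these records, `order_srv(records)` always puts the backup last, while the two primaries swap places at random. That is the explicit ordering that a plain RRset cannot express.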
14:27 hey Habbie you work with PowerDNS right?
14:27 i do
14:27 damn that was fast
14:28 if I wanted to have a GeoIP-based domain with live health checks, what would be the best way to do that?
14:28 Could I pack in cqueues and use cqueues in a checking mechanism?
14:29 in the auth luabackend you mean?
14:29 well i'm honestly not sure how lua integrates into it, but i'd assume so yes
14:29 assuming this is auth, your options are
14:29 - luabackend
14:29 - pipebackend
14:29 - remotebackend
14:30 luabackend has actual Lua states inside powerdns, and absolutely nothing happens in them except when a query comes in
14:30 which is not where you want to do your health checks because somebody is waiting for an answer
14:30 pipebackend and remotebackend integrate over pipes/sockets using either a simple line-based protocol or JSON (inside HTTP depending on choices you make)
14:30 in which case your end can do whatever the hell it wants as long as it responds over the socket
14:30 hm. alrighty.
14:31 so PowerDNS kinda acts like a frontend and then I can use a backend to form a response in the form of a Lua server?
14:31 yes
14:31 and you have to follow a few very simple rules
14:31 and powerdns will get all the DNS pain exactly right for you
14:31 I can do a cqueues async loop where the healthcheck runs every minute and still be able to send data across the socket
14:31 awesome :+1:
14:31 yes, that sounds good
14:32 so is it possible to set up this backend for just one subdomain, or would it apply for all to go to this backend?
14:32 the best short answer is 'run a separate pdns_server for this and put dnsdist in front to route queries'
14:33 alrighty
14:33 thank you for your time
14:33 using multiple backends in a single pdns_server is thorny, behaviour tends to subtly change between versions, so we don't recommend it
14:33 no problem
14:33 if you have more questions further down the road, OFTC #powerdns is welcoming and is not just me :)
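To give an idea of the pipebackend route mentioned in the chat, here is a rough sketch of a backend that only hands out addresses currently marked healthy. It uses Python rather than Lua/cqueues, follows the tab-separated line protocol (ABI version 1) as I understand it, and should be checked against the PowerDNS documentation before use. The record data, the banner text, and the `healthy` set (which a separate healthcheck loop would maintain) are all hypothetical.

```python
import sys

# Hypothetical data: every address we could hand out for a name.
RECORDS = {"irc.example.net": ["192.0.2.10", "198.51.100.20"]}


def handle_line(line, healthy):
    """Return the reply lines for one request line from PowerDNS."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] == "HELO":
        # Handshake: acknowledge with a free-form banner.
        return ["OK\thealth-aware backend"]
    if fields[0] == "Q" and len(fields) >= 6:
        _, qname, qclass, qtype, qid, _remote_ip = fields[:6]
        replies = []
        if qtype in ("A", "ANY"):
            for addr in RECORDS.get(qname.lower(), []):
                # Only answer with servers that passed the last healthcheck.
                if addr in healthy:
                    replies.append(f"DATA\t{qname}\t{qclass}\tA\t60\t{qid}\t{addr}")
        replies.append("END")
        return replies
    return ["FAIL"]


def serve(healthy, stdin=sys.stdin, stdout=sys.stdout):
    """Answer requests line by line until PowerDNS closes the pipe."""
    for line in stdin:
        for reply in handle_line(line, healthy):
            stdout.write(reply + "\n")
        stdout.flush()
```

The `_remote_ip` field is where a GeoIP lookup could plug in; it is ignored here to keep the sketch short.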
So, currently the best solution is:
This could also be relevant: https://gist.github.com/ahupowerdns/1e8bfbba95a277a4fac09cb3654eb2ac
FYI, using PowerDNS for GeoDNS means that we point everyone at our own DNS server, which isn't great for latency or reliability.
OTOH, AWS supports SSL healthchecks, which ought to be enough.
> OTOH, AWS supports SSL healthchecks, which ought to be enough.
But it also means relying on AWS. I figured we were hopefully going for something more "independent"? Testing such things on a local system would be harder without a builtin DNS setup.
@RyanSquared In principle, I would love us to run our own DNS infra. However, that basically means relying on 3rd-party services for replicas, for reliability & latency reasons (I don't happen to have an anycast DNS network in my backpocket... yet :P) and the standard ways of doing that don't support GeoDNS (because that's not something standardized).
As far as I can tell, we can pick 2 out of 3 from:
Frankly, I would be quite OK dropping GeoDNS in favor of the first two, esp. given how limited Route53's builtin healthchecks are, but that definitely would be a longer-term project. Also, it would need to be discussed with the other admins, and I don't feel that's a discussion that belongs in this issue.
@KellerFuchs How does Freenode solve this problem?
By not doing GeoDNS.
@mayli Freenode, Esper, and many other servers just have a set of records that point to all their servers, independent of location. If users have an issue, it is recommended to instead set your client to a server (or to select from a list of servers) that works best for the user.
Not all servers might be listed (at the same time, or even in general) on the public interface, though. However, for our setup, it should be fine to just list them all. Plus, nothing against Freenode, but until recently their network management has been a bit clunky.
So, can we return all records as well? This seems like the simple & stupid solution that works without too much effort. And we'd better use our bandwidth to focus on more important stuff, like userdb and other things.
Yes. That is the "default" way most DNS servers return multiple results for one name.
Yes, endless discussion about a thing that is currently a non-issue is indeed consuming bandwidth...
In order to close this issue - is the DNS setup in general still an issue? If we add a server are we going to have GeoIP enabled for it? If so, how should we remove this configuration?
We currently have an outage where `lon1.irc.hashbang.sh` fails all TLS handshakes. All users in Europe are only sent a record for `lon1`.