rthalley / dnspython

a powerful DNS toolkit for python
http://www.dnspython.org

SERVFAIL not handled well #22

Closed raylu closed 11 years ago

raylu commented 11 years ago

python -c 'import dns.resolver; dns.resolver.query("_domainkey.collabfinder.com", "TXT")'

This hangs because nameservers aren't removed from the list for SERVFAIL: https://github.com/rthalley/dnspython/blob/master/dns/resolver.py#L839 The comment is not very helpful in explaining why.

rthalley commented 11 years ago

On 30 Jan 2013, at 03:57, raylu notifications@github.com wrote:

> python -c 'import dns.resolver; dns.resolver.query("_domainkey.collabfinder.com", "TXT")'
>
> This hangs because nameservers aren't removed from the list for SERVFAIL: https://github.com/rthalley/dnspython/blob/master/dns/resolver.py#L839 The comment is not very helpful in explaining why.

It doesn't hang, but it will take up to the resolver's lifetime to give up (30 seconds). You can change the timeouts, e.g.

dns.resolver.get_default_resolver().timeout = 1.0   # time to wait for any given server
dns.resolver.get_default_resolver().lifetime = 5.0  # total time to spend on this resolution
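A slightly fuller sketch of the same idea, assuming you would rather tune a private Resolver instance than the process-wide default. The `lookup_txt` helper and its parameter names are made up for illustration; `timeout`, `lifetime` and `query` are the resolver attributes and method already used in this thread (newer dnspython releases prefer the name `resolve`).

```python
def lookup_txt(name, per_server=1.0, total=5.0):
    """Return the TXT records for `name` as strings, or [] on failure.

    Hypothetical helper: `per_server` maps to Resolver.timeout and
    `total` maps to Resolver.lifetime.
    """
    if per_server <= 0 or total <= 0:
        raise ValueError("timeouts must be positive")
    if per_server > total:
        raise ValueError("per-server timeout cannot exceed the total lifetime")

    # Imported here so the sketch loads even where dnspython is absent.
    import dns.exception
    import dns.resolver

    resolver = dns.resolver.Resolver()  # private instance, default resolver untouched
    resolver.timeout = per_server       # time to wait for any given server
    resolver.lifetime = total           # total time to spend on this resolution
    try:
        answer = resolver.query(name, "TXT")
    except (dns.exception.Timeout, dns.resolver.NoNameservers):
        # Either the lifetime ran out or every nameserver failed.
        return []
    return [rdata.to_text() for rdata in answer]
```

With these settings a name whose servers only ever return SERVFAIL should give up after roughly `total` seconds instead of the default 30.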

Unfortunately the DNS protocol doesn't differentiate between "the server temporarily cannot return an answer, try again" and "the server is broken and can't return your answer" in result codes, using SERVFAIL for both situations. Since you don't know if any given SERVFAIL is a temporary failure or a more enduring one, if you remove the server from the set you risk not getting an answer at all. Perhaps the resolver should have a policy setting saying whether SERVFAIL should be treated as an enduring failure and cause the server to be removed from the set.

/Bob

raylu commented 11 years ago

In my understanding, SERVFAIL is

Server failure - The name server was unable to process this query due to a problem with the name server.

I don't see anything in RFC 1035 about temporary failure. Am I (as often happens when dealing with RFCs) reading the wrong document?

rthalley commented 11 years ago

On 30 Jan 2013, at 20:45, raylu notifications@github.com wrote:

> In my understanding, SERVFAIL is
>
> Server failure - The name server was unable to process this query due to a problem with the name server.
>
> I don't see anything in RFC 1035 about temporary failure. Am I (as often happens when dealing with RFCs) reading the wrong document?

Unfortunately, that's about all there is about SERVFAIL in the standards. All RFC 1035's "Server failure" error promises is that "this query" couldn't be processed due to "a problem with the name server". It doesn't say anything one way or another about why the query failed, whether the failure is transient or due to a more enduring problem, or whether a subsequent attempt at the same query is likely to fail again.

Section 5.3.3, item 4.d of RFC 1034 says

     d. if the response shows a servers failure or other
        bizarre contents, delete the server from the SLIST and
        go back to step 3.

which would seem to support removing a server from the list upon receiving SERVFAIL, at least if you're a full resolver (dnspython is a stub resolver). But even here, it's still a trade-off because as I said SERVFAIL is a catch-all. It might be that the server had a very temporary resource issue and if you just asked again you'd get the answer you wanted.
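To make the trade-off concrete, here is a toy simulation of the two policies. It is only a sketch and not dnspython's actual code: the `resolve` function, the `drop_on_servfail` flag and the `flaky` helper are invented for illustration, and the resolver lifetime is modelled as a fixed budget of attempts rather than wall-clock time.

```python
import itertools

SERVFAIL = "SERVFAIL"
NOERROR = "NOERROR"


def resolve(servers, ask, max_attempts, drop_on_servfail):
    """Cycle through `servers` until one answers or the budget runs out.

    `ask(server)` returns an rcode string. Returns (rcode or None, attempts).
    """
    slist = list(servers)
    attempts = 0
    while slist and attempts < max_attempts:
        for server in list(slist):      # iterate over a copy; slist may shrink
            if attempts >= max_attempts:
                break
            attempts += 1
            rcode = ask(server)
            if rcode != SERVFAIL:
                return rcode, attempts
            if drop_on_servfail:
                slist.remove(server)    # the RFC 1034 step 4.d behaviour
    return None, attempts


def flaky(fail_first):
    """Build an `ask` that returns SERVFAIL `fail_first` times, then NOERROR."""
    calls = itertools.count()

    def ask(server):
        return SERVFAIL if next(calls) < fail_first else NOERROR

    return ask


def always_servfail(server):
    return SERVFAIL


# Enduring failure: dropping servers gives up after one pass,
# retrying burns the whole budget.
broken_drop = resolve(["ns1", "ns2"], always_servfail, 30, True)
broken_keep = resolve(["ns1", "ns2"], always_servfail, 30, False)

# Transient failure: dropping the only server loses an answer
# that a single retry would have obtained.
transient_drop = resolve(["ns1"], flaky(1), 30, True)
transient_keep = resolve(["ns1"], flaky(1), 30, False)
```

In this model, dropping on SERVFAIL fails fast against an enduringly broken set of servers (2 attempts instead of 30) but misses the answer in the transient case, which is the trade-off described above.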

I don't know of any further general clarification of SERVFAIL in subsequent DNS RFCs.

I will ponder how to put some kind of SERVFAIL policy knob into the resolver, but in the meantime your best bet is to lower your timeouts as I said in my prior message.

/Bob

raylu commented 11 years ago

Thanks for looking into this. We have lowered our timeouts and that has taken care of our issue for now.

host and dig both seem to respond instantly with an error.

It also seems unreasonable to expect a retry after a SERVFAIL to work in the real world. If there's an issue with the server, that's their problem (other DNS resolvers will fail immediately anyway).

rthalley commented 11 years ago

By default, we no longer retry a nameserver that returns SERVFAIL.