paul-pearce opened 7 years ago
Why should this be different than the normal number of retries?
This isn't an issue with the number of retries. It's an issue that we do not rotate through the roots.

E.g., for iterative, we first randomly select a `.` root. If that fails, the entire process fails. Conversely, if our `.` query succeeds and we receive the `.com` authoritatives, and the first `.com` authoritative fails, we will continue to retry different `.com` authoritatives until timeout.

`--retries` has no impact on this, as it will try the same `.` root over and over.
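A minimal sketch of the behavior being asked for, assuming hypothetical helper names (`queryRoot`, `resolveWithRotation` — nothing here is zdns's actual API): instead of retrying the same randomly chosen root, each retry draws a different root from a random permutation.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// rootServers is an illustrative subset of the 13 root server addresses.
var rootServers = []string{
	"198.41.0.4",   // a.root-servers.net
	"199.9.14.201", // b.root-servers.net
	"192.33.4.12",  // c.root-servers.net
}

// queryRoot is a stand-in for a real query; here it pretends one root is down.
func queryRoot(addr string) error {
	if addr == "198.41.0.4" {
		return errors.New("timeout")
	}
	return nil
}

// resolveWithRotation retries against a *different* randomly chosen root each
// attempt, rather than hammering the same one as --retries does today.
func resolveWithRotation(retries int) error {
	perm := rand.Perm(len(rootServers))
	for i := 0; i <= retries && i < len(perm); i++ {
		addr := rootServers[perm[i]]
		if err := queryRoot(addr); err != nil {
			fmt.Printf("root %s failed: %v\n", addr, err)
			continue
		}
		fmt.Printf("root %s answered\n", addr)
		return nil
	}
	return errors.New("all attempted roots failed")
}

func main() {
	if err := resolveWithRotation(2); err != nil {
		fmt.Println(err)
	}
}
```

With rotation, a single dead root only costs one attempt; today the same dead root consumes the entire retry budget.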
Out of curiosity, why do some root resolvers stop responding?
Great question. I don't know, but I observed it. I encountered this during one of my test runs when working on the `recursion` branch. One of the runs had a failure rate about 7% higher than expected. Upon investigation, I discovered that one of the roots was timing out. I manually poked it and it was, indeed, not responding to that measurement machine. It may have been a rate-limiting reaction, but I doubt it. The failures were immediate during that run, and I've not observed it before or since.
It looks like the current code has a similar behavior to the old code in this regard. `--retries` simply retries connecting to the same nameserver, I suppose assuming there was a transitory network issue in reaching that nameserver.
@zakird and @paul-pearce, do you think we should make this change for all levels, not just the root nameservers? Like if `a.gtld-servers.net` fails and we have retries left, should we choose another `.com` nameserver at random?
Additionally, I can imagine `--retries` being:

- per-NS connection (as it is now)
- per layer (`--retries=3` means we can attempt to connect to 3 `.com` NS's before giving up)
- per domain (`--retries=3` means we can re-attempt 3 times during a domain's entire iterative lookup)

I don't have strong feelings, but I think per domain is the most easily understandable as a user. LMK your thoughts.
Yeah, I definitely think trying others at every layer is the right call. I think retries could be a max number total for a given thing that you are trying to look up. Seems easiest to understand and consistent?
Yeah I agree, definitely easiest for the user to understand!
Right now, if a root server times out in `--iterative` mode, the query fails without trying other roots. This is because the root servers were bolted onto `factory.RandomNameServer`. This behavior should change, but it will require a fairly large restructuring of how we handle name servers. However we fix it, we should also have `--retries > 1` try other nameservers (if they exist).