skynetservices / skydns

DNS service discovery for etcd
MIT License

Golang net.LookupSRV() fails on long SkyDNS returns #235

Open swsnider opened 8 years ago

swsnider commented 8 years ago

When I run the following program to look up SRV records in our SkyDNS instance, I get the error "lookup [REDACTED] on [REDACTED]: no such host". I've verified that loading similar records (with certain domain names changed to 'example.com') on a Google Domains DNS server works fine with the same program (you can try it too: do an SRV request for test-srv2.sniderlabs.biz). Interestingly, dig and host both work in both the SkyDNS and Google Domains cases. I'll also note that the problem goes away if the message is not truncated; I'm not sure whether there's some odd interaction there.

Code:

package main

import (
    "fmt"
    "net"
    "os"
)

const lookupName = "test-srv2.sniderlabs.biz"

func main() {
    // With empty service and proto, LookupSRV queries lookupName directly.
    _, srvRecords, err := net.LookupSRV("", "", lookupName)
    if err != nil {
        fmt.Printf("Unable to do SRV query for %q: %v\n", lookupName, err)
        os.Exit(1)
    }
    for _, srvRecord := range srvRecords {
        fmt.Printf("%v:%v\n", srvRecord.Target, srvRecord.Port)
    }
}

If there's more debugging info you need (I'm inexpert at DNS in general), let me know and I'll try to get it.

miekg commented 8 years ago

Is this using Go's own DNS resolver (i.e. compiled with -tags netgo), or the less capable glibc implementation (the default when compiling Go)?

When do you get the truncated bit set? If I query this with dig I just get a response over UDP.

A tcpdump of the actual reply from SkyDNS might help (you can send it privately if you wish).

swsnider commented 8 years ago

This happens regardless of whether GODEBUG=netdns=cgo or GODEBUG=netdns=go is set. I'm running the code compiled under go1.5.1.

When I run host -t SRV test-srv2.sniderlabs.biz, the first line is ;; Truncated, retrying in TCP mode., which is why I assumed the truncated bit was set. I'll see what I can do about getting a tcpdump, though that may be tricky.
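For readers who do manage to capture the raw reply, the truncated (TC) bit can be checked directly in the 12-byte DNS header. The following is a minimal sketch, not anything from SkyDNS or the Go standard library: the helper name isTruncated and the sample header bytes are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// isTruncated reports whether the TC (truncation) bit is set in a raw DNS
// message. The header is 12 bytes long; the third byte holds
// QR | Opcode(4) | AA | TC | RD, so TC is the 0x02 bit (RFC 1035, 4.1.1).
// The function name is hypothetical, chosen for this sketch.
func isTruncated(msg []byte) (bool, error) {
	if len(msg) < 12 {
		return false, errors.New("short DNS message: header needs 12 bytes")
	}
	return msg[2]&0x02 != 0, nil
}

func main() {
	// Made-up response header: ID 0x1234, flags 0x8380 (QR, TC, RD and RA set),
	// one question, no answers.
	resp := []byte{0x12, 0x34, 0x83, 0x80, 0, 1, 0, 0, 0, 0, 0, 0}
	tc, err := isTruncated(resp)
	fmt.Println(tc, err)
}
```

A client that sees this bit set over UDP is expected to retry the query over TCP, which is what the host output quoted above reports doing.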

miekg commented 8 years ago

Interesting. Dig uses a larger buffer by default. Seems like the std lib can't deal with truncated responses... Which seems a bit weird.

/Miek

Miek Gieben

swsnider commented 8 years ago

Actually, it deals with truncated responses fine in some cases: it can handle the response for test-srv2.sniderlabs.biz just fine. There's something about the SkyDNS response that's tripping it up. I assume it will become clearer once I'm able to do a tcpdump.

miekg commented 8 years ago

Yes, that would help.

miekg commented 8 years ago

Thinking some more: this response

; <<>> DiG 9.9.5-3ubuntu0.5-Ubuntu <<>> test-srv2.sniderlabs.biz SRV
;; global options: +cmd
;; Got answer:
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;test-srv2.sniderlabs.biz.  IN  SRV

;; ANSWER SECTION:
test-srv2.sniderlabs.biz. 3599  IN  SRV 10 10 31077 1.web.responder.detection.gcs-team-awesome.st5.services.example.com.
test-srv2.sniderlabs.biz. 3599  IN  SRV 10 10 31077 3.web.responder.detection.gcs-team-awesome.st5.services.example.com.
test-srv2.sniderlabs.biz. 3599  IN  SRV 10 10 31077 4.web.responder.detection.gcs-team-awesome.st5.services.example.com.
test-srv2.sniderlabs.biz. 3599  IN  SRV 10 10 31077 5.web.responder.detection.gcs-team-awesome.st5.services.example.com.
test-srv2.sniderlabs.biz. 3599  IN  SRV 10 10 31077 6.web.responder.detection.gcs-team-awesome.st5.services.example.com.
test-srv2.sniderlabs.biz. 3599  IN  SRV 10 10 31077 7.web.responder.detection.gcs-team-awesome.st5.services.example.com.
test-srv2.sniderlabs.biz. 3599  IN  SRV 10 10 31077 8.web.responder.detection.gcs-team-awesome.st5.services.example.com.
test-srv2.sniderlabs.biz. 3599  IN  SRV 10 10 31077 9.web.responder.detection.gcs-team-awesome.st5.services.example.com.
test-srv2.sniderlabs.biz. 3599  IN  SRV 10 10 31077 10.web.responder.detection.gcs-team-awesome.st5.services.example.com.
test-srv2.sniderlabs.biz. 3599  IN  SRV 10 10 31864 2.web.responder.detection.gcs-team-awesome.st5.services.example.com.

does not contain any IP addresses, i.e. the additional section is missing, so a stub resolver cannot do anything with this answer on its own. Did you add addresses for the names these records point to?

miekg commented 8 years ago

But of course that should not make the lookup fail. I'm going to instrument the Go library itself to see why this fails.