microsoft / WSL


WSL Internet Connection Sharing DNS resolver does not adhere to 512 byte UDP limit #7642

Closed jboelter closed 9 months ago

jboelter commented 3 years ago

Version

Microsoft Windows [Version 10.0.22000.282]

WSL Version

Kernel Version

5.10.60.1

Distro Version

Ubuntu 20.04

Other Software

Go programs using the Go dns resolver.

Repro Steps

In a WSL2 terminal, capture DNS UDP traces. These can also be saved to disk and viewed in Wireshark later. You could also use the very old (but still working) Microsoft Network Monitor 3.4.

This isn't a new issue; I've observed it intermittently for over a year. I finally dug into it and (I think) found the root cause. Certain Go programs intermittently fail with "cannot unmarshal DNS message". This appears to be related to the ICS DNS resolver sending UDP responses larger than 512 bytes. Go strictly adheres to the 512-byte packet limit when using the netgo DNS resolver.

Edit: Go versions 1.18.0+, 1.17.8+ or 1.16.15+ now accept a packet size of 1232 bytes.

Also note that I reproduced this on a non-corporate machine (no VPNs or firewalls) and verified with TCPView that the only DNS listener on UDP port 53 is the ICS SharedAccess service.

Capture DNS packet traces for later analysis; add "-w out.pcap" to write the capture to a file.

sudo tcpdump -vv -X -i eth0 -s 65535 udp port 53 or tcp port 53

Repro: run a dig command without the EDNS option (the 512-byte DNS packet limit should be observed)

dig partneranalyticsapim.microsoft.com +noedns

Run a dig command with the EDNS option (large packets are expected; not a bug)

dig partneranalyticsapim.microsoft.com +edns

Per https://docs.microsoft.com/en-us/windows/wsl/troubleshooting, Internet Connection Sharing (ICS) is a required component of WSL 2. It is the default DNS resolver for an out-of-the-box setup.

Expected Behavior

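The dig utility doesn't fail, but an equivalent Go program that adheres to the RFC does. The ICS DNS resolver should adhere to the DNS specification (RFC 1035) when a request does not advertise support for large UDP packets (i.e. it carries no EDNS option): the UDP response should be at most 512 bytes, truncated with the TC bit set if necessary so that the client can retry over TCP.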

This particular host name happens to exhibit the issue given how long the CNAMEs are and the lack of message compression.

Actual Behavior

Failing example. The query carries no EDNS option, yet the response (596 bytes) exceeds the allowed size. This interacts badly with the Go resolver, which truncates the packet to 512 bytes and then cannot unmarshal it.

dig partneranalyticsapim.microsoft.com +noedns

; <<>> DiG 9.16.1-Ubuntu <<>> partneranalyticsapim.microsoft.com +noedns
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1549
;; flags: qr rd ad; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;partneranalyticsapim.microsoft.com. IN A

;; ANSWER SECTION:
partneranalyticsapim.microsoft.com. 0 IN CNAME  partneranalyticsapim.azure-api.net.
partneranalyticsapim.azure-api.net. 0 IN CNAME  apimgmttm6kf9f0xeqwbyzauxok7as6eh1rsywppzkj6wcnk9t.trafficmanager.net.
apimgmttm6kf9f0xeqwbyzauxok7as6eh1rsywppzkj6wcnk9t.trafficmanager.net. 0 IN CNAME partneranalyticsapim-westus-01.regional.azure-api.net.
partneranalyticsapim-westus-01.regional.azure-api.net. 0 IN CNAME apimgmths0xmu06lunwqwrty2xa1j32hyqfcbxllcikzoxr9hv.cloudapp.net.
apimgmths0xmu06lunwqwrty2xa1j32hyqfcbxllcikzoxr9hv.cloudapp.net. 0 IN A 40.78.17.247

;; Query time: 30 msec
;; SERVER: 172.27.224.1#53(172.27.224.1)
;; WHEN: Thu Nov 04 21:24:53 PDT 2021
;; MSG SIZE  rcvd: 596

21:24:53.238314 IP (tos 0x0, ttl 64, id 64293, offset 0, flags [none], proto UDP (17), length 80)
    172.27.235.46.56793 > BUNKER.mshome.net.domain: [bad udp cksum 0x23b5 -> 0xbc5c!] 1549+ A? partneranalyticsapim.microsoft.com. (52)
        0x0000:  4500 0050 fb25 0000 4011 5c10 ac1b eb2e  E..P.%..@.\.....
        0x0010:  ac1b e001 ddd9 0035 003c 23b5 060d 0120  .......5.<#.....
        0x0020:  0001 0000 0000 0000 1470 6172 746e 6572  .........partner
        0x0030:  616e 616c 7974 6963 7361 7069 6d09 6d69  analyticsapim.mi
        0x0040:  6372 6f73 6f66 7403 636f 6d00 0001 0001  crosoft.com.....
21:24:53.272440 IP (tos 0x0, ttl 128, id 2027, offset 0, flags [none], proto UDP (17), length 624)
    BUNKER.mshome.net.domain > 172.27.235.46.56793: [udp sum ok] 1549-$ q: A? partneranalyticsapim.microsoft.com. 5/0/0 partneranalyticsapim.microsoft.com. CNAME partneranalyticsapim.azure-api.net., partneranalyticsapim.azure-api.net. CNAME apimgmttm6kf9f0xeqwbyzauxok7as6eh1rsywppzkj6wcnk9t.trafficmanager.net., apimgmttm6kf9f0xeqwbyzauxok7as6eh1rsywppzkj6wcnk9t.trafficmanager.net. CNAME partneranalyticsapim-westus-01.regional.azure-api.net., partneranalyticsapim-westus-01.regional.azure-api.net. CNAME apimgmths0xmu06lunwqwrty2xa1j32hyqfcbxllcikzoxr9hv.cloudapp.net., apimgmths0xmu06lunwqwrty2xa1j32hyqfcbxllcikzoxr9hv.cloudapp.net. A 40.78.17.247 (596)
        0x0000:  4500 0270 07eb 0000 8011 0d2b ac1b e001  E..p.......+....
        0x0010:  ac1b eb2e 0035 ddd9 025c 51cf 060d 8120  .....5...\Q.....
        0x0020:  0001 0005 0000 0000 1470 6172 746e 6572  .........partner
        0x0030:  616e 616c 7974 6963 7361 7069 6d09 6d69  analyticsapim.mi
        0x0040:  6372 6f73 6f66 7403 636f 6d00 0001 0001  crosoft.com.....
        0x0050:  1470 6172 746e 6572 616e 616c 7974 6963  .partneranalytic
        0x0060:  7361 7069 6d09 6d69 6372 6f73 6f66 7403  sapim.microsoft.
        0x0070:  636f 6d00 0005 0001 0000 0000 0024 1470  com..........$.p
        0x0080:  6172 746e 6572 616e 616c 7974 6963 7361  artneranalyticsa
        0x0090:  7069 6d09 617a 7572 652d 6170 6903 6e65  pim.azure-api.ne
        0x00a0:  7400 1470 6172 746e 6572 616e 616c 7974  t..partneranalyt
        0x00b0:  6963 7361 7069 6d09 617a 7572 652d 6170  icsapim.azure-ap
        0x00c0:  6903 6e65 7400 0005 0001 0000 0000 0047  i.net..........G
        0x00d0:  3261 7069 6d67 6d74 746d 366b 6639 6630  2apimgmttm6kf9f0
        0x00e0:  7865 7177 6279 7a61 7578 6f6b 3761 7336  xeqwbyzauxok7as6
        0x00f0:  6568 3172 7379 7770 707a 6b6a 3677 636e  eh1rsywppzkj6wcn
        0x0100:  6b39 740e 7472 6166 6669 636d 616e 6167  k9t.trafficmanag
        0x0110:  6572 036e 6574 0032 6170 696d 676d 7474  er.net.2apimgmtt
        0x0120:  6d36 6b66 3966 3078 6571 7762 797a 6175  m6kf9f0xeqwbyzau
        0x0130:  786f 6b37 6173 3665 6831 7273 7977 7070  xok7as6eh1rsywpp
        0x0140:  7a6b 6a36 7763 6e6b 3974 0e74 7261 6666  zkj6wcnk9t.traff
        0x0150:  6963 6d61 6e61 6765 7203 6e65 7400 0005  icmanager.net...
        0x0160:  0001 0000 0000 0037 1e70 6172 746e 6572  .......7.partner
        0x0170:  616e 616c 7974 6963 7361 7069 6d2d 7765  analyticsapim-we
        0x0180:  7374 7573 2d30 3108 7265 6769 6f6e 616c  stus-01.regional
        0x0190:  0961 7a75 7265 2d61 7069 036e 6574 001e  .azure-api.net..
        0x01a0:  7061 7274 6e65 7261 6e61 6c79 7469 6373  partneranalytics
        0x01b0:  6170 696d 2d77 6573 7475 732d 3031 0872  apim-westus-01.r
        0x01c0:  6567 696f 6e61 6c09 617a 7572 652d 6170  egional.azure-ap
        0x01d0:  6903 6e65 7400 0005 0001 0000 0000 0041  i.net..........A
        0x01e0:  3261 7069 6d67 6d74 6873 3078 6d75 3036  2apimgmths0xmu06
        0x01f0:  6c75 6e77 7177 7274 7932 7861 316a 3332  lunwqwrty2xa1j32
        0x0200:  6879 7166 6362 786c 6c63 696b 7a6f 7872  hyqfcbxllcikzoxr
        0x0210:  3968 7608 636c 6f75 6461 7070 036e 6574  9hv.cloudapp.net
        0x0220:  0032 6170 696d 676d 7468 7330 786d 7530  .2apimgmths0xmu0
        0x0230:  366c 756e 7771 7772 7479 3278 6131 6a33  6lunwqwrty2xa1j3
        0x0240:  3268 7971 6663 6278 6c6c 6369 6b7a 6f78  2hyqfcbxllcikzox
        0x0250:  7239 6876 0863 6c6f 7564 6170 7003 6e65  r9hv.cloudapp.ne
        0x0260:  7400 0001 0001 0000 0000 0004 284e 11f7  t...........(N..
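
As a cross-check, the header fields of the response above can be decoded by hand. The following is a minimal Go sketch, not part of the original capture tooling. It assumes the first 40 bytes of the hex dump (IPv4, UDP and DNS headers) are copied verbatim; the names `decode` and `summary` are hypothetical helpers introduced for illustration.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"strings"
)

// First 40 bytes of the response packet in the trace above:
// 20-byte IPv4 header, 8-byte UDP header, 12-byte DNS header.
const capturedHex = "4500 0270 07eb 0000 8011 0d2b ac1b e001 " +
	"ac1b eb2e 0035 ddd9 025c 51cf 060d 8120 " +
	"0001 0005 0000 0000"

const (
	maxPlainUDP = 512    // RFC 1035 size limit for DNS over UDP without EDNS
	tcBit       = 0x0200 // TC (truncated) flag in the DNS header flags word
)

// summary holds the fields of interest (hypothetical helper type).
type summary struct {
	ID        uint16
	DNSLen    int
	Truncated bool
	Answers   uint16
}

// decode parses the IPv4, UDP and DNS headers from a space-separated hex dump.
func decode(hexDump string) (summary, error) {
	raw, err := hex.DecodeString(strings.ReplaceAll(hexDump, " ", ""))
	if err != nil {
		return summary{}, err
	}
	if len(raw) < 1 {
		return summary{}, fmt.Errorf("empty capture")
	}
	ihl := int(raw[0]&0x0f) * 4 // IPv4 header length in bytes
	if ihl < 20 || len(raw) < ihl+8+12 {
		return summary{}, fmt.Errorf("capture too short: %d bytes", len(raw))
	}
	udp := raw[ihl : ihl+8]
	dns := raw[ihl+8 : ihl+20]
	flags := binary.BigEndian.Uint16(dns[2:4])
	return summary{
		ID:        binary.BigEndian.Uint16(dns[0:2]),
		DNSLen:    int(binary.BigEndian.Uint16(udp[4:6])) - 8, // UDP length minus UDP header
		Truncated: flags&tcBit != 0,
		Answers:   binary.BigEndian.Uint16(dns[6:8]),
	}, nil
}

func main() {
	s, err := decode(capturedHex)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("id=%d dnsLen=%d tc=%t answers=%d\n", s.ID, s.DNSLen, s.Truncated, s.Answers)
	if s.DNSLen > maxPlainUDP && !s.Truncated {
		fmt.Println("response exceeds 512 bytes and the TC bit is not set")
	}
}
```

For this capture the sketch reports id=1549, a 596-byte DNS payload, TC unset and 5 answers, which agrees with the dig output (MSG SIZE rcvd: 596, flags: qr rd ad).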

Diagnostic Logs

PCAP file with the two dig examples: https://1drv.ms/u/s!At9UHnuKzNw8rnz4ZsJekezwP64p?e=TNHJ4t

Gist of a minimal Go program that will also repro: https://gist.github.com/jboelter/3dfbc449c873186bc26488a8600cac74

jboelter commented 3 years ago

Likely related:

https://github.com/golang/go/issues/44135
https://github.com/rclone/rclone/issues/4984
https://github.com/microsoft/ApplicationInsights-Go/issues/56

For Go developers on WSL2 who land on this bug from elsewhere: you need to force cgo for DNS resolution (e.g. set GODEBUG=netdns=cgo) or override the resolver (see the gist above).
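
One way to override the resolver is sketched below. This is an independent illustration, not a copy of the gist: it builds a `net.Resolver` whose `Dial` hook sends every query to a different DNS server instead of the ICS resolver from /etc/resolv.conf. The helper name `newResolver` is hypothetical, and the upstream address `1.1.1.1:53` is only a placeholder assumption; substitute a resolver that is reachable and appropriate for your network.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newResolver returns a pure-Go resolver that ignores the server address
// taken from /etc/resolv.conf and dials upstream instead.
// (Hypothetical helper; upstream is a "host:port" string.)
func newResolver(upstream string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in DNS client so the Dial hook is honored
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 3 * time.Second}
			// Keep the requested network (udp or tcp), replace the address.
			return d.DialContext(ctx, network, upstream)
		},
	}
}

func main() {
	// Assumption: a public resolver is reachable from this machine.
	r := newResolver("1.1.1.1:53")

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	addrs, err := r.LookupHost(ctx, "partneranalyticsapim.microsoft.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```

Bypassing the ICS resolver this way means names that only the host's DNS configuration can resolve (for example over a VPN) may stop working, so treat it as a workaround rather than a fix.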

Edit: A mitigation that grows the default packet buffer size from 512 bytes to 1232 bytes was introduced in Go 1.18 and backported to 1.17 and 1.16. I recommend using 1.18.0+, 1.17.8+ or 1.16.15+ to compile Go programs that are affected by this issue.

josesa-xx commented 3 years ago

I'm getting this error when running docker/podman pull against our corporate Artifactory; the DNS response is bigger than 512 bytes (678 bytes in my case).

Running wsl-vpnkit (https://github.com/sakai135/wsl-vpnkit) lowers the DNS response size to 197 bytes, which works around the DNS issue and allows the image pull to work again.

DaveOHenry commented 2 years ago

Our Terraform deployments suddenly fail with "cannot unmarshal DNS message" for calls to "management.azure.com". Bypassing the ICS resolver works for now, but has other negative side effects in certain cases: https://gist.github.com/coltenkrauter/608cfe02319ce60facd76373249b8ca6

jboelter commented 2 years ago

A mitigation that grows the default packet buffer size from 512 bytes to 1232 bytes was introduced in Go 1.18 and backported to 1.17 and 1.16. I recommend using 1.18.0+, 1.17.8+ or 1.16.15+ to compile Go programs that are affected by this issue.

See also https://github.com/golang/go/issues/51127

microsoft-github-policy-service[bot] commented 9 months ago

This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!