m13253 / dns-over-https

High performance DNS over HTTPS client & server
https://developers.google.com/speed/public-dns/docs/dns-over-https
MIT License

Error when trying to resolve www.netflix.com #144

Closed omgold closed 1 year ago

omgold commented 1 year ago

For the past few days, resolving www.netflix.com through dns-over-https has been failing for some unknown reason. It doesn't seem to be an upstream problem, and all other domains I try work as expected.

I'm running version 2.3.2 on Arch Linux.

Upstream is configured like this:

bootstrap = [
    # CloudFlare's resolver, bad ECS, good DNSSEC
    "1.1.1.1:53",
    "1.0.0.1:53",
]

When using the host command, I get this:

> host www.netflix.com
;; Got bad packet: unexpected end of input
512 bytes
00 1a 83 80 00 01 00 06 00 00 00 01 03 77 77 77          .............www
07 6e 65 74 66 6c 69 78 03 63 6f 6d 00 00 01 00          .netflix.com....
01 03 77 77 77 07 6e 65 74 66 6c 69 78 03 63 6f          ..www.netflix.co
6d 00 00 05 00 01 00 00 01 2b 00 18 03 77 77 77          m........+...www
06 64 72 61 64 69 73 07 6e 65 74 66 6c 69 78 03          .dradis.netflix.
63 6f 6d 00 03 77 77 77 06 64 72 61 64 69 73 07          com..www.dradis.
6e 65 74 66 6c 69 78 03 63 6f 6d 00 00 05 00 01          netflix.com.....
00 00 00 3b 00 2b 03 77 77 77 09 65 75 2d 77 65          ...;.+.www.eu-we
73 74 2d 31 08 69 6e 74 65 72 6e 61 6c 06 64 72          st-1.internal.dr
61 64 69 73 07 6e 65 74 66 6c 69 78 03 63 6f 6d          adis.netflix.com
00 03 77 77 77 09 65 75 2d 77 65 73 74 2d 31 08          ..www.eu-west-1.
69 6e 74 65 72 6e 61 6c 06 64 72 61 64 69 73 07          internal.dradis.
6e 65 74 66 6c 69 78 03 63 6f 6d 00 00 05 00 01          netflix.com.....
00 00 00 3b 00 4a 2c 61 70 69 70 72 6f 78 79 2d          ...;.J,apiproxy-
77 65 62 73 69 74 65 2d 6e 6c 62 2d 70 72 6f 64          website-nlb-prod
2d 33 2d 61 63 31 31 30 66 36 61 65 34 37 32 62          -3-ac110f6ae472b
38 35 61 03 65 6c 62 09 65 75 2d 77 65 73 74 2d          85a.elb.eu-west-
31 09 61 6d 61 7a 6f 6e 61 77 73 03 63 6f 6d 00          1.amazonaws.com.
2c 61 70 69 70 72 6f 78 79 2d 77 65 62 73 69 74          ,apiproxy-websit
65 2d 6e 6c 62 2d 70 72 6f 64 2d 33 2d 61 63 31          e-nlb-prod-3-ac1
31 30 66 36 61 65 34 37 32 62 38 35 61 03 65 6c          10f6ae472b85a.el
62 09 65 75 2d 77 65 73 74 2d 31 09 61 6d 61 7a          b.eu-west-1.amaz
6f 6e 61 77 73 03 63 6f 6d 00 00 01 00 01 00 00          onaws.com.......
00 3b 00 04 36 4a 49 1f 2c 61 70 69 70 72 6f 78          .;..6JI.,apiprox
79 2d 77 65 62 73 69 74 65 2d 6e 6c 62 2d 70 72          y-website-nlb-pr
6f 64 2d 33 2d 61 63 31 31 30 66 36 61 65 34 37          od-3-ac110f6ae47
32 62 38 35 61 03 65 6c 62 09 65 75 2d 77 65 73          2b85a.elb.eu-wes
74 2d 31 09 61 6d 61 7a 6f 6e 61 77 73 03 63 6f          t-1.amazonaws.co
6d 00 00 01 00 01 00 00 00 3b 00 04 03 fb 32 95          m........;....2.
2c 61 70 69 70 72 6f 78 79 2d 77 65 62 73 69 74          ,apiproxy-websit
65 2d 6e 6c 62 2d 70 72 6f 64 2d 33 2d 61 63 31          e-nlb-prod-3-ac1
31 30 66 36 61 65 34 37 32 62 38 35 61 03 65 6c          10f6ae472b85a.el

But everything seems fine when asking upstream directly:

> host www.netflix.com 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases: 

www.netflix.com is an alias for www.dradis.netflix.com.
www.dradis.netflix.com is an alias for www.eu-west-1.internal.dradis.netflix.com.
www.eu-west-1.internal.dradis.netflix.com is an alias for apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com.
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has address 18.200.8.190
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has address 54.155.246.232
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has address 54.73.148.110
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has IPv6 address 2a05:d018:76c:b683:e1fe:9fbf:c403:57f1
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has IPv6 address 2a05:d018:76c:b684:b233:ac1f:be1f:7
apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com has IPv6 address 2a05:d018:76c:b685:c898:aa3a:42c7:9d21

The only unusual thing I see about Netflix is the rather long list of results. I could imagine that dns-over-https has a limit on message size which is exceeded because of that.

m13253 commented 1 year ago

I remember writing the logic to detect long or truncated packets. Maybe it didn't work for some reason…

maxbraeutigam commented 1 year ago

Hi @m13253 – I can confirm the bug in ArchLinux version community/dns-over-https 2.3.2-1

> host asana-user-private-us-east-1.s3.amazonaws.com
Host asana-user-private-us-east-1.s3.amazonaws.com not found: 2(SERVFAIL)
> host asana-user-private-us-east-1.s3.amazonaws.com 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases: 

asana-user-private-us-east-1.s3.amazonaws.com is an alias for s3-1-w.amazonaws.com.
s3-1-w.amazonaws.com is an alias for s3-w.us-east-1.amazonaws.com.
s3-w.us-east-1.amazonaws.com has address 3.5.21.101
s3-w.us-east-1.amazonaws.com has address 52.216.110.35
s3-w.us-east-1.amazonaws.com has address 3.5.11.228
s3-w.us-east-1.amazonaws.com has address 52.217.133.193
s3-w.us-east-1.amazonaws.com has address 52.217.84.52
s3-w.us-east-1.amazonaws.com has address 52.217.134.57
s3-w.us-east-1.amazonaws.com has address 54.231.197.217
s3-w.us-east-1.amazonaws.com has address 3.5.10.23

maxbraeutigam commented 1 year ago

Same error on latest commit 70fc857

m13253 commented 1 year ago

Thanks for the reports. Will spend some time investigating it.

satishweb commented 1 year ago

Do we need a new release for this bug fix?

GreenYun commented 1 year ago

I used dig with dns-over-https and it returned the answers correctly, while host complained about bad packets (same as in this issue).

I think something caused the UDP packet to be cut short on the connection between host and doh-client.

m13253 commented 1 year ago

> Do we need a new release for this bug fix?

If it gets fixed, we definitely need to bump the version number. I am currently trying to reproduce this bug along with @GreenYun.

I guess the problem is that host doesn’t support large UDP packets, while doh-client mistakenly assumes it does (probably due to a mistake in parsing EDNS data). Meanwhile, most modern DNS resolvers do support large UDP packets. For host, you can use -T to temporarily force TCP.
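
A minimal sketch of the check described above, assuming the proxy sees the raw query bytes: if the query carries an EDNS OPT record (type 41), the record's CLASS field gives the UDP payload size the client accepts; with no OPT record the limit is the classic 512 bytes. The helper names (`build_query`, `skip_name`, `advertised_udp_size`) are hypothetical, and this is not the code in doh-client.

```python
import struct

OPT_TYPE = 41          # EDNS(0) OPT pseudo-record type
DEFAULT_UDP_SIZE = 512  # limit for clients that send no OPT record


def build_query(name, edns_size=None, qtype=1, txid=0x1A):
    """Build a minimal DNS query, optionally with an OPT record (test helper)."""
    arcount = 1 if edns_size is not None else 0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, arcount)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)
    opt = b""
    if edns_size is not None:
        # Root name, TYPE=41, CLASS=UDP payload size, TTL=0, RDLENGTH=0
        opt = b"\x00" + struct.pack("!HHIH", OPT_TYPE, edns_size, 0, 0)
    return header + question + opt


def skip_name(msg, off):
    """Return the offset just past the domain name starting at `off`."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # compression pointer ends the name
            return off + 2
        off += 1 + length


def advertised_udp_size(query):
    """Return the UDP payload size the client announced, or 512 if none."""
    qd, an, ns, ar = struct.unpack("!HHHH", query[4:12])
    off = 12
    for _ in range(qd):
        off = skip_name(query, off) + 4  # skip QTYPE and QCLASS
    for _ in range(an + ns + ar):
        off = skip_name(query, off)
        rtype, rclass, _ttl, rdlen = struct.unpack("!HHIH", query[off:off + 10])
        off += 10 + rdlen
        if rtype == OPT_TYPE:
            return max(rclass, DEFAULT_UDP_SIZE)
    return DEFAULT_UDP_SIZE
```

A query built without an OPT record, which is what host sends according to the discussion in this thread, yields 512. RFC 6891 says advertised values below 512 are treated as 512, which is why the result is clamped.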

m13253 commented 1 year ago

I think this logic is correct… Not sure why it doesn’t work. https://github.com/m13253/dns-over-https/blob/70fc8578c7acd39b76f9ef074290c582f6d446c9/doh-client/ietf.go#L241-L249

m13253 commented 1 year ago

Please test the newer version fdc1b81e4224dbed8fd7372f79679de98504ecec and let me know if it fixes the problem.

GreenYun commented 1 year ago

> > Do we need a new release for this bug fix?
>
> If it gets fixed, we definitely need to bump the version number. I am currently trying to reproduce this bug along with @GreenYun.
>
> I guess the problem is that host doesn’t support large UDP packets, while doh-client mistakenly assumes it does (probably due to a mistake in parsing EDNS data). Meanwhile, most modern DNS resolvers do support large UDP packets. For host, you can use -T to temporarily force TCP.

It is correct that host does not send an OPT record to announce a maximum UDP payload size, while dig does. That means host accepts a response datagram of at most 512 bytes.
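
One common way for a proxy to honor that limit can be sketched as follows, under the assumption that an oversized response is replaced by just the header and question with the TC bit set, so the client retries over TCP. The function names are hypothetical, and this is not necessarily how doh-client handles it.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags


def skip_name(msg, off):
    """Return the offset just past the domain name starting at `off`."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # compression pointer ends the name
            return off + 2
        off += 1 + length


def truncate_for_udp(response, limit=512):
    """Return `response` unchanged if it fits, else a TC-flagged stub."""
    if len(response) <= limit:
        return response
    txid, flags, qd = struct.unpack("!HHH", response[:6])
    off = 12
    for _ in range(qd):
        off = skip_name(response, off) + 4  # skip QTYPE and QCLASS
    # Keep the question, drop every record, and zero the record counts so
    # the message stays parseable instead of ending in the middle of a record.
    header = struct.pack("!HHHHHH", txid, flags | TC_FLAG, qd, 0, 0, 0)
    return header + response[12:off]
```

The point of zeroing the counts is that the stub remains a well-formed message: a client that parses before checking TC still succeeds and can then fall back to TCP.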

The source code of host shows that it parses the datagram first and only then checks the TC bit to decide whether to redo the lookup via TCP. A datagram that has been cut short may therefore fail with a malformed-data error, as the following shows:

https://github.com/isc-projects/bind9/blob/d8f98cec4857babd9250d2270f6432e100eebf51/bin/dig/dighost.c#L4165-L4176

After the processing, host checks the TC bit and requeues a further lookup:

https://github.com/isc-projects/bind9/blob/d8f98cec4857babd9250d2270f6432e100eebf51/bin/dig/dighost.c#L4272-L4285
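
For illustration, the 12-byte header of the response dump in the original report can be decoded with a few lines of Python. This is only a sketch; `decode_header` is a hypothetical helper, and the bytes are copied from the first line of that dump.

```python
import struct

# First 12 bytes of the 512-byte response dump quoted in the original report.
HEADER = bytes.fromhex("00 1a 83 80 00 01 00 06 00 00 00 01")


def decode_header(header):
    """Decode the fixed 12-byte DNS header into a dict."""
    txid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", header[:12])
    return {
        "id": txid,
        "qr": bool(flags & 0x8000),  # 1 = response
        "tc": bool(flags & 0x0200),  # 1 = truncated
        "rcode": flags & 0x000F,
        "qdcount": qd,
        "ancount": an,
        "nscount": ns,
        "arcount": ar,
    }


info = decode_header(HEADER)
print(info)
```

The TC flag is set and the header announces six answers, yet the dump ends at 512 bytes in the middle of a record. That is consistent with host running into a parse error before it ever reaches the TC check.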

maxbraeutigam commented 1 year ago

Hi @m13253 - At first glance it looks good. Thank you very much for the quick fix; it’s highly appreciated since I love the doh client. Tomorrow I am gonna test it at home, where I first stumbled across the error, but at work I have an almost identical setup.

Hi @omgold - can you confirm this?

Hi @GreenYun - thanks for checking host. In my case it is not about host itself; I was not able to open Netflix and some other sites in Firefox or Chromium.

omgold commented 1 year ago

Yes, the fix works for me too.

GreenYun commented 1 year ago

> Hi @m13253 - At first glance it looks good. Thank you very much for the quick fix; it’s highly appreciated since I love the doh client. Tomorrow I am gonna test it at home, where I first stumbled across the error, but at work I have an almost identical setup.
>
> Hi @omgold - can you confirm this?
>
> Hi @GreenYun - thanks for checking host. In my case it is not about host itself; I was not able to open Netflix and some other sites in Firefox or Chromium.

Checking host helped us understand how the TC bit works and find the problem. We had gotten something wrong before.

m13253 commented 1 year ago

> Do we need a new release for this bug fix?

I published the v2.3.3 release to include this fix. It solves a bug, so I want to push it downstream soon.

satishweb commented 1 year ago

I will generate a new container image tonight.

satishweb commented 1 year ago

v2.3.3 container image released. Local tests passed.

m13253 commented 1 year ago

> v2.3.3 container image released. Local tests passed.

Big thanks!