zfl9 / chinadns-ng

A refactored and enhanced version of chinadns, with support for domain-based routing, ipset/nftset, and UDP/TCP/DoT
GNU Affero General Public License v3.0

[Enhancement] Support `udp://` upstreams #160

Closed windmsn closed 7 months ago

windmsn commented 7 months ago

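The topology is as follows:
Domestic: LAN devices -> dnsmasq -> chinadns-ng -> 211.136.192.6/120.196.165.24 (ISP DNS)
Overseas: LAN devices -> dnsmasq -> chinadns-ng -> dns2tcp -> 8.8.8.8
I need dnsmasq for IPv6, so I can't drop it. dns2tcp can only listen on UDP, and using chinadns-ng's built-in tcp://8.8.8.8 produces a flood of "connection reset by peer" messages, so dns2tcp is my only option.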

Wed Apr 10 21:36:27 2024 kern.warn kernel: [16615901.916000] connection reset by peer.
Wed Apr 10 21:42:10 2024 kern.warn kernel: [16616244.456000] connection reset by peer.
Wed Apr 10 21:42:10 2024 kern.warn kernel: [16616244.476000] connection reset by peer.
Wed Apr 10 21:42:11 2024 kern.warn kernel: [16616245.500000] connection reset by peer.
Wed Apr 10 21:42:11 2024 kern.warn kernel: [16616245.516000] connection reset by peer.
Wed Apr 10 21:42:13 2024 kern.warn kernel: [16616247.524000] connection reset by peer.
Wed Apr 10 21:42:42 2024 kern.warn kernel: [16616276.436000] connection reset by peer.
Wed Apr 10 21:42:42 2024 kern.warn kernel: [16616276.460000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.492000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.496000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.500000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.508000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.512000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.640000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.644000] connection reset by peer.

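Today I ran into a situation: while someone was browsing Douyin on a phone, the app issued a large number of DNS queries over TCP. dnsmasq listens on port 53 and forwards the requests to chinadns-ng, but all of chinadns-ng's upstreams support only UDP, so dnsmasq spawned multiple processes while waiting for results.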

[root@ManTou:/root]#netstat -anlp | grep dns
tcp        0      0 0.0.0.0:53              0.0.0.0:*               LISTEN      11779/dnsmasq
tcp        0      0 127.0.0.1:60740         127.0.0.1:5354          ESTABLISHED 11781/dnsmasq
tcp        0      0 127.0.0.1:48100         127.0.0.1:5354          ESTABLISHED 11783/dnsmasq
tcp        0      0 127.0.0.1:42354         127.0.0.1:5354          ESTABLISHED 11784/dnsmasq
tcp        0      0 127.0.0.1:51948         127.0.0.1:5354          ESTABLISHED 11779/dnsmasq
tcp        0      0 10.0.0.1:53             10.0.0.37:51276         ESTABLISHED 11784/dnsmasq
tcp        0      0 10.0.0.1:53             10.0.0.37:51275         ESTABLISHED 11783/dnsmasq
tcp        0      0 127.0.0.1:50012         127.0.0.1:5354          ESTABLISHED 11782/dnsmasq
tcp        0      0 127.0.0.1:54257         127.0.0.1:5354          ESTABLISHED 11785/dnsmasq
tcp        0      0 127.0.0.1:60232         127.0.0.1:5354          ESTABLISHED 11780/dnsmasq
tcp        0      0 10.0.0.1:53             10.0.0.37:51277         ESTABLISHED 11785/dnsmasq
tcp        0      0 10.0.0.1:53             10.0.0.37:51278         ESTABLISHED 11786/dnsmasq
tcp        0      0 127.0.0.1:49734         127.0.0.1:5354          ESTABLISHED 11786/dnsmasq
tcp        0      0 :::5354                 :::*                    LISTEN      26727/chinadns-ng
tcp        0      0 :::53                   :::*                    LISTEN      11779/dnsmasq
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:48100  ESTABLISHED 26727/chinadns-ng
tcp        0      0 2409:8a55:4ce7:d810::1:53 2409:8a55:4ce7:d810:f451:e5ab:11b4:95e9:49162 ESTABLISHED 11782/dnsmasq
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:60232  ESTABLISHED 26727/chinadns-ng
tcp        0      0 2409:8a55:4ce7:d810::1:53 2409:8a55:4ce7:d810:f451:e5ab:11b4:95e9:49159 ESTABLISHED 11780/dnsmasq
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:50012  ESTABLISHED 26727/chinadns-ng
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:49734  ESTABLISHED 26727/chinadns-ng
tcp        0      0 2409:8a55:4ce7:d810::1:53 2409:8a55:4ce7:d810:f451:e5ab:11b4:95e9:49160 ESTABLISHED 11779/dnsmasq
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:51948  ESTABLISHED 26727/chinadns-ng
tcp        0      0 2409:8a55:4ce7:d810::1:53 2409:8a55:4ce7:d810:f451:e5ab:11b4:95e9:49161 ESTABLISHED 11781/dnsmasq
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:60740  ESTABLISHED 26727/chinadns-ng
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:42354  ESTABLISHED 26727/chinadns-ng
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:54257  ESTABLISHED 26727/chinadns-ng
udp        0      0 0.0.0.0:44167           0.0.0.0:*                           26727/chinadns-ng
udp        0      0 0.0.0.0:39137           0.0.0.0:*                           26727/chinadns-ng
udp        0      0 0.0.0.0:53              0.0.0.0:*                           11779/dnsmasq
udp        0      0 0.0.0.0:67              0.0.0.0:*                           11779/dnsmasq
udp        0      0 :::5354                 :::*                                26727/chinadns-ng
udp        0      0 :::53                   :::*                                11779/dnsmasq
unix  2      [ ]         DGRAM                    30882539 11779/dnsmasq       

The runtime configuration is as follows:

2024-04-10 16:07:02 I [main.zig:117 main] local listen addr: ::#5354@tcp+udp
2024-04-10 16:07:02 I [main.zig:117 main] china upstream: tcpin://211.136.192.6
2024-04-10 16:07:02 I [main.zig:117 main] china upstream: udpin://211.136.192.6
2024-04-10 16:07:02 I [main.zig:117 main] china upstream: tcpin://120.196.165.24
2024-04-10 16:07:02 I [main.zig:117 main] china upstream: udpin://120.196.165.24
2024-04-10 16:07:02 I [main.zig:117 main] trust upstream: tcpin://127.0.0.1#5353
2024-04-10 16:07:02 I [main.zig:117 main] trust upstream: udpin://127.0.0.1#5353
2024-04-10 16:07:02 I [main.zig:117 main] trust upstream: tcpin://127.0.0.1#5352
2024-04-10 16:07:02 I [main.zig:117 main] trust upstream: udpin://127.0.0.1#5352

When a client queries with dig @127.0.0.1 -p5354 +tcp www.youtube.com, chinadns-ng reports the following errors:

2024-04-10 16:07:11 I [server.zig:203 service_tcp] new connection:7 from ::ffff:10.0.0.21#62838
2024-04-10 16:07:11 I [server.zig:302 QueryLog.query] query(id:39920, tag:gfw, qtype:28, 'www.youtube.com') from ::ffff:10.0.0.21#62838
2024-04-10 16:07:11 I [server.zig:349 QueryLog.forward] forward query(qid:1, from:tcp, 'www.youtube.com') to trust group
2024-04-10 16:07:11 I [Upstream.zig:490 Group.send] forward query(qid:1, from:tcp) to upstream tcpin://127.0.0.1#5353
2024-04-10 16:07:11 I [Upstream.zig:490 Group.send] forward query(qid:1, from:tcp) to upstream tcpin://127.0.0.1#5352
2024-04-10 16:07:11 E [Upstream.zig:148 _send_tcp] connect(8, 'tcpin://127.0.0.1#5353') failed: (146) Connection refused
2024-04-10 16:07:11 E [Upstream.zig:148 _send_tcp] connect(9, 'tcpin://127.0.0.1#5352') failed: (146) Connection refused

The dig client reports the following error:

[root@ManTou:/root]#dig @127.0.0.1 -p5354 +tcp www.youtube.com
;; Connection to 127.0.0.1#5354(127.0.0.1) for www.youtube.com failed: connection refused.

Likewise, TCP queries for domestic sites fail with a connection timeout because the ISP DNS does not support TCP:

2024-04-10 16:42:37 I [server.zig:302 QueryLog.query] query(id:53791, tag:chn, qtype:1, 'www.taobao.com') from ::ffff:127.0.0.1#35362
2024-04-10 16:42:37 I [server.zig:349 QueryLog.forward] forward query(qid:33, from:tcp, 'www.taobao.com') to china group
2024-04-10 16:42:37 I [Upstream.zig:490 Group.send] forward query(qid:33, from:tcp) to upstream tcpin://211.136.192.6
2024-04-10 16:42:37 I [Upstream.zig:490 Group.send] forward query(qid:33, from:tcp) to upstream tcpin://120.196.165.24
2024-04-10 16:42:42 W [server.zig:827 on_timeout] query(qid:33, id:53791, tag:chn) from tcp://::ffff:127.0.0.1#35362 [timeout]
2024-04-10 16:42:42 E [Upstream.zig:148 _send_tcp] connect(12, 'tcpin://211.136.192.6') failed: (145) Operation timed out
2024-04-10 16:42:42 E [Upstream.zig:148 _send_tcp] connect(13, 'tcpin://120.196.165.24') failed: (145) Operation timed out

[root@ManTou:/root]#dig @127.0.0.1 -p5354 +tcp www.taobao.com
; <<>> DiG 9.9.9-P3 <<>> @127.0.0.1 -p5354 +tcp www.taobao.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

Although the runtime configuration appears to list both udpin and tcpin upstreams, as soon as the client queries over TCP, chinadns-ng forwards only over TCP and never uses UDP.

windmsn commented 7 months ago

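For now the only workaround is to bind UDP only: bind-port 5354@udp. Going forward, could the protocol be specified per upstream DNS? Or, when a TCP query arrives, could it be forwarded to the upstream over both tcpin and udpin at the same time?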

zfl9 commented 7 months ago

#153

zfl9 commented 7 months ago

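That said, you did remind me of one thing, namely the messages printed at startup:

Currently it blindly prints both the tcpin and udpin upstream addresses (regardless of whether TCP or UDP is actually being listened on).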

zfl9 commented 7 months ago

Your issue is exactly the same as #153, where I also gave a detailed explanation:

https://github.com/zfl9/chinadns-ng/issues/153#issuecomment-2019271948

zfl9 commented 7 months ago

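Or, when a TCP query arrives, could it be forwarded to the upstream over both tcpin and udpin at the same time?

Not feasible; see #153 for the specific reasons.

In short: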

windmsn commented 7 months ago

I also came across https://github.com/zfl9/chinadns-ng/issues/153 yesterday.

My current config file is as follows:

bind-addr ::
bind-port 5354
china-dns 211.136.192.6,120.196.165.24,tcp://223.5.5.5,tcp://223.6.6.6
trust-dns 127.0.0.1#5353,127.0.0.1#5352,tcp://2001:4860:4860::8888,tcp://2001:4860:4860::8844,tcp://8.8.8.8,tcp://8.8.4.4
gfwlist-file /root/gfwlist.txt
chnlist-file /root/chnlist.txt
ipset-name4 china
ipset-name6 chnroute6
chnlist-first
add-taggfw-ip china-banned,china-banned6

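I've been wondering whether it could be optimized like this.

I added TCP-only DNS servers to the domestic group: tcp://223.5.5.5,tcp://223.6.6.6

When a client sends a TCP query, could chinadns-ng forward it straight to the DNS servers explicitly marked tcp://, rather than to ISP DNS servers such as 211.136.192.6 and 120.196.165.24 that don't support TCP? When a client queries over UDP, it would forward to upstreams of any protocol.

From my statistics since yesterday, UDP queries make up 95% and TCP queries 5%, so for UDP queries the ISP DNS is the fastest.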

zfl9 commented 7 months ago

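Of course, there is another approach: for queries received over TCP, chinadns-ng could forward to upstreams of any protocol (including UDP). If the reply it receives has the TC (truncated) flag set, chinadns-ng discards that reply (without handing it to the TCP client yet) and sends the same query upstream again, this time excluding the UDP upstreams. The reply to this second query is guaranteed not to be truncated, and that reply is finally returned to the TCP client.

But is this really necessary? And even if it were done, there is a precondition: the upstream group must contain at least one upstream that supports TCP queries; otherwise the second query triggered by TC is bound to fail.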
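The retry idea above can be sketched roughly as follows. This is only an illustration in Python, not chinadns-ng's actual code (which is written in Zig): `resolve_for_tcp_client`, `first_reply`, and the `send` callback are hypothetical names, and upstreams are tried one after another rather than concurrently to keep the sketch short. The one fact it relies on is that the TC bit is 0x0200 in the 16-bit flags field of the DNS header.

```python
import struct

TC_MASK = 0x0200  # TC (truncated) bit within the 16-bit DNS header flags


def is_truncated(msg: bytes) -> bool:
    """Return True if the raw DNS message has the TC bit set."""
    if len(msg) < 12:  # the fixed DNS header is 12 bytes long
        raise ValueError("message shorter than a DNS header")
    _msg_id, flags = struct.unpack("!HH", msg[:4])
    return bool(flags & TC_MASK)


def first_reply(query, upstreams, send):
    """Stand-in for 'ask the group': return the first non-None reply."""
    for upstream in upstreams:
        reply = send(upstream, query)
        if reply is not None:
            return reply
    raise RuntimeError("no upstream answered")


def resolve_for_tcp_client(query, upstreams, send):
    """Answer a query that arrived over TCP.

    `upstreams` is a list of (proto, addr) tuples and `send(upstream, query)`
    returns raw reply bytes or None; both are assumptions of this sketch.
    """
    reply = first_reply(query, upstreams, send)
    if not is_truncated(reply):
        return reply
    # Truncated: drop the reply and ask again, this time without UDP upstreams.
    tcp_capable = [u for u in upstreams if u[0] != "udp"]
    if not tcp_capable:
        raise RuntimeError("truncated reply and no TCP-capable upstream to retry")
    return first_reply(query, tcp_capable, send)
```

The `RuntimeError` branch corresponds to the precondition mentioned above: without at least one TCP-capable upstream in the group, the second query has nowhere to go.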

zfl9 commented 7 months ago

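I've been wondering whether it could be optimized like this.

I added TCP-only DNS servers to the domestic group: tcp://223.5.5.5,tcp://223.6.6.6

When a client sends a TCP query, could chinadns-ng forward it straight to the DNS servers explicitly marked tcp://, rather than to ISP DNS servers such as 211.136.192.6 and 120.196.165.24 that don't support TCP?

Given the config you posted, chinadns-ng's current query strategy for an upstream group is to query all upstreams concurrently, so what you describe is already achievable. What actually happens is this:

When a query is received over TCP and forwarded to the china group:

The final result is whichever reply comes back first, i.e. the one from either 223.5.5.5 or 223.6.6.6. The failure of the ISP DNS has no effect at all (this is the core purpose of allowing multiple DNS upstreams).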
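As an illustration of "query all upstreams concurrently and take the first reply", here is a minimal asyncio sketch. It is not chinadns-ng's implementation: `race`, `fake_send`, and the delays in `FAKE` are made up for the example, and the ISP upstreams are simulated as failing quickly instead of timing out after several seconds.

```python
import asyncio


async def race(query, upstreams, send):
    """Query every upstream at once; return (upstream, reply) for the first success."""
    tasks = {asyncio.create_task(send(u, query)): u for u in upstreams}
    try:
        pending = set(tasks)
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:  # failed upstreams are simply skipped
                    return tasks[task], task.result()
        raise RuntimeError("all upstreams failed")
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished


# Simulated upstreams: (delay in seconds, whether the query succeeds).
FAKE = {
    "tcpin://211.136.192.6": (0.01, False),
    "tcpin://120.196.165.24": (0.01, False),
    "tcp://223.5.5.5": (0.05, True),
    "tcp://223.6.6.6": (0.03, True),
}


async def fake_send(upstream, query):
    delay, ok = FAKE[upstream]
    await asyncio.sleep(delay)
    if not ok:
        raise ConnectionError(f"connect({upstream}) failed")
    return f"reply to {query} from {upstream}"


if __name__ == "__main__":
    print(asyncio.run(race("www.163.com", list(FAKE), fake_send)))
```

Errors from the two simulated ISP upstreams never reach the caller; only the fastest successful reply does, which mirrors the accept/ignore pattern in the logs later in this thread.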

zfl9 commented 7 months ago

When a client queries over UDP, it would forward to upstreams of any protocol.

That is already how it works; no change is needed.

windmsn commented 7 months ago

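Of course, there is another approach: for queries received over TCP, chinadns-ng could forward to upstreams of any protocol (including UDP). If the reply it receives has the TC (truncated) flag set, chinadns-ng discards that reply (without handing it to the TCP client yet) and sends the same query upstream again, this time excluding the UDP upstreams. The reply to this second query is guaranteed not to be truncated, and that reply is finally returned to the TCP client.

But is this really necessary? And even if it were done, there is a precondition: the upstream group must contain at least one upstream that supports TCP queries; otherwise the second query triggered by TC is bound to fail.

What I'm actually stuck on is this: when a client's TCP query is sent to the ISP DNS upstreams 211.136.192.6 and 120.196.165.24, chinadns-ng uses tcpin, which leads to "Operation timed out", and then dnsmasq hangs for no apparent reason and spawns multiple processes.

However, with bind-port 5354@udp, or with the ISP DNS (211.136.192.6,120.196.165.24) removed so that only 223.5.5.5 and 223.6.6.6, which support both UDP and TCP, are used,

dnsmasq works fine.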

2024-04-11 02:12:12 E [Upstream.zig:148 _send_tcp] connect(32, 'tcpin://211.136.192.6') failed: (145) Operation timed out
2024-04-11 02:12:12 E [Upstream.zig:148 _send_tcp] connect(33, 'tcpin://120.196.165.24') failed: (145) Operation timed out
2024-04-11 02:12:12 I [server.zig:203 service_tcp] new connection:11 from ::ffff:127.0.0.1#50222
2024-04-11 02:12:12 I [server.zig:302 QueryLog.query] query(id:55607, tag:chn, qtype:1, 'www.163.com') from ::ffff:127.0.0.1#50222
2024-04-11 02:12:12 I [server.zig:349 QueryLog.forward] forward query(qid:448, from:tcp, 'www.163.com') to china group
2024-04-11 02:12:12 I [Upstream.zig:490 Group.send] forward query(qid:448, from:tcp) to upstream tcpin://211.136.192.6
2024-04-11 02:12:12 I [Upstream.zig:490 Group.send] forward query(qid:448, from:tcp) to upstream tcpin://120.196.165.24
2024-04-11 02:12:12 I [Upstream.zig:490 Group.send] forward query(qid:448, from:tcp) to upstream tcp://223.5.5.5
2024-04-11 02:12:12 I [Upstream.zig:490 Group.send] forward query(qid:448, from:tcp) to upstream tcp://223.6.6.6
2024-04-11 02:12:12 I [server.zig:531 ReplyLog.reply] reply(qid:448, tag:chn, qtype:1, 'www.163.com') from tcp://223.6.6.6 [accept]
2024-04-11 02:12:12 I [server.zig:531 ReplyLog.reply] reply(qid:448, tag:null, qtype:1, 'www.163.com') from tcp://223.5.5.5 [ignore]
2024-04-11 02:12:12 I [server.zig:203 service_tcp] close connection:11 from ::ffff:127.0.0.1#50222
2024-04-11 02:12:13 E [Upstream.zig:148 _send_tcp] connect(34, 'tcpin://211.136.192.6') failed: (145) Operation timed out
2024-04-11 02:12:13 E [Upstream.zig:148 _send_tcp] connect(35, 'tcpin://120.196.165.24') failed: (145) Operation timed out
2024-04-11 02:12:14 I [server.zig:203 service_tcp] new connection:11 from ::ffff:127.0.0.1#37595
2024-04-11 02:12:14 I [server.zig:302 QueryLog.query] query(id:44846, tag:chn, qtype:1, 'www.163.com') from ::ffff:127.0.0.1#37595
2024-04-11 02:12:14 I [server.zig:349 QueryLog.forward] forward query(qid:449, from:tcp, 'www.163.com') to china group
2024-04-11 02:12:14 I [Upstream.zig:490 Group.send] forward query(qid:449, from:tcp) to upstream tcpin://211.136.192.6
2024-04-11 02:12:14 I [Upstream.zig:490 Group.send] forward query(qid:449, from:tcp) to upstream tcpin://120.196.165.24
2024-04-11 02:12:14 I [Upstream.zig:490 Group.send] forward query(qid:449, from:tcp) to upstream tcp://223.5.5.5
2024-04-11 02:12:14 I [Upstream.zig:490 Group.send] forward query(qid:449, from:tcp) to upstream tcp://223.6.6.6
2024-04-11 02:12:14 I [server.zig:531 ReplyLog.reply] reply(qid:449, tag:chn, qtype:1, 'www.163.com') from tcp://223.6.6.6 [accept]
2024-04-11 02:12:14 I [server.zig:203 service_tcp] close connection:11 from ::ffff:127.0.0.1#37595
2024-04-11 02:12:14 I [server.zig:531 ReplyLog.reply] reply(qid:449, tag:null, qtype:1, 'www.163.com') from tcp://223.5.5.5 [ignore]
2024-04-11 02:12:14 E [Upstream.zig:148 _send_tcp] connect(36, 'tcpin://211.136.192.6') failed: (145) Operation timed out
2024-04-11 02:12:14 E [Upstream.zig:148 _send_tcp] connect(37, 'tcpin://120.196.165.24') failed: (145) Operation timed out

zfl9 commented 7 months ago

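Actually, you could turn off dnsmasq's DNS function and let chinadns-ng handle all DNS. That would avoid the problem you describe of dnsmasq spawning multiple processes because of TCP queries.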

-p, --port=<port>
Listen on <port> instead of the standard DNS port (53). Setting this to zero completely disables DNS function, leaving only DHCP and/or TFTP.

Set port to 0 in dnsmasq; this turns off DNS and leaves only DHCP and the other functions.

zfl9 commented 7 months ago

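That's because dnsmasq's implementation of DNS over TCP is currently poor and inefficient: every TCP connection/query forks a new dnsmasq process to handle it. If concurrency gets even slightly high, and given that routers don't have much memory to begin with, the growing number of processes can easily bring the system down.

It's better to let dnsmasq handle DHCP only and leave DNS to other software.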

windmsn commented 7 months ago

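That's because dnsmasq's implementation of DNS over TCP is currently poor and inefficient: every TCP connection/query forks a new dnsmasq process to handle it. If concurrency gets even slightly high, and given that routers don't have much memory to begin with, the growing number of processes can easily bring the system down.

It's better to let dnsmasq handle DHCP only and leave DNS to other software.

That's what I've done for now: I disabled dnsmasq's port 53, chinadns-ng now listens on port 53 over TCP+UDP, and I added cache and verdict-cache. I'll keep an eye on how it behaves. Thanks a lot for the reply!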

zfl9 commented 7 months ago

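Also, I suggest changing your config: there's no need to put the tcp:// qualifier in front of 223.5.5.5/223.6.6.6. Just list them the same way as the ISP DNS (with no protocol qualifier). chinadns-ng will then automatically choose the protocol for talking to the upstream based on the protocol the query arrived on. Since DNS still goes over UDP in most cases, this avoids a lot of unnecessary TCP queries.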

zfl9 commented 7 months ago

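dns2tcp can only listen on UDP, and using chinadns-ng's built-in tcp://8.8.8.8 produces a flood of "connection reset by peer" messages, so dns2tcp is my only option.

That seems unlikely; when it comes to TCP handling, chinadns-ng and dns2tcp behave the same. If access is failing, I suggest checking whether it's an iptables rule problem (traffic not going through the proxy?).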

windmsn commented 7 months ago

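dns2tcp can only listen on UDP, and using chinadns-ng's built-in tcp://8.8.8.8 produces a flood of "connection reset by peer" messages, so dns2tcp is my only option.

That seems unlikely; when it comes to TCP handling, chinadns-ng and dns2tcp behave the same. If access is failing, I suggest checking whether it's an iptables rule problem (traffic not going through the proxy?).

This is actually a rather odd requirement of mine. My ISP is China Mobile, and the ping latency to 8.8.8.8 and 2001:4860:4860::8888 is normally only around 20 ms; late at night it can drop to 10 ms.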

rmbp ~ % ping6 2001:4860:4860::8888
PING6(56=40+8+8 bytes) 2409:8a55:4ced:4530:8e85:90ff:fe50:26d6 --> 2001:4860:4860::8888
16 bytes from 2001:4860:4860::8888, icmp_seq=0 hlim=54 time=22.103 ms
16 bytes from 2001:4860:4860::8888, icmp_seq=1 hlim=54 time=22.634 ms
16 bytes from 2001:4860:4860::8888, icmp_seq=2 hlim=54 time=21.940 ms
16 bytes from 2001:4860:4860::8888, icmp_seq=3 hlim=54 time=22.151 ms
16 bytes from 2001:4860:4860::8888, icmp_seq=4 hlim=54 time=21.955 ms
16 bytes from 2001:4860:4860::8888, icmp_seq=5 hlim=54 time=22.218 ms

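However, when querying 8.8.8.8 or 2001:4860:4860::8888 on UDP port 53, some overseas domains get poisoned. Over TCP, the results are clean.

I mostly hang out on YouTube. Over IPv6 I can connect directly to googlevideo.com hosts such as rr1---sn-i3b7knsd.googlevideo.com. When resolved directly via 8.8.8.8, the googlevideo.com IPs are in Hong Kong and very fast, whereas my VPS is in the US. If the query goes out through the proxy, the resolved IPs are in the US and the connection is comparatively slow. So I also split my DNS and proxy strategy: googlevideo.com goes direct over IPv6, while youtube.com and the like go through the proxy.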

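Then I saw that chinadns-ng supports tcp:// directly, so I shut down dns2tcp. After that, the router frequently logged the messages below.

It works, but it keeps logging this error.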

Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.672000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.692000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.732000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.736000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.776000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.780000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.828000] connection reset by peer.
Thu Apr 11 11:55:59 2024 kern.warn kernel: [ 1066.456000] connection reset by peer.
Thu Apr 11 11:55:59 2024 kern.warn kernel: [ 1066.472000] connection reset by peer.
Thu Apr 11 11:55:59 2024 kern.warn kernel: [ 1066.476000] connection reset by peer.
Thu Apr 11 11:55:59 2024 kern.warn kernel: [ 1066.496000] connection reset by peer.
Thu Apr 11 11:59:22 2024 kern.warn kernel: [ 1269.592000] connection reset by peer.
Thu Apr 11 11:59:22 2024 kern.warn kernel: [ 1269.608000] connection reset by peer.
Thu Apr 11 11:59:22 2024 kern.warn kernel: [ 1269.628000] connection reset by peer.
Thu Apr 11 11:59:22 2024 kern.warn kernel: [ 1269.632000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1269.640000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1269.672000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1269.676000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.056000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.060000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.064000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.068000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.100000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.112000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.116000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.140000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.484000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.500000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.504000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.520000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.524000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.528000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.532000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.540000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.572000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.580000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.580000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.608000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.636000] connection reset by peer.
Thu Apr 11 11:59:24 2024 kern.warn kernel: [ 1270.668000] connection reset by peer.
Thu Apr 11 11:59:24 2024 kern.warn kernel: [ 1270.672000] connection reset by peer.
Thu Apr 11 11:59:24 2024 kern.warn kernel: [ 1270.700000] connection reset by peer.
Thu Apr 11 11:59:24 2024 kern.warn kernel: [ 1270.708000] connection reset by peer.
Thu Apr 11 11:59:24 2024 kern.warn kernel: [ 1270.712000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.660000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.664000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.704000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.708000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.740000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.744000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.756000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.116000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.128000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.132000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.152000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.160000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.164000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.188000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.192000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.196000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.528000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.536000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.540000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.572000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.576000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.580000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.612000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.616000] connection reset by peer.
Thu Apr 11 12:00:00 2024 kern.warn kernel: [ 1306.916000] connection reset by peer.
Thu Apr 11 12:00:00 2024 kern.warn kernel: [ 1306.960000] connection reset by peer.
Thu Apr 11 12:00:00 2024 kern.warn kernel: [ 1307.004000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.828000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.832000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.840000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.844000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.864000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.872000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.876000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.912000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.916000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.244000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.272000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.276000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.288000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.308000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.320000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.324000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.636000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.644000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.648000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.656000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.684000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.692000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.696000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.724000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.728000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.748000] connection reset by peer.
Thu Apr 11 12:03:04 2024 kern.warn kernel: [ 1491.148000] connection reset by peer.
Thu Apr 11 12:03:04 2024 kern.warn kernel: [ 1491.208000] connection reset by peer.
Thu Apr 11 12:03:04 2024 kern.warn kernel: [ 1491.252000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.140000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.156000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.176000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.196000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.204000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.228000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.260000] connection reset by peer.
Thu Apr 11 12:03:25 2024 kern.warn kernel: [ 1511.768000] connection reset by peer.
Thu Apr 11 12:03:25 2024 kern.warn kernel: [ 1511.840000] connection reset by peer.
Thu Apr 11 12:03:39 2024 kern.warn kernel: [ 1526.584000] connection reset by peer.

But chinadns-ng (udp) -> dns2tcp -> 8.8.8.8 doesn't have this problem...

zfl9 commented 7 months ago

OK, that really is baffling. Let's just leave it then, haha.

windmsn commented 7 months ago

OK, that really is baffling. Let's just leave it then, haha.

So... that's why I was thinking it would be better if UDP could be specified for an upstream, because with my current config

trust-dns 127.0.0.1#5353,127.0.0.1#5352,tcp://2001:4860:4860::8888,tcp://2001:4860:4860::8844,tcp://8.8.8.8,tcp://8.8.4.4

127.0.0.1#5353,127.0.0.1#5352 are dns2tcp (upstream 8.8.8.8) and accept only UDP; when TCP is used, chinadns-ng throws connect(8, 'tcpin://127.0.0.1#5353') failed: (146) Connection refused

When tcp://2001:4860:4860::8888,tcp://2001:4860:4860::8844,tcp://8.8.8.8,tcp://8.8.4.4 are used over TCP, the router throws kern.warn kernel: [ 1526.584000] connection reset by peer.

zfl9 commented 7 months ago

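Of course, there is another approach: for queries received over TCP, chinadns-ng could forward to upstreams of any protocol (including UDP). If the reply it receives has the TC (truncated) flag set, chinadns-ng discards that reply (without handing it to the TCP client yet) and sends the same query upstream again, this time excluding the UDP upstreams. The reply to this second query is guaranteed not to be truncated, and that reply is finally returned to the TCP client.

But is this really necessary? And even if it were done, there is a precondition: the upstream group must contain at least one upstream that supports TCP queries; otherwise the second query triggered by TC is bound to fail.

I'll look into it when I have time. With this approach, udp:// upstreams (UDP queries only) can be supported; in the retry query triggered by TC truncation, such upstreams are disabled.

Since TC is fairly rare, there's no real problem. It's fine in terms of both logic and efficiency.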

windmsn commented 7 months ago

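Of course, there is another approach: for queries received over TCP, chinadns-ng could forward to upstreams of any protocol (including UDP). If the reply it receives has the TC (truncated) flag set, chinadns-ng discards that reply (without handing it to the TCP client yet) and sends the same query upstream again, this time excluding the UDP upstreams. The reply to this second query is guaranteed not to be truncated, and that reply is finally returned to the TCP client.

But is this really necessary? And even if it were done, there is a precondition: the upstream group must contain at least one upstream that supports TCP queries; otherwise the second query triggered by TC is bound to fail.

I'll look into it when I have time. With this approach, udp:// upstreams (UDP queries only) can be supported; in the retry query triggered by TC truncation, such upstreams are disabled.

Since TC is fairly rare, there's no real problem. It's fine in terms of both logic and efficiency.

Got it. Thanks for your attention and reply.

As things stand, I think the MSG SIZE of a DNS response should be controlled by the site/app/DNS service provider. For example, suppose I develop a live-streaming app. If I know that resolving my domain yields a MSG SIZE larger than what UDP allows, I would use CNAMEs or similar to keep the MSG SIZE down, or have the app query over TCP. Otherwise, I would just keep using UDP. When users in some regions run into DNS hijacking, the app also needs to fetch DNS results from my own servers. The same goes for DNS service providers: DNSPod's 119.29.29.29 and 119.28.28.28, China Mobile's 211.136.192.6 and 120.196.165.24, and China Telecom's 202.96.128.86 and 202.96.134.33 don't actually support TCP queries either.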

rmbp ~ % dig @10.0.0.1 www.qq.com

; <<>> DiG 9.10.6 <<>> @10.0.0.1 www.qq.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36762
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 3, ADDITIONAL: 14

;; QUESTION SECTION:
;www.qq.com.            IN  A

;; ANSWER SECTION:
www.qq.com.     71  IN  CNAME   ins-r23tsuuf.ias.tencent-cloud.net.
ins-r23tsuuf.ias.tencent-cloud.net. 70 IN A 112.53.42.114
ins-r23tsuuf.ias.tencent-cloud.net. 70 IN A 112.53.42.52

;; AUTHORITY SECTION:
tencent-cloud.net.  40953   IN  NS  ns-open1.qq.com.
tencent-cloud.net.  40953   IN  NS  ns-open3.qq.com.
tencent-cloud.net.  40953   IN  NS  ns-open2.qq.com.

;; ADDITIONAL SECTION:
ns-open1.qq.com.    170748  IN  A   117.135.174.196
ns-open1.qq.com.    170748  IN  A   203.205.236.176
ns-open1.qq.com.    170748  IN  A   59.36.132.139
ns-open2.qq.com.    171717  IN  A   182.254.59.163
ns-open2.qq.com.    171717  IN  A   203.205.195.63
ns-open2.qq.com.    171717  IN  A   203.205.195.122
ns-open2.qq.com.    171717  IN  A   61.241.27.10
ns-open3.qq.com.    59  IN  A   121.51.167.100
ns-open3.qq.com.    59  IN  A   140.207.180.51
ns-open3.qq.com.    59  IN  A   203.205.220.25
ns-open3.qq.com.    59  IN  A   218.68.91.163
ns-open3.qq.com.    59  IN  A   101.227.161.202
ns-open1.qq.com.    171717  IN  AAAA    2402:4e00:111:ffe::3
ns-open2.qq.com.    171717  IN  AAAA    240e:e1:aa00:2001::3

;; Query time: 8 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Thu Apr 11 12:42:35 CST 2024
;; MSG SIZE  rcvd: 425

In practice, MSG SIZE is normally not large...
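For reference, the size argument can be made concrete with a small check. Without EDNS, a DNS message over UDP is limited to 512 bytes (RFC 1035); with EDNS0, the requester advertises a larger UDP payload size (RFC 6891). The helper below is a hypothetical sketch, not part of chinadns-ng or dig.

```python
from typing import Optional

CLASSIC_UDP_LIMIT = 512  # bytes; RFC 1035 limit for UDP messages without EDNS


def fits_in_udp(msg_size: int, edns_udp_size: Optional[int] = None) -> bool:
    """Return True if a reply of `msg_size` bytes fits in one UDP message.

    `edns_udp_size` is the payload size the requester advertised via EDNS0,
    or None if the query carried no EDNS option. Advertised values below 512
    are treated as 512.
    """
    limit = CLASSIC_UDP_LIMIT
    if edns_udp_size is not None:
        limit = max(CLASSIC_UDP_LIMIT, edns_udp_size)
    return msg_size <= limit
```

By this measure, the 425-byte reply for www.qq.com shown above fits within the classic limit, so no truncation or TCP fallback would be expected for it.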

zfl9 commented 7 months ago

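On second thought, there's no need to retry on the chinadns-ng side at all; simply filtering out TC replies (for TCP queries) is enough, because in that case the upstream group is bound to contain at least one TCP-based upstream (tcp://, tcpi://, tls://).

An upstream with no protocol qualifier consists of two "upstreams": tcpi:// (enabled when the querier uses TCP) and udpi:// (enabled when the querier uses UDP).

UPDATE: the dev branch has been modified and tests pass.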
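A rough Python sketch of how the scheme described in this comment might fit together. `Upstream`, `expand`, `eligible`, and `accept_reply` are hypothetical names, not chinadns-ng's API, and treating `udp://` upstreams as usable for TCP-originated queries follows the discussion in this thread rather than the dev-branch code, which is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Upstream:
    scheme: str  # "tcp", "tls", "udp", "tcpi", or "udpi"
    addr: str


def expand(spec: str) -> List[Upstream]:
    """Turn one config entry into upstreams.

    An address with no protocol qualifier becomes a tcpi/udpi pair.
    """
    if "://" in spec:
        scheme, addr = spec.split("://", 1)
        return [Upstream(scheme, addr)]
    return [Upstream("tcpi", spec), Upstream("udpi", spec)]


def eligible(upstreams: List[Upstream], query_proto: str) -> List[Upstream]:
    """Upstreams to use for a query that arrived over "tcp" or "udp"."""
    skip = "udpi" if query_proto == "tcp" else "tcpi"
    return [u for u in upstreams if u.scheme != skip]


def accept_reply(query_proto: str, truncated: bool) -> bool:
    """Filter out truncated (TC) replies when the client asked over TCP."""
    return not (query_proto == "tcp" and truncated)


if __name__ == "__main__":
    group = []
    for spec in "211.136.192.6,udp://127.0.0.1#5353,tcp://8.8.8.8".split(","):
        group.extend(expand(spec))
    for proto in ("tcp", "udp"):
        print(proto, [f"{u.scheme}://{u.addr}" for u in eligible(group, proto)])
```

In this reading, a truncated reply from a UDP-based upstream is simply ignored for TCP clients, and the answer comes from whichever TCP-based upstream in the group replies.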

windmsn commented 7 months ago

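On second thought, there's no need to retry on the chinadns-ng side at all; simply filtering out TC replies (for TCP queries) is enough, because in that case the upstream group is bound to contain at least one TCP-based upstream (tcp://, tcpi://, tls://).

An upstream with no protocol qualifier consists of two "upstreams": tcpi:// (enabled when the querier uses TCP) and udpi:// (enabled when the querier uses UDP).

UPDATE: the dev branch has been modified and tests pass.

Looking forward to the update. I found another problem in use today.

The DNS upstreams are: china-dns 202.96.128.86,202.96.134.33,tcp://223.5.5.5

When querying the domain cn-beijing-data.aliyundrive.net, the upstream returns a reply with EDNS whose MSG SIZE reaches 530. Getting EDNS back is probabilistic: not every reply is the 530-byte EDNS one; about 70% of the time the reply is 491 bytes.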

root@OLAY:~# dig @127.0.0.1 -p 15354 cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> @127.0.0.1 -p 15354 cn-beijing-data.aliyundrive.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57624
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 2, ADDITIONAL: 15

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 600 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 600 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.203
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.200
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.201

;; AUTHORITY SECTION:
alibabadns.com.         157     IN      NS      ns1.alibabadns.com.
alibabadns.com.         157     IN      NS      ns2.alibabadns.com.

;; ADDITIONAL SECTION:
ns1.alibabadns.com.     472     IN      A       140.205.103.192
ns1.alibabadns.com.     472     IN      A       140.205.122.66
ns1.alibabadns.com.     472     IN      A       47.88.74.38
ns1.alibabadns.com.     472     IN      A       47.241.207.18
ns1.alibabadns.com.     472     IN      A       106.11.35.19
ns1.alibabadns.com.     472     IN      A       106.11.41.157
ns2.alibabadns.com.     573     IN      A       106.11.41.158
ns2.alibabadns.com.     573     IN      A       140.205.103.194
ns2.alibabadns.com.     573     IN      A       140.205.122.77
ns2.alibabadns.com.     573     IN      A       47.88.74.36
ns2.alibabadns.com.     573     IN      A       47.241.207.16
ns2.alibabadns.com.     573     IN      A       106.11.35.18
ns1.alibabadns.com.     524     IN      AAAA    2401:b180:4100::1
ns2.alibabadns.com.     519     IN      AAAA    2401:b180:4100::2

;; Query time: 0 msec
;; SERVER: 127.0.0.1#15354(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 19:49:43 CST 2024
;; MSG SIZE  rcvd: 530

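At this point chinadns-ng has cached this reply. Then when a client (wget) accesses cn-beijing-data.aliyundrive.net, it reports this error: wget: unable to resolve host address 'cn-beijing-data.aliyundrive.net'

Only after the cache entry expires, and a fresh query for cn-beijing-data.aliyundrive.net returns a new reply without EDNS and with a MSG SIZE of 491, can cn-beijing-data.aliyundrive.net be reached normally.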

root@OLAY:~# dig @127.0.0.1 -p 15354 cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> @127.0.0.1 -p 15354 cn-beijing-data.aliyundrive.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55717
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 2, ADDITIONAL: 13

;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 600 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 600 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.200
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.203
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.201

;; AUTHORITY SECTION:
alibabadns.com.         285     IN      NS      ns1.alibabadns.com.
alibabadns.com.         285     IN      NS      ns2.alibabadns.com.

;; ADDITIONAL SECTION:
ns1.alibabadns.com.     326     IN      A       140.205.122.66
ns1.alibabadns.com.     326     IN      A       47.88.74.38
ns1.alibabadns.com.     326     IN      A       47.241.207.18
ns1.alibabadns.com.     326     IN      A       106.11.35.19
ns1.alibabadns.com.     326     IN      A       106.11.41.157
ns1.alibabadns.com.     326     IN      A       140.205.103.192
ns2.alibabadns.com.     371     IN      A       140.205.103.194
ns2.alibabadns.com.     371     IN      A       140.205.122.77
ns2.alibabadns.com.     371     IN      A       47.88.74.36
ns2.alibabadns.com.     371     IN      A       47.241.207.16
ns2.alibabadns.com.     371     IN      A       106.11.35.18
ns2.alibabadns.com.     371     IN      A       106.11.41.158
ns1.alibabadns.com.     187     IN      AAAA    2401:b180:4100::1

;; Query time: 4 msec
;; SERVER: 127.0.0.1#15354(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 20:01:53 CST 2024
;; MSG SIZE  rcvd: 491

With dnsmasq, the first (uncached) query also has a SIZE of 491, as shown here:

root@OLAY:~# dig cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> cn-beijing-data.aliyundrive.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27716
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 2, ADDITIONAL: 13

;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 600 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 600 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.200
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.201
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.203

;; AUTHORITY SECTION:
alibabadns.com.         395     IN      NS      ns2.alibabadns.com.
alibabadns.com.         395     IN      NS      ns1.alibabadns.com.

;; ADDITIONAL SECTION:
ns1.alibabadns.com.     90      IN      A       47.88.74.38
ns1.alibabadns.com.     90      IN      A       47.241.207.18
ns1.alibabadns.com.     90      IN      A       106.11.35.19
ns1.alibabadns.com.     90      IN      A       106.11.41.157
ns1.alibabadns.com.     90      IN      A       140.205.103.192
ns1.alibabadns.com.     90      IN      A       140.205.122.66
ns2.alibabadns.com.     473     IN      A       106.11.41.158
ns2.alibabadns.com.     473     IN      A       140.205.103.194
ns2.alibabadns.com.     473     IN      A       140.205.122.77
ns2.alibabadns.com.     473     IN      A       47.88.74.36
ns2.alibabadns.com.     473     IN      A       47.241.207.16
ns2.alibabadns.com.     473     IN      A       106.11.35.18
ns1.alibabadns.com.     281     IN      AAAA    2401:b180:4100::1

;; Query time: 3 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 20:04:47 CST 2024
;; MSG SIZE  rcvd: 491

On the second query, the reply served from its cache is only 249 bytes:

root@OLAY:~# dig cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> cn-beijing-data.aliyundrive.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42531
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 573 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 573 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 573 IN A 49.7.23.203
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 573 IN A 49.7.23.201
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 573 IN A 49.7.23.200

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 20:05:14 CST 2024
;; MSG SIZE  rcvd: 249

Looking closely: dnsmasq caches only the ANSWER SECTION (plus an added EDNS marker), whereas chinadns-ng caches the entire reply.

zfl9 commented 7 months ago

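This is actually a musl DNS issue: only recent musl versions support TCP fallback (used when a message received over UDP is truncated, i.e. has the TC bit set); earlier versions do not.

The reply is 530 bytes, which just exceeds the default limit of 512 (musl does not send the EDNS0 option, so it can only accept UDP messages of up to 512 bytes).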

https://news.ycombinator.com/item?id=36933028

https://git.musl-libc.org/cgit/musl/commit/?id=51d4669fb97782f6a66606da852b5afd49a08001


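Still, for compatibility with resolvers that lack TCP fallback, I will change the cache logic shortly to keep only the answer section and drop everything else.

The cached reply itself is fine; old musl versions simply cannot handle packets larger than 512 bytes.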
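For readers unfamiliar with the mechanism, the TC check and TCP fallback mentioned above can be sketched roughly as follows. This is an illustrative Python sketch, not musl's or chinadns-ng's code; the names `is_truncated`, `query_with_tcp_fallback` and `_recv_exact` are hypothetical.

```python
import socket
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags word


def is_truncated(msg: bytes) -> bool:
    """Return True if the DNS message has the TC bit set."""
    if len(msg) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack_from("!H", msg, 2)
    return bool(flags & TC_FLAG)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full DNS message")
        buf += chunk
    return buf


def query_with_tcp_fallback(query: bytes, host: str, port: int = 53,
                            timeout: float = 3.0) -> bytes:
    """Send a query over UDP; retry over TCP if the reply is truncated."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (host, port))
        # Without EDNS0 the client only expects up to 512 bytes over UDP.
        reply, _ = udp.recvfrom(512)
    if not is_truncated(reply):
        return reply
    # DNS over TCP prefixes each message with a 2-byte big-endian length.
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", _recv_exact(tcp, 2))
        return _recv_exact(tcp, length)
```

A resolver without the second half of `query_with_tcp_fallback` has nothing usable once the UDP reply arrives with TC set, which is the failure mode described here.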

windmsn commented 7 months ago

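I almost overlooked a detail about wget. The first request (before the reply was cached) worked; I only noticed the problem after I killed wget and started it again. It fails only when wget's query hits the chinadns-ng cache.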

zfl9 commented 7 months ago

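It is actually caused by your first dig query: dig supports EDNS, so the first reply carries an EDNS RR. Judging by the log, that RR is around ten to twenty bytes.

chinadns-ng then caches that reply.

When wget sends the second query, the size happens to exceed 512, so the reply is truncated, and your musl version has no TCP fallback. Hence the resolution failure.

At this point you can test with wget on another host (a glibc build, not musl), or query again with dig; both work fine.

Also, if the first query is issued by musl/glibc, for example by that same wget resolving this domain (with the reply then cached by chinadns), the size is 491, no truncation occurs, and there is no problem.

In short, the cause is understood; I will fix it shortly.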
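The two MSG SIZE values in this thread can be checked with a little arithmetic. The sketch below is a hedged reading of the dig outputs, not something stated in the thread: it assumes the only extra records in the 530-byte reply are the second AAAA record (ns2.alibabadns.com, present in that capture but absent from the 491-byte ones) and an OPT RR without options, and that the AAAA owner name is encoded as a 2-byte compression pointer.

```python
UDP_LIMIT_NO_EDNS = 512  # largest UDP DNS message a client without EDNS0 accepts

# Fixed part of every resource record: TYPE(2) + CLASS(2) + TTL(4) + RDLENGTH(2)
RR_FIXED = 10

# Assumption: owner name is a 2-byte compression pointer, RDATA is a 16-byte IPv6 address.
aaaa_rr = 2 + RR_FIXED + 16
# Assumption: OPT RR has the 1-byte root name and carries no options.
opt_rr = 1 + RR_FIXED + 0

small, large = 491, 530  # MSG SIZE values reported by dig in this thread

print("difference between captures:", large - small)
print("AAAA + OPT under the assumptions above:", aaaa_rr + opt_rr)
for size in (small, large):
    verdict = "fits" if size <= UDP_LIMIT_NO_EDNS else "must be truncated"
    print(size, verdict, "for a client without EDNS0")
```

Under these assumptions the extra bytes account for the whole difference, and only the larger reply crosses the 512-byte limit.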

windmsn commented 7 months ago

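The sequence was actually this:

The first wget worked, and I had not run dig yet. The first reply should have gone to wget and been cached by chinadns-ng; then the second wget failed. I assumed DNS was down and only then ran dig. dig resolved the name with Query time: 0 msec, so the answer presumably came from the cache, yet wget still failed.

During the second wget, the only DNS traffic should have been between the resolver and chinadns-ng. Is something missing in between?

That said, this problem is hard to reproduce.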

zfl9 commented 7 months ago

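There should be no need to reproduce it: the root cause is that old musl versions do not support messages larger than 512 bytes (specifically, a reply over 512 bytes is truncated with TC set, and musl does not support TCP fallback), so musl reports a DNS resolution failure and wget raises this error.

In that situation a glibc build of wget, or a browser, works fine. The problem is not really related to chinadns-ng.

I am already changing the cache code to keep only the answer section when caching (a minimal response), so that messages do not grow too large for resolvers without TCP fallback. (This matches dnsmasq's behavior, where the cached reply contains only the answer.)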
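The idea of keeping only the answer section can be sketched as follows. This is a Python illustration of the technique, not the chinadns-ng implementation; `skip_name` and `keep_answer_only` are hypothetical names, and the sketch assumes a well-formed message.

```python
import struct


def skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past the (possibly compressed) name at off."""
    while True:
        length = msg[off]
        if length == 0:              # root label terminates the name
            return off + 1
        if length & 0xC0 == 0xC0:    # 2-byte compression pointer ends the name
            return off + 2
        off += 1 + length            # ordinary label: length byte + label bytes


def keep_answer_only(msg: bytes) -> bytes:
    """Drop the authority and additional sections of a DNS reply."""
    ident, flags, qd, an, _ns, _ar = struct.unpack_from("!6H", msg, 0)
    off = 12
    for _ in range(qd):              # question entry: name + QTYPE + QCLASS
        off = skip_name(msg, off) + 4
    for _ in range(an):              # answer RR: name + 10 fixed bytes + RDATA
        off = skip_name(msg, off)
        (rdlen,) = struct.unpack_from("!H", msg, off + 8)
        off += 10 + rdlen
    # Compression pointers only point backwards, so cutting the tail
    # leaves every pointer in the kept part valid.
    header = struct.pack("!6H", ident, flags, qd, an, 0, 0)
    return header + msg[12:off]
```

Note that this sketch also drops any OPT RR, since OPT lives in the additional section; a real cache would need to decide per client whether to add one back.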

zfl9 commented 7 months ago

Fixed on the dev branch; tests pass.

windmsn commented 7 months ago

dnsmasq adds this:

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232

The header line changed from ;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 2, ADDITIONAL: 13 to ;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

It removed ;; AUTHORITY SECTION: and ;; ADDITIONAL SECTION:

But I can't tell where the ADDITIONAL: 1 in the cached reply is; I don't see it.

The full output looks like this:

root@OLAY:~# dig cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> cn-beijing-data.aliyundrive.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3603
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 2, ADDITIONAL: 13

;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 600 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 600 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.203
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.201
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.200

;; AUTHORITY SECTION:
alibabadns.com.         377     IN      NS      ns1.alibabadns.com.
alibabadns.com.         377     IN      NS      ns2.alibabadns.com.

;; ADDITIONAL SECTION:
ns1.alibabadns.com.     570     IN      A       47.88.74.38
ns1.alibabadns.com.     570     IN      A       47.241.207.18
ns1.alibabadns.com.     570     IN      A       106.11.35.19
ns1.alibabadns.com.     570     IN      A       106.11.41.157
ns1.alibabadns.com.     570     IN      A       140.205.103.192
ns1.alibabadns.com.     570     IN      A       140.205.122.66
ns2.alibabadns.com.     550     IN      A       106.11.35.18
ns2.alibabadns.com.     550     IN      A       106.11.41.158
ns2.alibabadns.com.     550     IN      A       140.205.103.194
ns2.alibabadns.com.     550     IN      A       140.205.122.77
ns2.alibabadns.com.     550     IN      A       47.88.74.36
ns2.alibabadns.com.     550     IN      A       47.241.207.16
ns1.alibabadns.com.     282     IN      AAAA    2401:b180:4100::1

;; Query time: 4 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 21:39:38 CST 2024
;; MSG SIZE  rcvd: 491

root@OLAY:~# dig cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> cn-beijing-data.aliyundrive.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57290
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 596 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 596 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 596 IN A 49.7.23.200
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 596 IN A 49.7.23.201
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 596 IN A 49.7.23.203

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 21:39:42 CST 2024
;; MSG SIZE  rcvd: 249
zfl9 commented 7 months ago

Regarding "I can't tell where the ADDITIONAL: 1 in the cached reply is": that is the OPT RR, i.e. the EDNS version 0 line.
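For reference, an OPT pseudo-RR like the one dig prints as "EDNS: version: 0, flags:; udp: 1232" can be encoded as below. This is a small Python sketch following the RFC 6891 field layout; `build_opt_rr` is a hypothetical helper, not a function from dnsmasq or chinadns-ng.

```python
import struct

OPT_TYPE = 41  # RR type code assigned to OPT


def build_opt_rr(udp_payload: int = 1232, version: int = 0,
                 do_bit: bool = False) -> bytes:
    """Encode an EDNS0 OPT pseudo-RR that carries no options."""
    # OPT reuses the CLASS field for the advertised UDP payload size and the
    # TTL field for extended RCODE (8 bits), version (8 bits), flags (16 bits).
    ttl = (0 << 24) | (version << 16) | (0x8000 if do_bit else 0)
    # Owner name is the root (a single zero byte); RDLENGTH is 0 with no options.
    return b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_payload, ttl, 0)


rr = build_opt_rr()
print(len(rr), rr.hex())
```

Because it is counted in the additional section, this one record is what makes the header show ADDITIONAL: 1 even though no ADDITIONAL SECTION is printed.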

windmsn commented 7 months ago

Looking forward to the new release. Could you compile a dev build for me to test?

zfl9 commented 7 months ago

Which platform? Send me the file name of the chinadns-ng release you are using.

windmsn commented 7 months ago

ChinaDNS-NG 2024.03.27 | target:x86_64-linux-musl | cpu:x86_64_v3 | mode:fast+lto
ChinaDNS-NG 2024.03.27 | target:mipsel-linux-musl | cpu:mips32r5+soft_float | mode:fast+lto

Two platforms.

zfl9 commented 7 months ago

TEMP.zip

zfl9 commented 7 months ago

A release will be published tomorrow.

windmsn commented 7 months ago

No problems found so far in testing on both platforms.

zfl9 commented 7 months ago

See the latest release.