kubeedge / edgemesh

Simplified network and services for edge applications
https://edgemesh.netlify.app/
Apache License 2.0

Domain name resolution fails on the edge side when the edge accesses the cloud #527

Open kkkkeynoted opened 10 months ago

kkkkeynoted commented 10 months ago

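What happened: While trying the cross cloud-edge communication example, the cloud side can reach the edge side normally, but when the edge side initiates the access it fails with telnet: bad address 'tcp-echo-cloud-svc.cloudzone'

I tried the solution under Problem 5 in https://zhuanlan.zhihu.com/p/585749690?spm=a2c6h.12873639.article-detail.17.147f7be1zwc0BC, but it did not work.

I also tried modifying the edgezone.yaml file under edgemesh/example so that the generated pod (busybox-sleep-edge) starts with host networking, but that did not have the desired effect.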

Environment:

Poorunga commented 10 months ago

@kkkkeynoted Could you provide some edgemesh-agent logs from the edge side?

kkkkeynoted commented 10 months ago

@Poorunga Sure, one moment please.

kkkkeynoted commented 10 months ago

@Poorunga Here is the edgemesh-agent log from the edge node; could you please take a look? (image 1)

Poorunga commented 8 months ago

I tried the solution under Problem 5 in https://zhuanlan.zhihu.com/p/585749690?spm=a2c6h.12873639.article-detail.17.147f7be1zwc0BC, but it did not work.

Could you describe at which step it failed?

victorming666 commented 8 months ago

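I have the same problem: cloud-to-edge access works, but edge-to-cloud access cannot resolve the domain name tcp-echo-cloud-svc.cloudzone.
kubeedge version: 1.14
edgemesh: master
Below is the edgemesh log on the edge node: image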

victorming666 commented 8 months ago

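@Poorunga, on my side the edge can reach the cloud via ClusterIP, but not via the domain name; I am still investigating. Also, one of my servers is on a different network segment: the pod behind the cloud svc is deployed on a server in that segment (192.168.10.200), which is not in the same subnet as the edge side (192.168.100.153). Could that be causing this problem? k8s does not seem to support pod-to-pod access across subnets, does it?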
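The symptom described here (ClusterIP reachable, service name not) can be split into a DNS step and a TCP step. Below is a minimal Python sketch of that split, assuming it is run from inside the edge pod. The function name `probe` and its injectable `resolve` and `connect` parameters are hypothetical and exist only so the logic can be exercised offline; port 2701 is the one used in the telnet command elsewhere in this thread.

```python
import socket


def probe(host, port, resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Classify a connection attempt as 'dns-failure', 'tcp-failure', or 'ok'."""
    try:
        # Step 1: name resolution only.
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    # Step 2: try each resolved address until one accepts a TCP connection.
    for _family, _socktype, _proto, _canon, sockaddr in infos:
        try:
            conn = connect(sockaddr[:2], timeout=3)
        except OSError:
            continue
        conn.close()
        return "ok"
    return "tcp-failure"


if __name__ == "__main__":
    # Service name taken from the thread; the result depends on the environment.
    print(probe("tcp-echo-cloud-svc.cloudzone", 2701))
```

If this reports `dns-failure` for the service name while the ClusterIP gives `ok`, the tunnel and proxy path is probably fine and only name resolution is affected.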

victorming666 commented 6 months ago

On my side, edgemesh-agent on the edge node reports this error:
"Failed to ensure portal" err="error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `physdev':No such file or directory\n\nTry `iptables -h' or 'iptables --help' for more information.\n" servicePortName="jupyter/admin-4af6-external:http0"
E0308 16:39:23.427682 1 proxier.go:782] "Failed to install iptables rule for service" err="error checking rule: exit status 2: iptables v1.8.3 (legacy): Couldn't load match `physdev':No such file or directory\n\nTry `iptables -h' or 'iptables --help' for more information.\n" chain=KUBE-PORTALS-CONTAINER servicePortName="jupyter/admin-4af6-external:http2" args=[-m comment --comment jupyter/admin-4af6-external:http2 -p tcp -m tcp --dport 10062 -d 192.168.102.200 -m physdev ! --physdev-is-in -j DNAT --to-destination 169.254.96.16:42745]
What is going on here?

Poorunga commented 6 months ago

@victorming666 A kernel module may be missing. The key error is this one: Couldn't load match `physdev':No such file or directory\n\nTry
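One way to check the missing-module hypothesis is to look for the module in /proc/modules on the edge node. The sketch below assumes the iptables `physdev` match is provided by the kernel module `xt_physdev` (its usual name in mainline kernels); the helper name `module_listed` is hypothetical. A match compiled directly into the kernel does not appear in /proc/modules, so a negative result is only a hint, not proof.

```python
def module_listed(proc_modules_text, name="xt_physdev"):
    """Return True if `name` is listed as a loaded module in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            print("xt_physdev listed:", module_listed(f.read()))
    except OSError as exc:
        print("could not read /proc/modules:", exc)
```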

victorming666 commented 6 months ago

OK, I am patching the RK3588 now and will check whether the problem persists.

victorming666 commented 6 months ago

The edge node has now been patched, and after restarting edgemesh-agent the log no longer shows the iptables error: image

I0308 17:10:25.413171 1 log.go:184] [INFO] 172.17.0.5:41830 - 4 "AAAA IN tcp-echo-cloud-svc.cloudzone.svc.cluster.local. udp 64 false 512" NXDOMAIN qr,aa,rd 157 0.000252879s
I0308 17:10:25.413980 1 log.go:184] [INFO] 172.17.0.5:50145 - 5 "AAAA IN tcp-echo-cloud-svc.cloudzone.cluster.local. udp 60 false 512" NXDOMAIN qr,aa,rd 153 0.000260171s
I0308 17:10:25.418000 1 log.go:184] [INFO] 172.17.0.5:58021 - 6 "A IN tcp-echo-cloud-svc.cloudzone. udp 46 false 512" NXDOMAIN qr,rd,ra 46 0.00347469s
I0308 17:10:25.418590 1 log.go:184] [INFO] 172.17.0.5:48171 - 7 "A IN tcp-echo-cloud-svc.cloudzone.edgezone.svc.cluster.local. udp 73 false 512" NXDOMAIN qr,aa,rd 166 0.000216296s
I0308 17:10:25.418910 1 log.go:184] [INFO] 172.17.0.5:45751 - 8 "A IN tcp-echo-cloud-svc.cloudzone.svc.cluster.local. udp 64 false 512" NXDOMAIN qr,aa,rd 157 0.000168878s
I0308 17:10:25.419245 1 log.go:184] [INFO] 172.17.0.5:35439 - 9 "A IN tcp-echo-cloud-svc.cloudzone.cluster.local. udp 60 false 512" NXDOMAIN qr,aa,rd 153 0.000165378s
I0308 17:11:07.835374 1 tunnel.go:502] [Finder] send relayMap peer: {12D3KooWHJ5UkwwdUXoFJ6U7vdD1UUsX24SW7iYUE345vZYE66rG: [/ip4/192.168.102.200/tcp/20006]}
I0308 17:11:07.835687 1 tunnel.go:502] [Finder] send relayMap peer: {12D3KooWEVzfjqxbTCYURAXSdXMZhf7fW1d7i2twGEr8SELTqcRZ: [/ip4/192.168.102.201/tcp/20006]}
I0308 17:12:07.834605 1 tunnel.go:502] [Finder] send relayMap peer: {12D3KooWHJ5UkwwdUXoFJ6U7vdD1UUsX24SW7iYUE345vZYE66rG: [/ip4/192.168.102.200/tcp/20006]}
I0308 17:12:07.835063 1 tunnel.go:502] [Finder] send relayMap peer: {12D3KooWEVzfjqxbTCYURAXSdXMZhf7fW1d7i2twGEr8SELTqcRZ: [/ip4/192.168.102.201/tcp/20006]}
I0308 17:12:07.837172 1 tunnel.go:585] [Heartbeat] Already has connection between {12D3KooWHJ5UkwwdUXoFJ6U7vdD1UUsX24SW7iYUE345vZYE66rG: [/ip4/192.168.102.200/tcp/20006]} and me
I0308 17:12:07.837623 1 tunnel.go:585] [Heartbeat] Already has connection between {12D3KooWEVzfjqxbTCYURAXSdXMZhf7fW1d7i2twGEr8SELTqcRZ: [/ip4/192.168.102.201/tcp/20006]} and me
I0308 17:13:07.835588 1 tunnel.go:502] [Finder] send relayMap peer: {12D3KooWHJ5UkwwdUXoFJ6U7vdD1UUsX24SW7iYUE345vZYE66rG: [/ip4/192.168.102.200/tcp/20006]}
I0308 17:13:07.836648 1 tunnel.go:502] [Finder] send relayMap peer: {12D3KooWEVzfjqxbTCYURAXSdXMZhf7fW1d7i2twGEr8SELTqcRZ: [/ip4/192.168.102.201/tcp/20006]}
I0308 17:14:07.835138 1 tunnel.go:502] [Finder] send relayMap peer: {12D3KooWHJ5UkwwdUXoFJ6U7vdD1UUsX24SW7iYUE345vZYE66rG: [/ip4/192.168.102.200/tcp/20006]}
I0308 17:14:07.835479 1 tunnel.go:502] [Finder] send relayMap peer: {12D3KooWEVzfjqxbTCYURAXSdXMZhf7fW1d7i2twGEr8SELTqcRZ: [/ip4/192.168.102.201/tcp/20006]}
I0308 17:14:07.837497 1 tunnel.go:585] [Heartbeat] Already has connection between {12D3KooWEVzfjqxbTCYURAXSdXMZhf7fW1d7i2twGEr8SELTqcRZ: [/ip4/192.168.102.201/tcp/20006]} and me
I0308 17:14:07.837847 1 tunnel.go:585] [Heartbeat] Already has connection between {12D3KooWHJ5UkwwdUXoFJ6U7vdD1UUsX24SW7iYUE345vZYE66rG: [/ip4/192.168.102.200/tcp/20006]} and me
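The DNS entries in the log above can be summarized mechanically to see which names were queried and what the server answered. This is a rough Python sketch; the regular expression is inferred only from the query lines shown in this thread, not from a documented log format, and the names `QUERY_RE` and `summarize` are hypothetical.

```python
import re

# Matches the quoted query part and the response code that follows it, e.g.
# "A IN tcp-echo-cloud-svc.cloudzone. udp 46 false 512" NXDOMAIN
QUERY_RE = re.compile(
    r'"(?P<qtype>\S+) IN (?P<name>\S+) udp \d+ \S+ \d+" (?P<rcode>\S+)'
)


def summarize(log_text):
    """Return a list of (query type, queried name, response code) tuples."""
    return [(m["qtype"], m["name"], m["rcode"]) for m in QUERY_RE.finditer(log_text)]


if __name__ == "__main__":
    import sys
    # Usage: pipe the agent log into this script.
    for qtype, name, rcode in summarize(sys.stdin.read()):
        print(f"{rcode:10} {qtype:5} {name}")
```

Applied to the lines above, every variant of tcp-echo-cloud-svc.cloudzone comes back NXDOMAIN, including tcp-echo-cloud-svc.cloudzone.svc.cluster.local., so the queries do reach the DNS server but no record is returned for the service.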

I also restarted the busybox-sleep-edge container, but it still reports the DNS error:
@aiot-edgenode-1:~/software$ docker exec -it $BUSYBOX_CID sh
/ # telnet tcp-echo-cloud-svc.cloudzone
telnet: bad address 'tcp-echo-cloud-svc.cloudzone'
/ # telnet tcp-echo-cloud-svc.cloudzone 2701
telnet: bad address 'tcp-echo-cloud-svc.cloudzone'
/ #
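The several query names seen in the agent log come from the resolver appending search domains to the short name. The sketch below shows how such a candidate list is built, assuming a glibc-style resolver, the Kubernetes default `ndots:5`, and a search list of `edgezone.svc.cluster.local svc.cluster.local cluster.local` (inferred from the logged queries, not read from the pod's actual /etc/resolv.conf). The function name `candidates` is hypothetical, and the busybox resolver may try the names in a different order.

```python
def candidates(name, search, ndots=5):
    """List the fully qualified names a glibc-style resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as already fully qualified.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    # With at least `ndots` dots the name is tried as-is first, otherwise last.
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


if __name__ == "__main__":
    search = ["edgezone.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for fqdn in candidates("tcp-echo-cloud-svc.cloudzone", search):
        print(fqdn)
```

The candidate tcp-echo-cloud-svc.cloudzone.svc.cluster.local. is the form a Service named tcp-echo-cloud-svc in namespace cloudzone would normally resolve under, so the search expansion itself looks correct and the failure is on the answering side.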

victorming666 commented 6 months ago

Using the IP address works fine: image

victorming666 commented 6 months ago

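I would like to ask: is this more likely a configuration problem in the cloud-edge communication application, or a problem in edgemesh itself?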

chenqiangzhishen commented 5 months ago

the same issue