kubeedge / edgemesh

Simplified network and services for edge applications
https://edgemesh.netlify.app/
Apache License 2.0

HTTP test does not print the correct result. #446

Closed Vessel-arch closed 1 year ago

Vessel-arch commented 1 year ago

What happened: When I run the HTTP test, it reports:

[root@master lcy]# kubectl exec -it alpine-test -- sh
/ #  curl hostname-svc:12345
curl: (56) Recv failure: Connection reset by peer

What you expected to happen: The request should return the correct result.

How to reproduce it (as minimally and precisely as possible): Install edgemesh manually after git cloning this repository; the error then occurs.

Anything else we need to know?: Here is part of my build/agent/resources/04-configmap.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: edgemesh-agent-cfg
  namespace: kubeedge
  labels:
    k8s-app: kubeedge
    kubeedge: edgemesh-agent
data:
  edgemesh-agent.yaml: |
    # For more detailed configuration, please refer to: https://edgemesh.netlify.app/reference/config-items.html#edgemesh-agent-cfg
    modules:
      edgeProxy:
        enable: true
      edgeTunnel:
        enable: true
        relayNodes:
        - nodeName: Master
          advertiseAddress:
          - 192.168.248.132
        - nodeName: node1
          advertiseAddress:
          - 192.168.248.133

Environment:

[root@master lcy]# kubectl get all -n kubeedge -o wide
NAME                       READY   STATUS    RESTARTS      AGE   IP                NODE     NOMINATED NODE   READINESS GATES
pod/edgemesh-agent-hmvch   1/1     Running   1 (54m ago)   10h   192.168.248.132   master   <none>           <none>
pod/edgemesh-agent-t7fkk   1/1     Running   0             10h   192.168.248.133   node1    <none>           <none>

NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS       IMAGES                            SELECTOR
daemonset.apps/edgemesh-agent   2         2         2       2            2           <none>          10h   edgemesh-agent   kubeedge/edgemesh-agent:v1.13.1   k8s-app=kubeedge,kubeedge=edgemesh-agent

[root@master edgemesh]# kubectl get nodes
NAME     STATUS   ROLES                  AGE   VERSION
master   Ready    control-plane,master   22d   v1.22.0
node1    Ready    agent,edge             22d   v1.22.6-kubeedge-v1.10.0

[root@master edgemesh]# go version
go version go1.16.5 linux/amd64

The edgemesh-agent logs on master:

E0316 21:34:18.424353       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from node1 error: new stream between node1: {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/192.168.248.132/tcp/20006/p2p/12D3KooWSod2cjgjgJvbcsw2ynZ5jnAM6Nak9gwWEZ8NE941SG49/p2p-circuit /ip4/192.168.248.133/tcp/20006/p2p/12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK/p2p-circuit /ip4/192.168.248.133/tcp/20006 /ip4/127.0.0.1/tcp/20006 /ip4/192.168.122.1/tcp/20006 /ip4/172.17.0.1/tcp/20006 /ip4/169.254.96.16/tcp/20006 /ip4/192.168.248.132/tcp/20006/p2p/12D3KooWSod2cjgjgJvbcsw2ynZ5jnAM6Nak9gwWEZ8NE941SG49/p2p-circuit /ip4/192.168.248.133/tcp/20006/p2p/12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK/p2p-circuit]} err: failed to dial 12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK:\n  * [/ip4/192.168.248.133/tcp/20006] dial backoff\n  * [/ip4/192.168.248.133/tcp/20006/p2p/12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK/p2p-circuit] dial backoff\n  * [/ip4/192.168.248.132/tcp/20006/p2p/12D3KooWSod2cjgjgJvbcsw2ynZ5jnAM6Nak9gwWEZ8NE941SG49/p2p-circuit] dial backoff"
E0316 21:34:18.424394       1 proxysocket.go:98] "Failed to connect to balancer" err="failed to connect to an endpoint"
E0316 21:34:18.672195       1 tunnel.go:592] [Heartbeat] Failed to connect relay {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/192.168.248.133/tcp/20006]} err: failed to dial 12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK:
  * [/ip4/192.168.248.133/tcp/20006] dial backoff
  * [/ip4/192.168.248.132/tcp/20006/p2p/12D3KooWSod2cjgjgJvbcsw2ynZ5jnAM6Nak9gwWEZ8NE941SG49/p2p-circuit] dial backoff
  * [/ip4/192.168.248.133/tcp/20006/p2p/12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK/p2p-circuit] dial backoff

And the edgemesh-agent logs on node1:

E0316 21:08:18.204457       1 tunnel.go:592] [Heartbeat] Failed to connect relay {12D3KooWNRVRy1v8Lqb5nGYsVZnyDj5x6q8dsPA8eLofzaGPH9Yd: [/ip4/1.1.1.1/tcp/20006]} err: failed to dial 12D3KooWNRVRy1v8Lqb5nGYsVZnyDj5x6q8dsPA8eLofzaGPH9Yd:
  * [/ip4/192.168.248.132/tcp/20006] dial backoff
  * [/ip4/10.244.219.64/tcp/20006] dial backoff
  * [/ip4/1.1.1.1/tcp/20006] dial backoff
E0316 21:08:21.230259       1 tunnel.go:592] [Heartbeat] Failed to connect relay {12D3KooWNRVRy1v8Lqb5nGYsVZnyDj5x6q8dsPA8eLofzaGPH9Yd: [/ip4/1.1.1.1/tcp/20006]} err: failed to dial 12D3KooWNRVRy1v8Lqb5nGYsVZnyDj5x6q8dsPA8eLofzaGPH9Yd:
  * [/ip4/192.168.248.132/tcp/20006] dial backoff
  * [/ip4/10.244.219.64/tcp/20006] dial backoff
  * [/ip4/1.1.1.1/tcp/20006] dial backoff
I0316 21:09:14.416419       1 tunnel.go:587] [Heartbeat] Connection between relay {12D3KooWNRVRy1v8Lqb5nGYsVZnyDj5x6q8dsPA8eLofzaGPH9Yd: [/ip4/1.1.1.1/tcp/20006]} is not established, try connect

By the way, I have checked port 20006, but I am not sure whether the result is correct. Here is the output:

[lcy@node1 ~]$ telnet 192.168.248.133 20006
Trying 192.168.248.133...
Connected to 192.168.248.133.
Escape character is '^]'.
[non-printable binary output]
Connection closed by foreign host.
[lcy@node1 ~]$ telnet 192.168.248.132 20006
Trying 192.168.248.132...
Connected to 192.168.248.132.
Escape character is '^]'.
[non-printable binary output]
Connection closed by foreign host.

[root@master edgemesh]# telnet 192.168.248.133 20006
Trying 192.168.248.133...
Connected to 192.168.248.133.
Escape character is '^]'.
[non-printable binary output]
Connection closed by foreign host.
[root@master edgemesh]# telnet 192.168.248.132 20006
Trying 192.168.248.132...
Connected to 192.168.248.132.
Escape character is '^]'.
[non-printable binary output]
Connection closed by foreign host.
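
The telnet checks above can also be scripted. The following is a minimal Python sketch, not part of edgemesh; `tcp_port_open` is a hypothetical helper name. A successful connection only shows that the port is reachable at the TCP level; it does not show that the two agents can complete their handshake.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host, connects, and raises OSError
        # (including timeouts and refused connections) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the setup in this report, one would call `tcp_port_open("192.168.248.132", 20006)` and `tcp_port_open("192.168.248.133", 20006)` from each node.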

Poorunga commented 1 year ago

If the two nodes are in the same LAN, there is no need to configure relayNodes.

Poorunga commented 1 year ago

See questions 12 and 16 in https://zhuanlan.zhihu.com/p/585749690

Vessel-arch commented 1 year ago

I reinstalled edgemesh without relayNodes and ran the test again, but it still fails. The edgemesh-agent logs on master:

E0317 14:28:30.259034       1 tunnel.go:141] [MDNS] Failed to connect to {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/127.0.0.1/tcp/20006 /ip4/192.168.248.133/tcp/20006 /ip4/192.168.122.1/tcp/20006 /ip4/172.17.0.1/tcp/20006 /ip4/169.254.96.16/tcp/20006]}, err: failed to dial 12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK:
  * [/ip4/192.168.248.133/tcp/20006] failed to negotiate security protocol: read tcp4 192.168.248.132:41694->192.168.248.133:20006: read: connection reset by peer
I0317 14:28:30.259134       1 tunnel.go:126] [MDNS] Discovery found peer: {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/127.0.0.1/tcp/20006 /ip4/192.168.248.133/tcp/20006 /ip4/192.168.122.1/tcp/20006 /ip4/172.17.0.1/tcp/20006 /ip4/169.254.96.16/tcp/20006]}
E0317 14:28:30.259247       1 tunnel.go:141] [MDNS] Failed to connect to {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/127.0.0.1/tcp/20006 /ip4/192.168.248.133/tcp/20006 /ip4/192.168.122.1/tcp/20006 /ip4/172.17.0.1/tcp/20006 /ip4/169.254.96.16/tcp/20006]}, err: failed to dial 12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK:
  * [/ip4/192.168.248.133/tcp/20006] dial backoff
I0317 14:28:30.369183       1 shared_informer.go:247] Caches are synced for endpoints config 
I0317 14:28:30.369261       1 shared_informer.go:247] Caches are synced for service config 
I0317 14:28:30.371009       1 shared_informer.go:247] Caches are synced for loadBalancer destinationRule 
I0317 14:30:40.185116       1 loadbalancer.go:717] Dial legacy network between coredns-7f6cbbb7b8-k2v69 - {udp master 10.244.219.90:53}
I0317 14:30:40.257853       1 tunnel.go:264] Could not find peer node1 in cache, auto generate peer info: {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: []}
E0317 14:30:40.508878       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from node1 error: new stream between node1: {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: []} err: failed to find any peer in table"
E0317 14:30:41.011007       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from node1 error: new stream between node1: {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: []} err: failed to find any peer in table"
E0317 14:30:42.011221       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from node1 error: new stream between node1: {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: []} err: failed to find any peer in table"
E0317 14:30:44.011656       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from node1 error: new stream between node1: {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: []} err: failed to find any peer in table"
E0317 14:30:44.011682       1 proxysocket.go:98] "Failed to connect to balancer" err="failed to connect to an endpoint"

On node1:

I0317 14:28:30.170234       1 tunnel.go:126] [MDNS] Discovery found peer: {12D3KooWNRVRy1v8Lqb5nGYsVZnyDj5x6q8dsPA8eLofzaGPH9Yd: [/ip4/127.0.0.1/tcp/20006 /ip4/192.168.248.132/tcp/20006 /ip4/192.168.122.1/tcp/20006 /ip4/172.17.0.1/tcp/20006 /ip4/10.244.219.64/tcp/20006 /ip4/169.254.96.16/tcp/20006]}
E0317 14:28:35.172606       1 tunnel.go:141] [MDNS] Failed to connect to {12D3KooWNRVRy1v8Lqb5nGYsVZnyDj5x6q8dsPA8eLofzaGPH9Yd: [/ip4/127.0.0.1/tcp/20006 /ip4/192.168.248.132/tcp/20006 /ip4/192.168.122.1/tcp/20006 /ip4/172.17.0.1/tcp/20006 /ip4/10.244.219.64/tcp/20006 /ip4/169.254.96.16/tcp/20006]}, err: failed to dial 12D3KooWNRVRy1v8Lqb5nGYsVZnyDj5x6q8dsPA8eLofzaGPH9Yd:
  * [/ip4/192.168.248.132/tcp/20006] failed to negotiate security protocol: context deadline exceeded
  * [/ip4/10.244.219.64/tcp/20006] dial tcp4 0.0.0.0:20006->10.244.219.64:20006: i/o timeout
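
The per-address failure reasons in logs like the ones above can be pulled out programmatically, which makes it easier to see which address failed for which reason. This is a rough Python sketch, not an edgemesh tool; `parse_dial_failures` and `DIAL_LINE` are hypothetical names, and the sample consists of the two failure lines from the node1 log above.

```python
import re

# Matches continuation lines of the form:
#   * [<multiaddr>] <reason>
DIAL_LINE = re.compile(r"^\s*\*\s*\[(?P<addr>[^\]]+)\]\s*(?P<reason>.+?)\s*$")


def parse_dial_failures(log_text: str) -> list[tuple[str, str]]:
    """Return (multiaddr, reason) pairs for every '* [addr] reason' line."""
    failures = []
    for line in log_text.splitlines():
        match = DIAL_LINE.match(line)
        if match:
            failures.append((match.group("addr"), match.group("reason")))
    return failures


# The two failure lines from the node1 log in this thread.
sample = """\
  * [/ip4/192.168.248.132/tcp/20006] failed to negotiate security protocol: context deadline exceeded
  * [/ip4/10.244.219.64/tcp/20006] dial tcp4 0.0.0.0:20006->10.244.219.64:20006: i/o timeout
"""

for addr, reason in parse_dial_failures(sample):
    print(f"{addr}: {reason}")
```

The sketch only handles failure lines that appear on their own line; it does not unpack the escaped `\n  * [...]` form that appears inside quoted `err="..."` strings.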

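I followed the article and went through the troubleshooting steps in question 12. The firewalls on both virtual machines are completely disabled, and I used netstat to confirm that port 20006 is listening on both of them. They are in the same LAN and both have private IP addresses. The only thing I do not know how to verify is whether the network allows UDP traffic. In this situation, if I configure a relay node again, will it fail in the same way as above?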

Poorunga commented 1 year ago

Judging from the logs, the two nodes are in the same LAN and have already discovered each other through MDNS:

[MDNS] Discovery found peer: {12D3KooWNRVRy1v8Lqb5nGYsVZnyDj5x6q8dsPA8eLofzaGPH9Yd: [/ip4/127.0.0.1/tcp/20006 /ip4/192.168.248.132/tcp/20006 /ip4/192.168.122.1/tcp/20006 /ip4/172.17.0.1/tcp/20006 /ip4/10.244.219.64/tcp/20006 /ip4/169.254.96.16/tcp/20006]}

However, the connection cannot be established:

  • [/ip4/192.168.248.132/tcp/20006] failed to negotiate security protocol: context deadline exceeded

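If 192.168.248.132:20006 is definitely reachable and the firewall is not blocking the port, check whether the edgemesh-agent docker image is the same on every node (use docker image to view the image id) and whether the PSK is the same (docker exec cat /etc/edgemesh/psk).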
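
The PSK comparison can be done for all agents at once. The following Python sketch is one possible approach, not an official edgemesh procedure. It assumes `kubectl` is configured on the machine, that the agent image contains `cat`, and that the namespace and label selector match the `kubectl get all` output earlier in this thread. The function and constant names are hypothetical. It prints only a digest prefix so the key itself is not exposed.

```python
import hashlib
import subprocess

NAMESPACE = "kubeedge"  # namespace from the ConfigMap in this thread
SELECTOR = "k8s-app=kubeedge,kubeedge=edgemesh-agent"  # DaemonSet selector shown earlier
PSK_PATH = "/etc/edgemesh/psk"  # path from the comment above


def group_by_digest(contents: dict[str, bytes]) -> dict[str, list[str]]:
    """Group names by the SHA-256 digest of their content."""
    groups: dict[str, list[str]] = {}
    for name, data in sorted(contents.items()):
        digest = hashlib.sha256(data).hexdigest()
        groups.setdefault(digest, []).append(name)
    return groups


def read_psks() -> dict[str, bytes]:
    """Read the PSK file from every edgemesh-agent pod (assumes cat exists in the image)."""
    pods = subprocess.run(
        ["kubectl", "get", "pods", "-n", NAMESPACE, "-l", SELECTOR,
         "-o", "jsonpath={.items[*].metadata.name}"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    return {
        pod: subprocess.run(
            ["kubectl", "exec", "-n", NAMESPACE, pod, "--", "cat", PSK_PATH],
            check=True, capture_output=True,
        ).stdout
        for pod in pods
    }


def main() -> None:
    groups = group_by_digest(read_psks())
    if not groups:
        print("no edgemesh-agent pods found")
        return
    for digest, pods in groups.items():
        print(digest[:12], ", ".join(pods))
    print("PSKs match" if len(groups) == 1 else "PSK mismatch")
```

Calling `main()` on a machine with cluster access lists the pods grouped by PSK digest; more than one group means at least two agents hold different keys.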

Vessel-arch commented 1 year ago

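Thank you for the hint. After checking, I found that the PSKs on my master node and node1 were different, probably because I had uninstalled and reinstalled several times and reverted the virtual machines to earlier snapshots. I thoroughly cleaned the environment on both nodes, reinstalled edgemesh, and made sure both PSKs were identical. The test now prints the correct result:

[root@master edgemesh]# kubectl exec -it alpine-test -- sh
/ # curl hostname-svc:12345
hostname-edge-84cb45ccf4-zwzz2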