wind-c / comqtt

A lightweight, high-performance go mqtt server(v3.0|v3.1.1|v5.0) supporting distributed cluster
MIT License
877 stars, 50 forks

Problem building a cluster across machines: two machines on the public internet have joined the cluster, but the log reports "Suspect co-001 has failed, no acks received" #9

Closed skygp closed 1 year ago

skygp commented 1 year ago


conf.yml on one of the machines:

cluster:
  node-name: co-009 # The name of this node. This must be unique in the cluster. If nodename is not set, use the local hostname.
  bind-port: 1886 # The port is used for both UDP and TCP gossip. Used for member discovery and communication.
  members: localhost:1886,, # seeds member list, format such as,
  queue-depth: 1024000 # size of Memberlist's internal channel which handles UDP messages.
  raft-port: 1887
  raft-dir: ./raft/node1


cluster:
  node-name: co-001 # The name of this node. This must be unique in the cluster. If nodename is not set, use the local hostname.
  bind-port: 1886 # The port is used for both UDP and TCP gossip. Used for member discovery and communication.
  members: localhost:1886,, # seeds member list, format such as,
  queue-depth: 1024000 # size of Memberlist's internal channel which handles UDP messages.
  raft-port: 1887
  raft-dir: ./raft/node1



A node has joined: co-009
A node has joined: co-001
Local member gogo BootstrapRaft
2022-10-11T10:20:51.812+0800 [INFO] raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:co-009 Address:}]"
2022-10-11T10:20:51.812+0800 [INFO] raft: entering follower state: follower="Node at [Follower]" leader-address= leader-id=
Cluster Node Created!
Mqtt Server Started!
2022-10-11T10:20:53.139+0800 [WARN] raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2022-10-11T10:20:53.140+0800 [INFO] raft: entering candidate state: node="Node at [Candidate]" term=7
2022-10-11T10:20:53.146+0800 [INFO] raft: election won: tally=1
2022-10-11T10:20:53.146+0800 [INFO] raft: entering leader state: leader="Node at [Leader]"
2022/10/11 10:20:53 [INFO] memberlist: Suspect co-001 has failed, no acks received
2022/10/11 10:20:56 [INFO] memberlist: Suspect co-001 has failed, no acks received
2022/10/11 10:20:57 [INFO] memberlist: Marking co-001 as failed, suspect timeout reached (0 peer confirmations)
A node has left: co-001
2022/10/11 10:21:00 [INFO] memberlist: Suspect co-001 has failed, no acks received

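The nodes do join, but then there is no response. I am not sure whether my configuration file is wrong. As far as I can tell, the error is reported by internal code. If my conf.yml is misconfigured, I hope the author can show the correct way to write it. Thanks.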

skygp commented 1 year ago


wind-c commented 1 year ago


skygp commented 1 year ago


skygp commented 1 year ago


skygp commented 1 year ago

2022/10/13 11:03:38 worker exits from a panic: runtime error: invalid memory address or nil pointer dereference
2022/10/13 11:03:38 worker exits from panic: goroutine 42 [running]:
*goWorker).run.func1.1()
    /usr/local/bin/pkg/mod/ +0x10c

The panic is raised mainly here: a method is called on a nil value. The following code can only run after BootstrapRaft() has been called:

for i := 0; i < gps; i++ {
    c.inPool.Submit(c.processInboundMsg)
}
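The ordering problem described above can be sketched as follows. This is only an illustration, not comqtt's actual code: the `cluster` and `pool` types, the `raftReady` flag, `startInboundWorkers`, and `errNotBootstrapped` are all hypothetical names, and the sketch assumes (without confirmation from the source) that the bootstrap step creates the state the inbound workers rely on. The idea is to return an explicit error when the workers are started too early, instead of letting a worker goroutine hit a nil pointer.

```go
package main

import (
	"errors"
	"fmt"
)

// pool is a hypothetical stand-in for the worker pool behind c.inPool.
type pool struct{ submitted int }

// Submit runs the task immediately; a real pool would hand it to a worker goroutine.
func (p *pool) Submit(task func()) error {
	p.submitted++
	task()
	return nil
}

// cluster is a hypothetical, heavily simplified node; it is not comqtt's real type.
type cluster struct {
	raftReady bool
	inPool    *pool
	handled   int
}

var errNotBootstrapped = errors.New("cluster: BootstrapRaft must run before inbound workers start")

// BootstrapRaft stands in for the real bootstrap step. In this sketch it
// creates the state that the inbound workers depend on (an assumption).
func (c *cluster) BootstrapRaft() {
	c.inPool = &pool{}
	c.raftReady = true
}

// processInboundMsg stands in for the real message handler.
func (c *cluster) processInboundMsg() { c.handled++ }

// startInboundWorkers submits gps handlers, but only after bootstrap has run.
// Without the guard, Submit would be called on a nil *pool and panic.
func (c *cluster) startInboundWorkers(gps int) error {
	if !c.raftReady || c.inPool == nil {
		return errNotBootstrapped
	}
	for i := 0; i < gps; i++ {
		if err := c.inPool.Submit(c.processInboundMsg); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	c := &cluster{}
	fmt.Println(c.startInboundWorkers(2)) // too early: reports the ordering error
	c.BootstrapRaft()
	fmt.Println(c.startInboundWorkers(2)) // after bootstrap: no error
}
```

Whether a guard like this or simply reordering the startup calls is the better fix depends on how the real startup sequence is structured.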

wind-c commented 1 year ago

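Pull the latest code. Yesterday I added the cluster parameter bind-addr. Set it to the machine's internal network IP; localhost cannot be used. Use internal network IPs in members as well, so that the cluster nodes talk to each other over the internal network. I ran it on three Linux CentOS machines in the cloud and it tested fine. Reference configuration: 1.jpg, 2.jpg, 3.jpg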
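The advice not to use localhost can be checked mechanically before starting a node. The sketch below is a hypothetical pre-flight helper, not part of comqtt: `checkClusterAddr` and `checkMembers` are invented names, and the `10.0.0.x` addresses are placeholder internal IPs. It uses only the standard library to reject seed entries that other machines cannot reach, such as `localhost` or a loopback IP.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// checkClusterAddr returns an error if addr (host:port) can only refer to the
// local machine, which makes it useless as a gossip address for other nodes.
func checkClusterAddr(addr string) error {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("%q is not in host:port form: %w", addr, err)
	}
	if host == "" || strings.EqualFold(host, "localhost") {
		return fmt.Errorf("%q refers to the local machine only", addr)
	}
	if ip := net.ParseIP(host); ip != nil && (ip.IsLoopback() || ip.IsUnspecified()) {
		return fmt.Errorf("%q is a loopback or unspecified address", addr)
	}
	return nil
}

// checkMembers splits a comma-separated seed list, skips empty entries,
// and returns one error per entry that other nodes could not reach.
func checkMembers(members string) []error {
	var problems []error
	for _, m := range strings.Split(members, ",") {
		m = strings.TrimSpace(m)
		if m == "" {
			continue
		}
		if err := checkClusterAddr(m); err != nil {
			problems = append(problems, err)
		}
	}
	return problems
}

func main() {
	// 10.0.0.11 is a placeholder internal IP used only for illustration.
	for _, err := range checkMembers("localhost:1886,10.0.0.11:1886") {
		fmt.Println("bad seed:", err)
	}
}
```

A check like this only catches obviously local addresses. It cannot tell whether the gossip port is actually reachable over UDP and TCP between the machines, which still has to be verified on the network itself.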

skygp commented 1 year ago

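Thanks a lot. I have not got the cross-machine cluster working yet, but a single cluster works fine. One question: when connecting to Redis, have you ever seen "Error: Connection reset by peer"? The cross-machine setup uses a single Redis service, and when my secondary node connects to the Redis service on the primary machine it keeps reporting "Connection reset by peer". I have been stuck on this for a long time, and it feels like this is the last missing step.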