lonng / nano

Lightweight, facile, high-performance, Golang-based game server framework
MIT License

cluster: maintain heartbeat between nodes in the same cluster #91

Closed: bbdshow closed this pull request 1 year ago

bbdshow commented 1 year ago

What problem does this PR solve?

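  1. The Master and the Members establish a heartbeat via RPC calls, which gives the Master the ability to proactively kick out faulty nodes.
  2. Because nodes report heartbeat information, a Master that has been restarted or updated learns the registration information immediately; other nodes no longer need to restart in order to register again.
  3. Ultimately, this keeps the cluster healthy and decouples the deployment order and dependencies between nodes, which makes varied deployment setups easier.
  4. Add an OnUnregister hook function so that some handling, such as raising an alert, can be performed when a node fails.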

What is changed and how it works?

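  1. An IsHeartbeat field is added to the Register proto so that heartbeat requests take a separate registration path. The master records each heartbeat time and, on every heartbeat registration, checks the report times of all nodes.
  2. Ordinary members periodically call heartbeat registration on the master.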

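About testing: start 1 master, 2 gate nodes, and 3 chat nodes.

  1. One chat node can expose a method that mocks an exceptional exit. The master will then proactively remove the faulty chat node; see the logs for details.
  2. Restart the master to simulate a master update. The other nodes do not need to restart; the master's cluster information is restored automatically.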

lonng commented 1 year ago

@bbdshow Thanks for your awesome contribution.

There are some suggestions to make this PR better.

  1. It's better to define a Heartbeat interface in MasterService instead of reusing the Register interface, to make the semantics clear.
  2. The OnUnregister callback should be renamed to UnregisterCallback to make its purpose clearer.
  3. I think it's better to spawn a dedicated goroutine to check member heartbeat expiration instead of doing it in every heartbeat request handler.

Again, thanks very much for your pull request to make nano better.

bbdshow commented 1 year ago

@lonng Thank you very much for your suggestions. I have made the changes and pushed them.

lonng commented 1 year ago

@bbdshow Three more comments remain; please address them. The rest looks good to me.