slime-io / slime

An intelligent ServiceMesh manager based on Istio
https://slime-io.github.io/

meshreg: fix repeat trigger zk reconnect #443

Closed believening closed 11 months ago

believening commented 11 months ago

Reconnect logic is implemented in both meshregistry and go-zk, and the two implementations run in different goroutines.

The go-zk reconnect relies on meshregistry calling zk.Conn.Close to terminate it, but there is no synchronization mechanism between the two.

If go-zk reconnects before Close is called, meshregistry is triggered to reconnect again after the Close call. That in turn closes the zk.Conn created by the previous meshregistry reconnect, which triggers yet another reconnect, and the cycle repeats.

We've made two improvements for this:

  1. Skip the meshregistry reconnect if go-zk has already reconnected successfully by the time the meshregistry reconnect would be triggered.
  2. Since a meshregistry reconnect creates a new zk.Conn, each zk.Conn may trigger a meshregistry reconnect only once.
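
The two rules above can be sketched as a small per-connection guard. This is only an illustration, not the code in this PR: `connGuard`, `setConnected`, and `tryTrigger` are hypothetical names, and the sketch assumes some event loop reports whether go-zk currently has a live session.

```go
package main

import "sync"

// connGuard is a hypothetical helper that would be created alongside each
// new zk.Conn. It is not part of go-zk or meshregistry.
type connGuard struct {
	mu        sync.Mutex
	connected bool // assumed to be updated from go-zk session events
	triggered bool // true once this connection has triggered an outer reconnect
}

// setConnected records whether go-zk currently reports a live session.
func (g *connGuard) setConnected(ok bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.connected = ok
}

// tryTrigger invokes reconnect unless go-zk has already reconnected by itself
// (rule 1) or this connection has already triggered a reconnect (rule 2).
// It reports whether reconnect was invoked.
func (g *connGuard) tryTrigger(reconnect func()) bool {
	g.mu.Lock()
	if g.connected || g.triggered {
		g.mu.Unlock()
		return false
	}
	g.triggered = true
	g.mu.Unlock()

	// Run the callback outside the lock so it may close the old connection
	// and create a new one (with its own guard) without deadlocking.
	reconnect()
	return true
}
```

Because each guard belongs to exactly one connection, a stale event from an already-replaced zk.Conn cannot close the connection created by a later reconnect.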