yfinkelstein / node-zookeeper

node.js client for Apache Zookeeper
MIT License
479 stars 111 forks

How can I listen to the connection state changes events? #351

Closed fstonezst closed 1 week ago

fstonezst commented 3 weeks ago

Users of the LeaderSelector must pay attention to any connection state changes. If an instance becomes the leader, it should respond to notification of being SUSPENDED or LOST. If the SUSPENDED state is reported, the instance must assume that it might no longer be the leader until it receives a RECONNECTED state. If the LOST state is reported, the instance is no longer the leader and its leader task should exit.
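The rule quoted above can be read as a small state machine. The following is only a sketch of that reading: the state names SUSPENDED, RECONNECTED and LOST come from the quoted text, while `createLeadership` and its methods are hypothetical names and not part of node-zookeeper.

```javascript
// Sketch of the quoted rule. `createLeadership` is a hypothetical helper,
// not an API of node-zookeeper.
function createLeadership() {
  let isLeader = false;  // we hold the leader role, as far as we know
  let suspended = false; // the connection state is currently uncertain

  return {
    // Call when this instance wins the election.
    elected() { isLeader = true; suspended = false; },
    // Feed connection state changes in, using the names from the quote.
    onStateChange(state) {
      if (state === 'SUSPENDED') suspended = true;
      else if (state === 'RECONNECTED') suspended = false;
      else if (state === 'LOST') { isLeader = false; suspended = false; }
    },
    // Leader work is allowed only while elected and not suspended.
    mayActAsLeader() { return isLeader && !suspended; },
  };
}

const leadership = createLeadership();
leadership.elected();
leadership.onStateChange('SUSPENDED');
console.log(leadership.mayActAsLeader()); // false
leadership.onStateChange('RECONNECTED');
console.log(leadership.mayActAsLeader()); // true
leadership.onStateChange('LOST');
console.log(leadership.mayActAsLeader()); // false
```

After LOST, the sketch requires a fresh call to `elected()` before leader work may resume, which matches the quoted requirement that the leader task should exit.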

DavidVujic commented 3 weeks ago

If I understand your question correctly, it sounds like the concept of leader election. This is supported by the ZooKeeper client. The ZooKeeper server is an event driven system, and clients can register listeners.

There are some examples of this in the examples folder. It was quite some time ago that I wrote them, but I tried to follow the style of the contents in this ZooKeeper book.

fstonezst commented 3 weeks ago

Thank you for your response! Actually, I was just looking to understand how to use this SDK to listen for changes in the client’s connection status with ZooKeeper. Specifically, I want to know how to detect when a client disconnects from the server and when the session becomes invalid. Any guidance on this would be appreciated!

DavidVujic commented 3 weeks ago

Here are examples of the events fired for the connection status of a client: https://github.com/yfinkelstein/node-zookeeper/blob/master/examples/wrapper.js#L36

In the example, there's also a suggestion on how to handle disconnects (the connecting event) with retries.
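As a rough illustration of listening for these events, here is a minimal sketch. It assumes the client emits `connect`, `connecting` and `close` as named in this thread. To keep it runnable without a server, a plain Node.js `EventEmitter` stands in for the real client, and `trackConnection` is a hypothetical helper name.

```javascript
const { EventEmitter } = require('events');

// Attach listeners for the three connection events named in this thread.
// `client` can be anything with an `on(event, listener)` method.
function trackConnection(client, onChange) {
  let state = 'unknown';
  const set = (next) => () => {
    // Report only real transitions, not repeated events.
    if (next !== state) {
      state = next;
      onChange(state);
    }
  };
  client.on('connect', set('connected'));
  client.on('connecting', set('connecting'));
  client.on('close', set('closed'));
  return () => state; // getter for the latest known state
}

// Stand-in for the real client (an assumption made for this sketch).
const fakeClient = new EventEmitter();
const seen = [];
const currentState = trackConnection(fakeClient, (s) => seen.push(s));

fakeClient.emit('connect');
fakeClient.emit('connecting'); // e.g. the server was restarted
fakeClient.emit('connect');
fakeClient.emit('close');
console.log(seen.join(' -> '), '| now:', currentState());
```

With the real package, the same function would be passed the ZooKeeper client instance instead of `fakeClient`. Whether and when each event fires on a network drop is exactly what this thread is trying to establish.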

fstonezst commented 3 weeks ago

Thank you for the example. It seems to focus on error handling when the client initially fails to connect to Zookeeper. However, I'm interested in listening for events when the client successfully connects to Zookeeper but then gets disconnected due to network issues or other reasons. Could you provide guidance on how to handle those disconnection events specifically?

DavidVujic commented 3 weeks ago

You can try the client out by running a local ZooKeeper (I would recommend running it in docker).

Running ZooKeeper server:

docker run --rm -p 2181:2181 zookeeper

Running your Node.js client, in a separate terminal:

# here I am using the existing examples: creating nodes, electing a leader and such.
node examples/index.js

Test the connection, by starting/stopping the server:

docker restart <the-container-id-here>

Then monitor how the client behaves. The events to watch are connect, connecting and close.

Recording: zookeeper-connection-test

fstonezst commented 3 weeks ago

Thank you very much for providing the detailed testing process. I simulated a client disconnect by simply disconnecting my computer from the network. However, I noticed that no events were triggered when the network was disconnected.

Is there a way to listen for events specifically when the network disconnects?

Thanks again for your support!

DavidVujic commented 3 weeks ago

I think the connecting event should actually be triggered. Were you running the client against a ZooKeeper server on a network (not locally)?

fstonezst commented 3 weeks ago

Yes, I was connecting to a remote ZooKeeper server. In fact, when I lost the network connection, the connecting event did not get triggered. I think you could replicate this phenomenon by stopping your ZooKeeper service.

DavidVujic commented 3 weeks ago

Interesting! In the recording, the very last thing done is stopping the ZooKeeper server process and the client ends up in a connecting state until giving up (the example code exits the Node.js process).

How do you connect? Are you using the example code? Would you mind sharing how you do it?

DavidVujic commented 3 weeks ago

@fstonezst I got a notification about a new message from you, but it seems to have been removed?

fstonezst commented 3 weeks ago

OK, here’s a summary of my operation process, which I’ve captured in the GIF below:

Recording: mnggiflab-from-video-to-gif-2024_08_21_09_54_13

DavidVujic commented 3 weeks ago

Thank you for sharing!

It looks like the Node process exits after a period of being offline from the network ("operation timeout"). I think this is because of a network error, and not a connection error to the server. I think this makes sense, because it would indicate that something is wrong with the network access of the running process (and the process would exit).

It is possible to set the timeout (in ms) in the options passed to the constructor, and I think this will make the client stay alive for longer and wait until the timeout expires.
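For illustration, the options object might be built as below. The option names `connect` and `timeout` follow the project's README. The host, the values and the `buildConfig` helper are placeholders assumed for this sketch.

```javascript
// Hypothetical helper that fills in defaults for the client options.
// `connect` and `timeout` are README option names; the values are placeholders.
function buildConfig(overrides = {}) {
  return {
    connect: '127.0.0.1:2181', // host:port of the ZooKeeper server
    timeout: 5000,             // session timeout in milliseconds
    ...overrides,
  };
}

// A longer timeout, as suggested above.
const config = buildConfig({ timeout: 30000 });
console.log(config);
// With the real package installed, this object would be passed to the
// constructor: new ZooKeeper(config)
```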

fstonezst commented 3 weeks ago

Thank you for your reply! If the connection event is not triggered in this case, is there any other way to detect when the client disconnects from the server? I would like the client to immediately stop acting as the leader when it disconnects from the server.

DavidVujic commented 3 weeks ago

It's a bit difficult to see, but when I look at your recording again it looks like you stopped the process, right? There seems to be a "session connecting" message just before.

It is the ZooKeeper server, not the client, that elects the leader. If your client is offline, with no access to the Internet, I don't think there is much the client can do. Or do I misunderstand you?

fstonezst commented 3 weeks ago

I didn't kill the process. Here are the steps I followed:

  1. run node examples/index.js
  2. Turned off WiFi
  3. Turned WiFi back on

And I want to use ZooKeeper to handle leader election for my custom service cluster, where only one instance in the cluster can become the leader to perform leadership tasks.

DavidVujic commented 3 weeks ago

If you have more than one client connected and the leader is disconnected, the other registered worker becomes the leader. This is directed by the ZooKeeper server. Can you give that a try?

DavidVujic commented 3 weeks ago

The nodes you register can be ephemeral, which means they are removed when the client's session ends (and the client loses the leader position).

fstonezst commented 3 weeks ago

That's certainly true. However, the issue arises when the leader client loses its connection with the server. If the leader does not receive any events indicating a disconnection, it may not realize that it has lost its role as the leader. In this case, the other followers in the cluster will initiate an election to choose a new leader. As a result, there could be a situation where two leaders exist in the cluster simultaneously. I hope this clarifies the situation for you.
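One generic way to narrow the two-leader window described here, independent of which events the client delivers, is a local lease: the instance treats itself as leader only while its last confirmed contact with the server is recent. The sketch below is an assumption-laden illustration of that technique, not a node-zookeeper feature. `LeaseGuard` and its methods are hypothetical, and the lease would need to be shorter than the session timeout to be useful.

```javascript
// Hypothetical helper: leadership counts as valid only while the last
// confirmed contact with the server is recent enough.
class LeaseGuard {
  constructor(leaseMs, now = Date.now) {
    this.leaseMs = leaseMs;  // should be shorter than the session timeout
    this.now = now;          // injectable clock, useful for testing
    this.lastContact = null; // time of the last confirmed round trip
  }

  // Call after any successful round trip to the server.
  recordContact() { this.lastContact = this.now(); }

  // Call on a 'connecting' or 'close' event to step down at once.
  revoke() { this.lastContact = null; }

  // True only while the lease has not run out.
  mayLead() {
    return this.lastContact !== null &&
      this.now() - this.lastContact < this.leaseMs;
  }
}

// Usage with a fake clock so the behaviour is deterministic.
let clock = 0;
const guard = new LeaseGuard(2000, () => clock);
guard.recordContact();
clock = 1500;
console.log(guard.mayLead()); // true
clock = 2500;
console.log(guard.mayLead()); // false
```

The leader task would check `mayLead()` before each unit of work. This relies on the local clock and on the process not pausing for long, so it reduces the overlap but is not a strict guarantee.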

DavidVujic commented 3 weeks ago

I think the ephemeral feature of the ZooKeeper server would help you out here. Otherwise, how would you expect an offline client to un-register a leader if it doesn't have internet connection?

fstonezst commented 3 weeks ago

Thank you for your suggestion! I am already using ephemeral znodes. I hope the summary on error handling from the wiki that I provided can help clarify my issue. I really appreciate your support!

Here’s the link: Error Handling

DavidVujic commented 3 weeks ago

Here's another recording: the ZooKeeper Server is the bigger terminal window to the left. To the right, I start two clients, both trying to register a leader by creating an ephemeral node.

One of them is the leader, and when I kill that process the ZooKeeper Server reacts to it and sets a new leader. As I understand ZooKeeper, it should not be possible to have two leaders.

leader-election
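The behaviour in the recording can be modelled with a small in-memory stand-in. This is a toy model written for illustration only: `FakeEphemeralStore` is hypothetical and does not talk to a ZooKeeper server. It shows just the server-side rule that an ephemeral node has one owner and disappears when that owner's session ends.

```javascript
// In-memory stand-in for a server that holds ephemeral nodes.
class FakeEphemeralStore {
  constructor() {
    this.nodes = new Map(); // path -> id of the owning session
  }

  // Returns true if the node was created, false if it already exists.
  create(path, sessionId) {
    if (this.nodes.has(path)) return false;
    this.nodes.set(path, sessionId);
    return true;
  }

  // Ending a session removes every node that session created.
  endSession(sessionId) {
    for (const [path, owner] of this.nodes) {
      if (owner === sessionId) this.nodes.delete(path);
    }
  }

  ownerOf(path) {
    return this.nodes.get(path) ?? null;
  }
}

const store = new FakeEphemeralStore();
const aWon = store.create('/leader', 'client-a');   // first client wins
const bWon = store.create('/leader', 'client-b');   // second client loses
store.endSession('client-a');                       // leader process killed
const bRetry = store.create('/leader', 'client-b'); // second client takes over
console.log(aWon, bWon, bRetry, store.ownerOf('/leader'));
```

The model only covers what the server records. It says nothing about what a disconnected former leader believes in the meantime, which is the concern raised earlier in the thread.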