Add a primitive to implement session tracker

ghost commented 10 years ago

I'd like to be able to use javazab to implement a session tracker. Here is how ZooKeeper implements its session tracker:

The client sends a connect_request. The request gets forwarded to the leader. The leader assigns a session ID and session timeout, commits the session information, and responds to the request.
Once the client receives the response, it starts sending heartbeats periodically to the connected server, which might not be the leader. The server stores the heartbeat information in memory.
The leader periodically sends out pings to followers. Each follower piggybacks the client heartbeat information on its ping response.
The leader uses this information to decide whether to expire client sessions.
If the client gets disconnected from the server, it tries to connect to another server and revalidate the session.

The key here is that only session creation/revalidation/expiration touches the disk; heartbeating doesn't.
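The split between durable and in-memory state described above can be sketched roughly as follows. This is a toy model, not ZooKeeper's code: the class name `LeaderSessionTracker`, the list standing in for the on-disk log, and the explicit `now` argument are all assumptions made for illustration.

```python
class LeaderSessionTracker:
    """Toy leader-side tracker: create/expire are logged, heartbeats are not."""

    def __init__(self):
        self.log = []        # stands in for the replicated, on-disk log
        self.timeout = {}    # session_id -> timeout in seconds
        self.deadline = {}   # session_id -> expiry time, kept in memory only
        self.next_id = 1

    def create(self, timeout, now):
        sid = self.next_id
        self.next_id += 1
        self.log.append(("create", sid, timeout))  # touches the disk
        self.timeout[sid] = timeout
        self.deadline[sid] = now + timeout
        return sid

    def on_ping_response(self, heartbeats, now):
        """heartbeats: session ids a follower piggybacked on its ping response."""
        for sid in heartbeats:
            if sid in self.deadline:
                # Only the in-memory deadline moves; nothing is logged.
                self.deadline[sid] = now + self.timeout[sid]

    def expire(self, now):
        expired = sorted(s for s, t in self.deadline.items() if t <= now)
        for sid in expired:
            self.log.append(("expire", sid))       # touches the disk
            del self.deadline[sid]
            del self.timeout[sid]
        return expired
```

With two sessions and a heartbeat for only one of them, the log ends up holding two create entries and one expire entry, and no entry for the heartbeat.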

I think the simplest way to implement this in javazab would be to allow the leader to short-circuit from the preprocess() method by returning null or something. The application (session tracker) uses the send() method to forward the client session information to the leader. The leader can update the session information in the preprocess method and return null to indicate to javazab that the message doesn't need to be broadcast.
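To make the proposal concrete, here is a minimal sketch of the short-circuit idea in Python. It does not use javazab's actual API: `SessionStateMachine`, `leader_handle`, the message tuples, and the `broadcast` callback are hypothetical names, and `None` plays the role of the proposed null return.

```python
HEARTBEAT = "heartbeat"  # hypothetical message kind


class SessionStateMachine:
    def __init__(self):
        self.last_seen = {}  # session_id -> last heartbeat time, memory only

    def preprocess(self, message, now):
        """Return the transaction to broadcast, or None to short-circuit."""
        kind, session_id = message
        if kind == HEARTBEAT:
            self.last_seen[session_id] = now  # update leader-local state
            return None                       # nothing to broadcast
        return message


def leader_handle(state_machine, message, now, broadcast):
    """Hypothetical library-side caller: broadcast only non-None results."""
    txn = state_machine.preprocess(message, now)
    if txn is not None:
        broadcast(txn)
```

A heartbeat forwarded to the leader updates `last_seen` and stops there, while a session-creating message passes through to `broadcast` unchanged.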

fpj commented 10 years ago

It sounds like this abuses the zab layer a bit, because the pings aren't really supposed to be broadcast, but you're proposing that we use this layer to also let followers communicate with the leader. However, with an independent zab layer, we could think of decentralizing the ping processing and simply broadcasting the pings. In this case, the session management is fully replicated.

In the case you want to keep the session management centralized, I think we need to either extend the javazab api to allow processes to send regular messages to the leader, or talk directly to the application process running on the leader.

ghost commented 10 years ago

I haven't thought about this much, but it would be awesome if we can decentralize the session management.

fpj commented 10 years ago

Assuming that sessions can move from one server to another, we need the servers to agree on what happens to sessions. In the case of a system like ZK, we need to track the ephemeral znodes associated with the session in case the session expires.

For session management, we need to make sure that creating/closing a session is agreed upon. However, that alone isn't sufficient, because a quorum may have recorded that a session has been closed while the client still has its session open with a server outside that quorum. I'm thinking that this is actually ok because the session close will be ordered with respect to all other update operations, and the leader can error out requests from sessions that have already been closed by tracking active sessions. The state of active sessions must be replicated, which we get if we broadcast the operations to create and close sessions.
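A small sketch of the ordering argument, under stated assumptions: the `Replica` class, the operation tuples, and the `"SESSION_CLOSED"` result are made up for illustration. For simplicity the check runs at delivery time on every replica. A leader-side check before broadcast, as suggested above, would consult the same set of active sessions.

```python
class Replica:
    """Applies broadcast operations in delivery order."""

    def __init__(self):
        self.active = set()  # replicated set of active session ids
        self.data = {}

    def deliver(self, op):
        kind = op[0]
        if kind == "create_session":
            self.active.add(op[1])
        elif kind == "close_session":
            self.active.discard(op[1])
        elif kind == "write":
            _, session_id, key, value = op
            if session_id not in self.active:
                # Ordered after the close, so every replica rejects it.
                return "SESSION_CLOSED"
            self.data[key] = value
        return "OK"
```

Because every replica sees the same sequence, a write ordered after the close is rejected everywhere, including on a server that was not in the quorum that first recorded the close.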

Does it make sense?

hongchaodeng commented 10 years ago

@fpj

How do followers know the following things associated with a session:

Watches
Ephemeral znodes
Metadata, e.g. timeout

Is it committed through a quorum? Or does the follower get information from the leader?

fpj commented 10 years ago

Ok, to be fair, I don't know exactly why Michi proposed this, so making the assumption that we are trying to recreate ZK, we have the following:

Watches are local to each server and don't need to be propagated. When a server delivers an update for a given znode, it checks if there is a watch pending from a client currently connected to it.
Ephemeral znodes are replicated like any other znode. In each server, we need to additionally make sure to add such znodes to the corresponding session when the server delivers a create.
The timeout can be negotiated and exchanged when the client connects. If a client reconnects, then it can pass it again.
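The three points above can be sketched per server roughly like this. The `Server` class and its method names are hypothetical, and the watches are modeled as one-shot, which is an assumption of this sketch.

```python
from collections import defaultdict


class Server:
    def __init__(self):
        self.znodes = {}                    # replicated via delivered updates
        self.ephemerals = defaultdict(set)  # session_id -> ephemeral paths
        self.watches = defaultdict(set)     # path -> local clients; never replicated

    def watch(self, client, path):
        self.watches[path].add(client)

    def deliver_create(self, path, value, owner_session=None):
        self.znodes[path] = value
        if owner_session is not None:
            # Ephemeral znode: remember which session owns it.
            self.ephemerals[owner_session].add(path)
        return self._fire(path)

    def deliver_close_session(self, session_id):
        fired = []
        for path in sorted(self.ephemerals.pop(session_id, set())):
            del self.znodes[path]
            fired += self._fire(path)
        return fired

    def _fire(self, path):
        # Notify and clear the local watches on this path (one-shot).
        return sorted(self.watches.pop(path, set()))
```

Each server builds `ephemerals` on its own while delivering creates, so a delivered session close can clean up the znodes without any extra message.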

Some of these things may depend on what you want to do with it.

ghost commented 10 years ago

I was thinking about what it takes to implement something like ZooKeeper's ephemeral nodes to keep track of the liveness of processes using javazab.

If we decentralize the session management, who decides when the session is expired? For example, if a server is not in the quorum, it can't expire the session even if the client stops sending heartbeats. I guess the session tracker needs to track the session owner for each session and expire the sessions if the clients don't revalidate them within a certain time period after the session owner falls out of the quorum.
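A rough sketch of that owner-tracking idea, with hypothetical names (`OwnerTracker`, `grace`) and an explicit clock. In a real deployment the create, revalidate, and expire steps would presumably go through the broadcast layer. Here they are plain method calls.

```python
class OwnerTracker:
    def __init__(self, grace):
        self.grace = grace      # seconds a client has to revalidate
        self.owner = {}         # session_id -> owning server
        self.orphaned_at = {}   # session_id -> time its owner was lost

    def create(self, session_id, server):
        self.owner[session_id] = server

    def server_lost(self, server, now):
        for session_id, s in self.owner.items():
            if s == server:
                self.orphaned_at.setdefault(session_id, now)

    def revalidate(self, session_id, new_server):
        if session_id not in self.owner:
            return False        # already expired
        self.owner[session_id] = new_server
        self.orphaned_at.pop(session_id, None)
        return True

    def expire_orphans(self, now):
        expired = sorted(s for s, t in self.orphaned_at.items()
                         if now - t >= self.grace)
        for session_id in expired:
            del self.owner[session_id]
            del self.orphaned_at[session_id]
        return expired
```

Sessions whose clients reconnect to another server within the grace period survive the loss of their owner, and the rest are expired.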

ghost commented 10 years ago

Just to elaborate on my thought a bit more, I'm thinking of writing another reference server, which is a simple HTTP-based group membership tracker.

It supports multiple groups. For example, you can have a "database" group and a "webserver" group. Each group can optionally have metadata associated with it.
Each group has zero or more members. To be a member of a group, you send a PUT request. Each member can optionally have metadata associated with it.
Each group is versioned. Whenever something changes (new member joined, a member left, metadata changed), the version gets incremented.
To maintain the membership, you need to periodically send heartbeats to the tracker. A heartbeat request can be either a HEAD or a GET request. The heartbeat response contains the version information in the header so that the client knows when there is a change in the group.
The first member in the list can consider itself to be a leader if it wants to. For example, the webserver group might not need to elect a leader, but the database group might need to designate one node to be a primary.
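The data model in the list above might look something like this in memory. It is a sketch only: the `Tracker` class and its methods are hypothetical, replication and timeouts are left out, and the version starts at 0 when the first join creates the group.

```python
class Tracker:
    def __init__(self):
        # group name -> {"version": int, "members": {member: metadata}}
        self.groups = {}

    def join(self, group, member, metadata=None):
        if group not in self.groups:
            # The first join creates the group at version 0.
            self.groups[group] = {"version": 0, "members": {member: metadata}}
        else:
            g = self.groups[group]
            g["members"][member] = metadata
            g["version"] += 1
        return self.groups[group]["version"]

    def leave(self, group, member):
        g = self.groups[group]
        if member in g["members"]:
            del g["members"][member]
            g["version"] += 1
        return g["version"]

    def leader(self, group):
        # dicts keep insertion order, so this is the longest-standing member.
        return next(iter(self.groups[group]["members"]), None)
```

Since members are kept in join order, the "first member is the leader" convention falls out of the data structure without a separate election.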

Here is a sample workflow:

// join the database group. the group gets created if it doesn't exist already.
// say the default session timeout is 10 seconds.
$ curl -XPUT tracker1.example.com/database/db1.example.com -d '{"opaque": "metadata"}'
response header : x-group-version: 0
{
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "members": [
    { "db1.example.com" : {"opaque": "metadata"} }
  ]
}

// send a heartbeat using HEAD
$ curl -I tracker1.example.com/database/db1.example.com
response header : x-group-version: 0

// another member joins with 20 second session timeout without any metadata.
$ curl -XPUT 'tracker1.example.com/database/db2.example.com?timeout=20'
response header : x-group-version: 1
{
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "members": [
    { "db1.example.com" : {"opaque": "metadata"} },
    { "db2.example.com" : null }
  ]
}

// db1 finds out the group has been modified in the next heartbeat.
$ curl -I tracker1.example.com/database/db1.example.com
response header : x-group-version: 1

// use GET heartbeat to get the new group configuration.
$ curl -XGET tracker1.example.com/database/db1.example.com
response header : x-group-version: 1
{
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "members": [
    { "db1.example.com" : {"opaque": "metadata"} },
    { "db2.example.com" : null }
  ]
}

// db2 stops sending heartbeats. after 20 seconds, the tracker deletes
// db2 from the member list and bumps up the version. db1 finds out
// about the group change in the next heartbeat.
$ curl -I tracker1.example.com/database/db1.example.com
response header : x-group-version: 2

It would be more scalable if we can decentralize heartbeat management. Unlike ZooKeeper, the client doesn't get notified immediately when a group changes.
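On the client side, the polling pattern from the workflow above (HEAD heartbeat, then GET once x-group-version changes) could be sketched as follows. The transport is injected so the logic stays independent of any HTTP library: `head` and `get` are hypothetical callables, not part of any existing client.

```python
def heartbeat(head, get, known_version):
    """One heartbeat round.

    head() -> int: the version from the x-group-version header of a HEAD request.
    get()  -> (int, dict): the version and parsed body of a GET request.
    Returns (version, body), where body is None if nothing changed.
    """
    version = head()
    if version == known_version:
        return known_version, None  # cheap path: group unchanged
    return get()                    # version moved: fetch the new configuration
```

The client learns about a change at its next heartbeat at the earliest, so the notification delay is bounded by the heartbeat interval rather than being immediate.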

hongchaodeng commented 10 years ago

@m1ch1 How does it interact with zab servers?

hongchaodeng commented 10 years ago

My idea is:

Every server tells the leader which client sessions it maintains. This is true because it needs to create a session through a quorum.
Zab servers should have knowledge of cluster membership. This would be helpful for doing dynamic membership (drop peers that have been lost for a long time and form a smaller ensemble). This is true because the leader knows.
If a server is lost, the leader just deletes all sessions it created through a quorum.

I don't see why the follower needs to inform the leader of client heartbeats. The server connected to the client acts as a delegate in that case.

What do you say? @m1ch1 @fpj

ghost commented 10 years ago

> Every server tells the leader which client sessions it maintains. This is true because it needs to create a session through a quorum.

Yes. I'm thinking that when a session gets created, the server that received the request becomes the owner of the session.

> Zab servers should have knowledge of cluster membership. This would be helpful for doing dynamic membership (drop peers that have been lost for a long time and form a smaller ensemble). This is true because the leader knows.

Yes. Javazab should expose the cluster membership so that the group membership tracker can tell the clients the list of endpoints.

> If a server is lost, the leader just deletes all sessions it created through a quorum.

I think we should let the client reconnect to another server to revalidate the session like ZooKeeper does. The leader should expire 'orphaned' sessions only if they don't get revalidated within a timeout.

ghost commented 10 years ago

One minor change: I think it makes sense to make the metadata persistent.

// join the database group. the group gets created if it doesn't exist already.
// say the default session timeout is 10 seconds.
$ curl -XPUT tracker1.example.com/database/db1.example.com
response header : x-group-version: 0
{
  "config": null,
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "active_members": [
    { "db1.example.com" : null }
  ],
  "inactive_members": [
  ]
}

// configure metadata for db1 and db2. you can configure a member before it joins.
$ curl -XPUT tracker1.example.com/database/db1.example.com/config -d "db1 config"
$ curl -XPUT tracker1.example.com/database/db2.example.com/config -d "db2 config"

$ curl -XGET tracker1.example.com/database/db1.example.com
response header : x-group-version: 2
{
  "config": null,
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "active_members": [
    { "db1.example.com" : "db1 config" }
  ],
  "inactive_members": [
    { "db2.example.com" : "db2 config" }
  ]
}

// add group-wide configuration
$ curl -XPUT tracker1.example.com/database/config -d "group config"
response header : x-group-version: 3
{
  "config": "group config",
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "active_members": [
    { "db1.example.com" : "db1 config" }
  ],
  "inactive_members": [
    { "db2.example.com" : "db2 config" }
  ]
}
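A minimal sketch of the state behind the responses above, assuming member configuration outlives membership. The `PersistentGroup` class and its methods are hypothetical, and the `endpoints` field is left out because it would come from the cluster membership rather than from the group itself.

```python
class PersistentGroup:
    def __init__(self):
        self.version = None      # set to 0 by the first change
        self.config = None       # group-wide configuration
        self.member_config = {}  # member -> config; survives join/leave
        self.active = []         # members currently joined, in join order

    def _bump(self):
        self.version = 0 if self.version is None else self.version + 1

    def join(self, member):
        if member not in self.active:
            self.active.append(member)
            self._bump()

    def leave(self, member):
        if member in self.active:
            self.active.remove(member)
            self._bump()

    def set_member_config(self, member, config):
        self.member_config[member] = config  # allowed before the member joins
        self._bump()

    def set_group_config(self, config):
        self.config = config
        self._bump()

    def view(self):
        return {
            "config": self.config,
            "active_members": [{m: self.member_config.get(m)} for m in self.active],
            "inactive_members": [{m: c} for m, c in self.member_config.items()
                                 if m not in self.active],
        }
```

Replaying the workflow (db1 joins, both members are configured, then the group is configured) yields versions 0, 2, and 3 at the same points as in the sample responses.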

ghost commented 10 years ago

So I think the conclusion here is that we can implement a session tracker without having to forward heartbeat information to the leader. I started working on this, and I was pleasantly surprised by how easy it is to use javazab :)

https://github.com/ZK-1931/pulsed

zk1931 / jzab

Add a primitive to implement session tracker #88