It sounds like this abuses the zab layer a bit: pings aren't really supposed to be broadcast, but you're proposing that we also use this layer to let followers communicate with the leader. However, with an independent zab layer, we could decentralize the ping processing and simply broadcast the pings. In that case, session management is fully replicated.
If you want to keep session management centralized, I think we need to either extend the javazab API to allow processes to send regular messages to the leader, or talk directly to the application process running on the leader.
I haven't thought about this much, but it would be awesome if we can decentralize the session management.
Assuming that sessions can move from one server to another, we need the servers to agree on what happens to sessions. In a system like ZK, we need to track the ephemeral znodes associated with a session in case the session expires.
For session management, we need to make sure that creating and closing a session is agreed upon. That alone isn't sufficient, though: a quorum may have recorded that a session has been closed while the client still holds its session with a server that is not in that quorum. I think this is actually OK, because the session close is ordered with respect to all other update operations, and the leader can error out requests from sessions that have already been closed by tracking the active sessions. The state of active sessions must be replicated, which we get if we broadcast the operations that create and close sessions.
Does it make sense?
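To make the ordering argument concrete, here is a minimal sketch of a replicated active-session set that is updated only by totally ordered operations. The names `OpType`, `Op`, and `ActiveSessions` are made up for illustration and are not part of javazab; the sketch assumes every replica sees the same sequence of operations after broadcast.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical operation types; not part of javazab. */
enum OpType { CREATE_SESSION, CLOSE_SESSION, UPDATE }

/** One totally ordered operation, as it would be delivered after broadcast. */
final class Op {
    final OpType type;
    final long sessionId;

    Op(OpType type, long sessionId) {
        this.type = type;
        this.sessionId = sessionId;
    }
}

/** Replicated view of active sessions, changed only by delivered operations. */
final class ActiveSessions {
    private final Set<Long> active = new HashSet<>();

    /** Applies one delivered operation; returns false if it must be rejected. */
    boolean apply(Op op) {
        switch (op.type) {
            case CREATE_SESSION:
                return active.add(op.sessionId);
            case CLOSE_SESSION:
                return active.remove(op.sessionId);
            default:
                // An update ordered after the close of its session is an error.
                return active.contains(op.sessionId);
        }
    }
}

final class SessionOrderingDemo {
    public static void main(String[] args) {
        ActiveSessions replica = new ActiveSessions();
        System.out.println(replica.apply(new Op(OpType.CREATE_SESSION, 1L))); // true
        System.out.println(replica.apply(new Op(OpType.UPDATE, 1L)));         // true
        System.out.println(replica.apply(new Op(OpType.CLOSE_SESSION, 1L)));  // true
        System.out.println(replica.apply(new Op(OpType.UPDATE, 1L)));         // false
    }
}
```

Because the close is just another ordered operation, any update that lands after it is rejected the same way on every replica, regardless of which server the client was talking to.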
@fpj
How do followers know the following things associated with a session:
Is it committed through a quorum? Or does the follower get information from the leader?
Ok, to be fair, I don't know exactly why Michi proposed this, so making the assumption that we are trying to recreate ZK, we have the following:
Some of these things may depend on what you want to do with it.
I was thinking about what it takes to implement something like ZooKeeper's ephemeral nodes, to keep track of the liveness of processes, using javazab.
If we decentralize the session management, who decides when a session has expired? For example, if a server is not in the quorum, it can't expire the session even if the client stops sending heartbeats. I guess the session tracker needs to track the owner of each session, and expire the sessions if the clients don't revalidate them within a certain time period after the session owner falls off the quorum.
Just to elaborate my thought a bit more, I'm thinking of writing another reference server, which is a simple HTTP-based group membership tracker.
Here is a sample workflow:
// join the database group. the group gets created if it doesn't exist already.
// say the default session timeout is 10 seconds.
$ curl -XPUT tracker1.example.com/database/db1.example.com -d '{"opaque": "metadata"}'
response header : x-group-version: 0
{
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "members": [
    { "db1.example.com" : {"opaque": "metadata"} }
  ]
}
// send a heartbeat using HEAD
$ curl -I tracker1.example.com/database/db1.example.com
response header : x-group-version: 0
// another member joins with 20 second session timeout without any metadata.
$ curl -XPUT 'tracker1.example.com/database/db2.example.com?timeout=20'
response header : x-group-version: 1
{
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "members": [
    { "db1.example.com" : {"opaque": "metadata"} },
    { "db2.example.com" : null }
  ]
}
// db1 finds out the group has been modified in the next heartbeat.
$ curl -I tracker1.example.com/database/db1.example.com
response header : x-group-version: 1
// use a GET heartbeat to fetch the new group configuration.
$ curl -XGET tracker1.example.com/database/db1.example.com
response header : x-group-version: 1
{
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "members": [
    { "db1.example.com" : {"opaque": "metadata"} },
    { "db2.example.com" : null }
  ]
}
// db2 stops sending heartbeats. after 20 seconds, the tracker deletes
// db2 from the member list and bumps up the version. db1 finds out
// about the group change in the next heartbeat.
$ curl -I tracker1.example.com/database/db1.example.com
response header : x-group-version: 2
It would be more scalable if we can decentralize heartbeat management. Unlike ZooKeeper, the client doesn't get notified immediately when a group changes.
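As a rough model of the workflow above, the tracker's per-group state could look like the sketch below. `GroupTracker` and its method names are hypothetical, the clock is passed in explicitly to keep the example deterministic, and nothing here is replicated; it only illustrates how joins and expirations bump the version that heartbeats report back.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory model of one group; names and semantics are illustrative assumptions. */
final class GroupTracker {
    private static final class Member {
        final long timeoutMs;
        long lastHeartbeatMs;

        Member(long timeoutMs, long nowMs) {
            this.timeoutMs = timeoutMs;
            this.lastHeartbeatMs = nowMs;
        }
    }

    private final Map<String, Member> members = new LinkedHashMap<>();
    private long version = -1; // the first join yields version 0, as in the workflow

    /** PUT: joins the group; a new member bumps the version. */
    long join(String name, long timeoutMs, long nowMs) {
        Member existing = members.get(name);
        if (existing == null) {
            members.put(name, new Member(timeoutMs, nowMs));
            version++;
        } else {
            existing.lastHeartbeatMs = nowMs;
        }
        return version;
    }

    /** HEAD: refreshes the caller's lease and returns the current group version. */
    long heartbeat(String name, long nowMs) {
        expire(nowMs);
        Member m = members.get(name);
        if (m == null) {
            throw new IllegalStateException("unknown or expired member: " + name);
        }
        m.lastHeartbeatMs = nowMs;
        return version;
    }

    /** Removes members whose lease ran out; each removal bumps the version. */
    void expire(long nowMs) {
        Iterator<Member> it = members.values().iterator();
        while (it.hasNext()) {
            Member m = it.next();
            if (nowMs - m.lastHeartbeatMs > m.timeoutMs) {
                it.remove();
                version++;
            }
        }
    }
}

final class GroupTrackerDemo {
    public static void main(String[] args) {
        GroupTracker group = new GroupTracker();
        System.out.println(group.join("db1.example.com", 10_000, 0));     // 0
        System.out.println(group.join("db2.example.com", 20_000, 1_000)); // 1
        // db1 keeps heartbeating every 5 seconds; db2 goes silent.
        for (long t = 5_000; t <= 25_000; t += 5_000) {
            System.out.println(group.heartbeat("db1.example.com", t));
        }
    }
}
```

In the demo, db1's heartbeats keep reporting version 1 until db2's 20-second lease runs out, after which the next heartbeat reports version 2. The client only learns about the change on its next heartbeat, which matches the "not notified immediately" trade-off mentioned above.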
@m1ch1 How does it interact with zab servers?
My idea is:
I don't see why the follower needs to inform the leader of client heartbeats. The server connected to the client acts as a delegate in that case.
What do you say? @m1ch1 @fpj
> Every server tells the leader which client sessions it maintains. This is true because it needs to create a session through a quorum.
Yes. I'm thinking that when a session gets created, the server that received the request becomes the owner of the session.
> Zab servers should have knowledge of cluster membership. This would be helpful for doing dynamic membership (drop peers that have been lost for a long time and form a smaller ensemble). This is true because the leader knows.
Yes. Javazab should expose the cluster membership so that the group membership tracker can tell the clients the list of endpoints.
> If a server is lost, the leader just deletes all sessions it created through a quorum.
I think we should let the client reconnect to another server to revalidate the session, like ZooKeeper does. The leader should expire 'orphaned' sessions only if they don't get revalidated within a timeout.
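A minimal sketch of that ownership and revalidation rule, assuming a single in-memory table on the leader. `OwnedSessions` and its methods are invented for illustration; in a real deployment these state changes would presumably go through the broadcast layer rather than a local map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Tracks which server owns each session and expires orphans after a grace period. */
final class OwnedSessions {
    private static final class Session {
        String owner;
        long orphanedAtMs = -1; // -1 means the owner is still in the quorum

        Session(String owner) {
            this.owner = owner;
        }
    }

    private final Map<Long, Session> sessions = new HashMap<>();
    private final long revalidateTimeoutMs;

    OwnedSessions(long revalidateTimeoutMs) {
        this.revalidateTimeoutMs = revalidateTimeoutMs;
    }

    /** The server that received the create request becomes the owner. */
    void create(long sessionId, String owner) {
        sessions.put(sessionId, new Session(owner));
    }

    /** Called when a server falls out of the quorum: its sessions become orphans. */
    void serverLost(String server, long nowMs) {
        for (Session s : sessions.values()) {
            if (s.owner.equals(server) && s.orphanedAtMs < 0) {
                s.orphanedAtMs = nowMs;
            }
        }
    }

    /** The client reconnected to another server; false if the session is gone. */
    boolean revalidate(long sessionId, String newOwner) {
        Session s = sessions.get(sessionId);
        if (s == null) {
            return false;
        }
        s.owner = newOwner;
        s.orphanedAtMs = -1;
        return true;
    }

    /** Expires orphans that were not revalidated in time and returns their ids. */
    List<Long> expireOrphans(long nowMs) {
        List<Long> expired = new ArrayList<>();
        Iterator<Map.Entry<Long, Session>> it = sessions.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Session> e = it.next();
            long since = e.getValue().orphanedAtMs;
            if (since >= 0 && nowMs - since >= revalidateTimeoutMs) {
                expired.add(e.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

With this shape, losing a server does not immediately delete its sessions; it only starts a timer, and a client that reconnects elsewhere before the timer fires keeps its session under the new owner.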
One minor change: I think it makes sense to make the metadata persistent.
// join the database group. the group gets created if it doesn't exist already.
// say the default session timeout is 10 seconds.
$ curl -XPUT tracker1.example.com/database/db1.example.com
response header : x-group-version: 0
{
  "config": null,
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "active_members": [
    { "db1.example.com" : null }
  ],
  "inactive_members": []
}
// configure metadata for db1 and db2. you can configure a member before it joins.
$ curl -XPUT tracker1.example.com/database/db1.example.com/config -d "db1 config"
$ curl -XPUT tracker1.example.com/database/db2.example.com/config -d "db2 config"
$ curl -XGET tracker1.example.com/database/db1.example.com
response header : x-group-version: 2
{
  "config": null,
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "active_members": [
    { "db1.example.com" : "db1 config" }
  ],
  "inactive_members": [
    { "db2.example.com" : "db2 config" }
  ]
}
// add group-wide configuration
$ curl -XPUT tracker1.example.com/database/config -d "group config"
response header : x-group-version: 3
{
  "config": "group config",
  "endpoints": [
    "tracker1.example.com",
    "tracker2.example.com",
    "tracker3.example.com"
  ],
  "active_members": [
    { "db1.example.com" : "db1 config" }
  ],
  "inactive_members": [
    { "db2.example.com" : "db2 config" }
  ]
}
So I think the conclusion here is that we can implement the session tracker without having to forward heartbeat information to the leader. I started working on this, and I was pleasantly surprised by how easy it is to use javazab :)
I'd like to be able to use javazab to implement a session tracker. Here is how ZooKeeper implements its session tracker:
The key here is that only session creation/revalidation/expiration touches the disk; heartbeating doesn't.
I think the simplest way to implement this in javazab would be to allow the leader to short-circuit from the preprocess() method by returning null or something similar. The application (the session tracker) uses the send() method to forward the client session information to the leader. The leader can update the session information in the preprocess method and return null to tell javazab that the message doesn't need to be broadcast.
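To illustrate the short-circuit idea, here is a sketch against a made-up `Preprocessor` interface. This is not javazab's actual callback signature, which may well differ; the message tags, the wire format (one tag byte followed by a session id), and the injected clock are also assumptions for the example.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical callback shape; javazab's real interface may differ. */
interface Preprocessor {
    /** Returns the message to broadcast, or null to skip broadcasting. */
    ByteBuffer preprocess(ByteBuffer message);
}

/** Leader-side session tracker that keeps heartbeats in memory only. */
final class SessionTrackerPreprocessor implements Preprocessor {
    static final byte HEARTBEAT = 0; // hypothetical message tags
    static final byte CREATE = 1;

    private final Map<Long, Long> lastSeenMs = new HashMap<>();
    private final LongSupplier clock;

    SessionTrackerPreprocessor(LongSupplier clock) {
        this.clock = clock;
    }

    @Override
    public ByteBuffer preprocess(ByteBuffer message) {
        ByteBuffer in = message.duplicate(); // leave the caller's position untouched
        byte tag = in.get();
        long sessionId = in.getLong();
        lastSeenMs.put(sessionId, clock.getAsLong());
        if (tag == HEARTBEAT) {
            return null; // updated in memory; nothing to broadcast or persist
        }
        return message; // creation and similar changes still go through broadcast
    }

    /** Last time the leader heard about the session, or null if never. */
    Long lastSeen(long sessionId) {
        return lastSeenMs.get(sessionId);
    }

    /** Encodes a message as one tag byte followed by the session id. */
    static ByteBuffer encode(byte tag, long sessionId) {
        ByteBuffer buf = ByteBuffer.allocate(1 + Long.BYTES);
        buf.put(tag).putLong(sessionId);
        buf.flip();
        return buf;
    }
}
```

The point of the null return is that a heartbeat refreshes the leader's in-memory timestamp without ever reaching the log, while session creation still takes the normal broadcast path.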