openconfig / gribi

A gRPC Interface to a Network Element RIB.
Apache License 2.0

Clarification regarding changes proposed for handling election ID #33

Closed VijayKestur closed 17 hours ago

VijayKestur commented 2 years ago

Context is the below pull request:

Since the election ID is the only field in AFTOperation that identifies that the master client sent the AFTOperation, how do we handle the situation where an existing client that had the highest election ID loses its master role? Should that client's AFTOperation fail even though the election ID sent in the AFTOperation is equal to the last known election ID?

xw-g commented 2 years ago

Hi @VijayKestur,

Per the election_id description, the election happens externally to the gRIBI server, and the gRIBI server enforces the election result. So we expect the new primary to increase the election_id monotonically and to update the election_id in a ModifyRequest after the primary role changes. Does this answer your question?

BTW, is your question about #32? If so, maybe we can move the discussion there for better tracking? Thanks.

VijayKestur commented 2 years ago

Hi Xiao, yes, this question is about pull request #32, so it would be good to track it there. Sorry, it does not answer my question.

What I meant is that, as per pull request #32, the gRIBI server will need to make a client the master even if its election ID is the same as the previous highest. If that is the case, then two clients will have the same election ID, which will be the highest election ID. When both of those clients send an AFTOperation containing that previously highest election ID, there will be contention. Also, if the election ID is monotonically increasing, shouldn't the new client have an election ID higher than the previously highest election ID? Thanks

robshakir commented 2 years ago

Hi @VijayKestur,

The reason we made this change is that there is no guarantee that the election ID changes when an ID is newly provided to the server. Consider the case where the client reconnects. In this case, the primary client is still the same primary client, and the external election result does not change (since this primary did not itself become invalid as the primary). When this client reconnects, it will send the same ID as it previously provided, and hence 'resume' being primary.

With the behaviour that was previously specified, the previous connection remains the primary and blocks the new connection from the same client being used.
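The difference between the two rules can be sketched as follows, in Python for brevity. This is only an illustration: the `Server` class, its `strict` flag, and the connection names are hypothetical and not taken from the gRIBI implementation. The only thing drawn from the discussion is the comparison itself (strictly greater versus greater-or-equal).

```python
class Server:
    """Toy model of primary selection; not the gRIBI server implementation."""

    def __init__(self, strict):
        self.strict = strict      # True: previous rule (ID must be strictly higher)
        self.highest_id = None    # highest election ID seen so far
        self.primary = None       # connection currently treated as primary

    def announce(self, conn, election_id):
        """Record an announced election ID; return True if conn becomes primary."""
        if self.highest_id is None:
            accepted = True
        elif self.strict:
            accepted = election_id > self.highest_id
        else:
            accepted = election_id >= self.highest_id
        if accepted:
            self.highest_id = election_id
            self.primary = conn
        return accepted


# Previous rule: the reconnecting client's new connection is refused.
old = Server(strict=True)
old.announce("conn-1", 5)
old_result = old.announce("conn-2", 5)

# Changed rule: the same ID on a new connection resumes the primary role.
new = Server(strict=False)
new.announce("conn-1", 5)
new_result = new.announce("conn-2", 5)
```

Under the strict rule the stale `conn-1` stays primary, while under the greater-or-equal rule `conn-2` takes the role over.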

The 'risk' of this is that a client that is not the reconnecting client can come along and effectively take over as primary with the current ID. However, this reflects a bug in the external election, since some new primary was elected without changing the election result. The reconnection case was considered more critical to handle cleanly.
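To make the takeover risk concrete, here is a hedged sketch in Python. The names `PrimaryState`, `announce`, and `accept_operation` are hypothetical. The sketch also assumes the server remembers which connection last announced the highest ID and checks operations against that connection; this thread does not confirm that detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrimaryState:
    highest_id: Optional[int] = None
    primary_conn: Optional[str] = None


def announce(state, conn, election_id):
    """Greater-or-equal rule: an equal ID moves the primary role to conn."""
    if state.highest_id is None or election_id >= state.highest_id:
        state.highest_id = election_id
        state.primary_conn = conn
        return True
    return False


def accept_operation(state, conn, op_election_id):
    """Assumed check: the operation carries the current ID and comes from the primary."""
    return op_election_id == state.highest_id and conn == state.primary_conn


state = PrimaryState()
announce(state, "client-A", 5)
# A different client reuses the current ID, i.e. the external election bug described above.
announce(state, "client-B", 5)

a_ok = accept_operation(state, "client-A", 5)
b_ok = accept_operation(state, "client-B", 5)
```

In this model `client-B` silently displaces `client-A`, because the election ID alone cannot tell the two apart.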

Thanks, r.