Open robertsami opened 1 year ago
Hi @robertsami ! I am Anuj Diwan, a Computer Science PhD student at UT Austin. I am part of a team along with @arjunrs1 (Arjun Somayazulu) and we're taking a graduate Distributed Systems course. For our course project, we are interested in contributing to Yugabyte. This issue is related to our course material. Could we work on this issue? Any pointers for us to get started would be appreciated as well.
Thanks and regards, Anuj.
@ajd12342, refer to the Yugabyte contribution page for instructions: https://docs.yugabyte.com/preview/contribute/core-database/checklist/. Feel free to take up this issue. cc @Huqicheng in case you need references for this particular issue. Once you get past the initial aspects, we can identify a couple more items in this area.
@Huqicheng Thanks, we will take this issue up. Please feel free to assign it to us.
Our initial idea is to add a field to the lock that logs the timestamp at which it was locked (which naturally gets reset after an unlock and lock). Then, whenever someone queries the state of the lock or tries to get the lock, we can compare the current timestamp to this timestamp in order to decide whether to log that it has been taking too long. Is this reasonable?
Hi @rthallamko3 could you assign this one to us if all looks good?
@ajd12342 Sounds good. Is it possible to implement a wrapper of the lock instead of adding the field directly to the current lock?
Jira Link: DB-6159
Description
We have performing_update_mutex_ and peer_lock_ in consensus_peers.cc. If these are held for too long, Raft heartbeats will go unserved, causing leadership loss and re-elections. We should explicitly log whenever these locks are held too long, to make it clearer that this is happening in cases where we see rapid or intermittent leadership loss.