Low priority: this PR does not address a problem, but resolves a mistake in the code, which incidentally does not lead to an issue in the current implementation. The PR will become obsolete in subsequent (planned) implementations of lean helix.
Found a peculiar thing in election trigger code. don't think this causes any issues as it is. but the implementation is very confusing:
when we trigger elections - the worker thread checks if the height and view associated with the trigger are still the current ones. if not - the trigger is considered stale and is ignored.
However, the height, view and elections handler pointer used by the worker thread are not the values associated with the trigger. they are taken under lock by the worker thread at the time the election trigger is being executed.
The only reason this does not cause elections to trigger prematurely immediately after block commits as a race condition is that the worker loop performs an additional check before invoking the elections function.
This however is still an incorrect implementation as the parameters passed to TermInCommittee.moveToNextLeaderByElection() are not the correct values, and may cause bugs if we move from one worker thread model. in fact, the pointer to TermInCommittee.moveToNextLeaderByElection() itself is may also be replaced in this race condition.
Low priority: this PR does not address a problem, but resolves a mistake in the code, which incidentally does not lead to an issue in the current implementation. The PR will become obsolete in subsequent (planned) implementations of lean helix.
Found a peculiar thing in election trigger code. don't think this causes any issues as it is. but the implementation is very confusing:
when we trigger elections - the worker thread checks if the height and view associated with the trigger are still the current ones. if not - the trigger is considered stale and is ignored.
However, the height, view and elections handler pointer used by the worker thread are not the values associated with the trigger. they are taken under lock by the worker thread at the time the election trigger is being executed.
The only reason this does not cause elections to trigger prematurely immediately after block commits as a race condition is that the worker loop performs an additional check before invoking the elections function.
This however is still an incorrect implementation as the parameters passed to
TermInCommittee.moveToNextLeaderByElection()
are not the correct values, and may cause bugs if we move from one worker thread model. in fact, the pointer toTermInCommittee.moveToNextLeaderByElection()
itself is may also be replaced in this race condition.This PR offers a fix.