ssvlabs / ssv

Secret-Shared-Validator(SSV) for ethereum staking
https://ssv.network
GNU General Public License v3.0
181 stars 94 forks source link

rethink committee share management #1558

Open moshe-blox opened 1 month ago

moshe-blox commented 1 month ago

There are most likely race conditions in how we remove shares from Committee without locks, and in how we stop said committee if no shares are left while concurrently we could be adding a new share in the other goroutine which fetches metadata and discovers a newly activated validator?

Also, does the share map hold all shares or only active shares? If all shares then we likely dont stop Committee when there are 0 active but >0 inactive validators

We should rethink this entire subsystem and likely rewrite parts or all of it.

y0sher commented 2 weeks ago

A committee is a set of 4/7/13 operators. The Committee struct hold all the Shares, a share is a part of a validator private key that is managed by this committee. so for each validator there are 4 shares in the committee, 1 for each operator. so usually from the operator perspective, there's one share that belongs to you and you use it to send your messages and eventaully sign ethereum duties (block proposals/attestations). and you know all the rest of the shares in order to verify that the operators that are sending messages.

If a validator is removed from SSV, we want to remove his shares from the committee, there could be up to 500 validators in each committee. If all shares (all validators) are removed from this committee, we want to remove that committee.

Instances of the Committee struct are stored in a map that is called validatormap under the validatorController ( a struct and file that manages most of the business logic of the operator, incl starting and stopping validators and executing their duties using consensus.)

After all shares are removed we want to remove the instance of Committee from the validatormap.

So the actual issue is we protect with mutex the shares inside Committee so we lock when a share is added or removed (Committee.AddShare, Committee.RemoveShare), but we are not interlocking the committee removal from the validatormap. So we need to somehow increase the scope of the lock so if AddShare is called before the removal of the committee from the validatormap, we end up not removing it. we need to somehow lock these actions together.