nucypher / ferveo

An implementation of a DKG protocol forked from Anoma

Share replacement procedures & naming conventions, based on user/staker actions & application logic #43

Open arjunhassard opened 1 year ago

arjunhassard commented 1 year ago

Context

A replacement key share – compatible with the persistent (whole) public key – will need to be generated in a variety of scenarios, most of which are unscheduled – i.e. 'external' share generation prompts, including staker actions (and inaction), end-user actions, and the execution of arbitrary application logic.

Relevant issues: https://github.com/nucypher/ferveo/pull/26 https://github.com/nucypher/ferveo/issues/70

Prompt categories

(1) Orderly staker unbonding

This comes in four flavors:
(a) Staker commences unbonding of their entire stake in order to depart the network.
(b) Staker commences unbonding of some number of T tokens which drops them below a minimum requirement for one or more of their current cohorts. This can include unbonding such that the remaining amount is lower than the global minimum stake size.
(c) Staker bonds some number of T tokens that takes them ABOVE the maximum allowed in a given cohort (corner case).
(d) Staker commences unbonding of some number of T tokens which does not disqualify them from participation in any of their current cohorts.
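As a rough illustration of how these flavors might be distinguished, here is a minimal sketch – all types, field names and thresholds are hypothetical, not Ferveo or Threshold contract code:

```rust
// Hypothetical classification of prompt 1(a)-(d); illustrative only.
#[derive(Debug, PartialEq)]
enum UnbondingOutcome {
    FullDeparture,            // 1(a): entire stake unbonded, staker departs
    DisqualifiedBelowMinimum, // 1(b): remaining stake below a cohort/global minimum
    DisqualifiedAboveMaximum, // 1(c): bonded amount above the per-cohort maximum (corner case)
    StillEligible,            // 1(d): no current cohort is affected
}

fn classify_stake_change(
    new_bonded: u128,
    cohort_min: u128,
    cohort_max: u128,
    global_min: u128,
) -> UnbondingOutcome {
    if new_bonded == 0 {
        UnbondingOutcome::FullDeparture
    } else if new_bonded < cohort_min || new_bonded < global_min {
        UnbondingOutcome::DisqualifiedBelowMinimum
    } else if new_bonded > cohort_max {
        UnbondingOutcome::DisqualifiedAboveMaximum
    } else {
        UnbondingOutcome::StillEligible
    }
}

fn main() {
    // e.g. a staker unbonds down to 30_000 T against a 40_000 T cohort minimum: flavor 1(b).
    let outcome = classify_stake_change(30_000, 40_000, 1_000_000, 15_000);
    assert_eq!(outcome, UnbondingOutcome::DisqualifiedBelowMinimum);
}
```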

(2) Inactive or defunct* node

Inactivity could theoretically be detected by:
(a) a tBTCv2 slashing event (on-chain verifiable)
(b) the absence of some predefined interactive check-in – e.g. a developer sets a cohort compositional parameter to require a signature from all cohort members at least once every 3 months (on-chain verifiable). This is basically confirmActivity but customizable and potentially incurring a higher fee (adopters pay more), useful for cases where non-responsiveness is intolerable.
(c) a challenge protocol run by other members of the cohort (off-chain verifiable)



For early versions of CBD, the inactivity check with the best cost-benefit is (a). Note that the motivation to split the operation of a tBTCv2 node and a CBD node between two separate Ethereum addresses, in order to evade this double punishment, could be counterbalanced by giving adopters the option of filtering for addresses with both apps authorized – which would be advertised as providing superior availability (in theory).

The simplest implementation of (a) would be maximally punitive – a node would be removed from a Cohort if there was a slashing event of any size in any other Threshold application to which they have authorized T. 
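A minimal sketch of that maximally punitive rule, under assumed (non-Ferveo, non-Threshold) types:

```rust
// Illustrative only: any slashing event, of any size, in any Threshold application
// flags the operator for share replacement.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct OperatorAddress([u8; 20]);

struct SlashingEvent {
    operator: OperatorAddress,
    amount: u128,
}

struct Cohort {
    members: Vec<OperatorAddress>,
    /// Members flagged for share replacement at the next opportunity.
    pending_replacement: Vec<OperatorAddress>,
}

impl Cohort {
    fn handle_slashing(&mut self, event: &SlashingEvent) {
        // The slash size is deliberately ignored under the maximally punitive policy.
        let _ = event.amount;
        if self.members.contains(&event.operator)
            && !self.pending_replacement.contains(&event.operator)
        {
            self.pending_replacement.push(event.operator);
        }
    }
}

fn main() {
    let offender = OperatorAddress([1u8; 20]);
    let mut cohort = Cohort {
        members: vec![offender, OperatorAddress([2u8; 20])],
        pending_replacement: vec![],
    };
    // Even a minimal slash triggers removal under this policy.
    cohort.handle_slashing(&SlashingEvent { operator: offender, amount: 1 });
    assert_eq!(cohort.pending_replacement, vec![offender]);
}
```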

*A node which has lost its key share, or had it corrupted, could be considered inactive by the protocol and dealt the same punishment.


(3) Application-level cohort refresh

This would increase collusion-resistance & redundancy, which becomes more critical if DKG & Decryption cohorts are always the same in early versions of the CBD MVP. This can be driven by:
(a) A pre-specified schedule. e.g. for all sharing patterns, once per 7-day cycle, a random 10% of the longest-serving nodes are recycled, with the exception of three 'hard-coded' known nodes.
(b) A user action and/or business logic. e.g. if an NFT is sold for greater than 5 ETH, initiate share refreshing such that n - m + 2 cohort members are replaced.
(c) A schedule based on action/logic. e.g. if over 100 separate Ethereum addresses request access (and sign to prove ownership), increase the frequency of cohort refresh to weekly.
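To make the three drivers concrete, one hypothetical way to express them as application-level parameters (none of these types exist in Ferveo; they are purely illustrative):

```rust
// Hypothetical refresh-trigger parameters; not Ferveo types.
enum RefreshTrigger {
    /// 3(a): scheduled recycling, e.g. every 7-day cycle replace a random 10%
    /// of the longest-serving members, except for a pinned set of known nodes.
    Schedule { period_days: u32, replace_percent: u8, pinned_members: usize },
    /// 3(b): user action / business logic, e.g. an NFT sale above a price
    /// threshold triggers replacement of n - m + 2 members.
    OnEvent { replace_count: usize },
    /// 3(c): a schedule adjusted by observed usage, e.g. >100 distinct
    /// requesting addresses tightens the refresh period to weekly.
    AdaptiveSchedule { base_period_days: u32, tightened_period_days: u32 },
}

fn main() {
    let (n, m) = (16usize, 9usize);
    let _triggers = vec![
        RefreshTrigger::Schedule { period_days: 7, replace_percent: 10, pinned_members: 3 },
        RefreshTrigger::OnEvent { replace_count: n - m + 2 },
        RefreshTrigger::AdaptiveSchedule { base_period_days: 30, tightened_period_days: 7 },
    ];
}
```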

(4) Emergency/safety cohort refresh

e.g. If some sub-threshold minority of operators attempt to bribe or coerce the remaining members, the honest threshold could 'vote' to prompt a cohort refresh and remove the offending operators from the cohort. This prompt needs more thought, particularly with regard to malicious misuse of emergency refresh and the lack of provability of a collusion attempt.

Procedures

For prompts 1(a-c) and 2(a), one entirely new key share is required to onboard a new staker into the Cohort. This share is replacing an old share belonging to a previous member of the cohort – therefore that old share must no longer be valid in the context of decrypting the underlying data. This is procedurally simple.

Conversely, prompts (3) & (4) will sometimes require all the members of the cohort to be replaced – or at least a greater number than the threshold. However, you need at least a threshold of nodes to execute any kind of share generation/replacement. Therefore the maximum number of nodes that can be replaced in a single procedure, or execution of any share generation function, is n - m. And to maintain the original cohort size, the protocol must simultaneously enlist and assign new nodes to take the newly generated replacement shares.

Hence a 'total cohort' refresh could be achieved in three steps – using a 9-of-16 cohort as an example:
1) 9 of the 'original' nodes replace the other 7 original nodes, and 7 new nodes are onboarded.
2) 7 of the new nodes + 2 of the original nodes replace the other 7 original nodes, and another 7 new nodes are onboarded.
3) Any 9 of the 14 new nodes replace the 2 remaining original nodes.

Note that the new Cohort may end up with some of the same node addresses as before, unless this is disallowed by the protocol and/or application-level parameters, but they would all hold fresh key shares pertaining to the same whole public key.
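The round count falls straight out of the n - m constraint – a quick sketch (function and variable names are illustrative, not proposed API):

```rust
// Illustrative only: how many replacement rounds a given change needs,
// given that at most n - m shares can be replaced per round.
fn replacement_rounds(n: usize, m: usize, to_replace: usize) -> usize {
    assert!(m < n, "a threshold of members must remain to run each procedure");
    let max_per_round = n - m;
    // ceiling division
    (to_replace + max_per_round - 1) / max_per_round
}

fn main() {
    // 9-of-16 cohort, full refresh of all 16 members: 7 + 7 + 2 = 16, i.e. 3 rounds,
    // matching the three-step walkthrough above.
    assert_eq!(replacement_rounds(16, 9, 16), 3);
    // The 3(b) example (replace n - m + 2 = 9 members) needs 2 rounds.
    assert_eq!(replacement_rounds(16, 9, 9), 2);
}
```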

Note that a dishonest threshold of nodes can choose to dump the rest of the nodes out of a Cohort at any time with the execution of a single method. This is congruent with the Honest Threshold trust assumption.

Naming

For prompts 1(a-c) and 2(a), we might call the corresponding method ‘Share Replacement’ or 'Share Substitution'. The existing term, 'share recovery' is misleading as it sort of implies that you'd end up with the same exact key share, same node, or both. It also makes sense to me to reserve the word ‘refresh’ for broader/higher-level changes to the cohort composition – see below.

For prompts 3(a) and 3(b), the protocol could in theory individually and sequentially replace nodes in the cohort, provided that the composition recycling parameters are abided by. However, it is more efficient and safer to invoke a single method that generates multiple new shares in one sweep. To distinguish this method from individual share replacement, we might call this 'Multi-share Replacement', 'Multi-share Refresh', or 'Cohort Refresh'. The latter is a weaker name because there will be plenty of scenarios where multiple shares are replaced at once without the entire cohort changing.
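Purely to visualize the proposed naming split (these are not existing Ferveo methods, just a sketch of the surface under discussion):

```rust
// Sketch of the naming split only; not an existing Ferveo interface.
trait CohortMaintenance {
    /// Prompts 1(a-c) / 2(a): 'Share Replacement' – one departing or inactive
    /// member is swapped for one incoming member, and the old share is invalidated.
    fn replace_share(&mut self, departing: usize, incoming: usize);

    /// Prompts 3(a)/3(b): 'Multi-share Replacement' / 'Multi-share Refresh' –
    /// up to n - m members are swapped in a single procedure.
    fn replace_shares(&mut self, departing: &[usize], incoming: &[usize]);
}

fn main() {}
```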

arjunhassard commented 1 year ago

The SDK, documentation and naming conventions must disambiguate between (1) Operator Replacement and (2) Share Recovery.

The former method underpins CBD's collusion mitigation and corresponding adopter-facing trust lever. This only works if the share held by the replaced operator is invalidated. There are at least three ways for this to occur:

a. The [share recovery] method is called and the operator auto-deletes the share (and is trusted to refrain from editing their client code). There is no way to prove this occurred.
b. The [share refresh] method is called, all member shares are invalidated, and the Cohort enters a new 'epoch'. All shares belonging to the previous epoch are effectively revoked. In this case, the protocol would orchestrate the exclusion of operators that were specified to be replaced, by reassigning the new shares to certain existing operators and certain new operators.
c. Some Proactive SS-driven variant of the [share recovery] method is called that simultaneously invalidates the old share. If this can produce a proof, it's probably the safest approach.
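To illustrate (b): an epoch counter on the cohort is enough to make previous-epoch shares unusable. A minimal sketch with assumed types, not Ferveo's actual share or cohort representation:

```rust
// Minimal sketch of (b): shares are tagged with the epoch they were dealt in,
// and only current-epoch shares may contribute to decryption.
struct KeyShare {
    /// The cohort epoch this share was dealt in (actual share material omitted).
    epoch: u64,
}

struct CohortState {
    current_epoch: u64,
}

impl CohortState {
    /// A [share refresh] bumps the epoch, implicitly revoking every share
    /// dealt under a previous epoch.
    fn refresh(&mut self) {
        self.current_epoch += 1;
    }

    /// Only shares from the current epoch are accepted.
    fn is_valid(&self, share: &KeyShare) -> bool {
        share.epoch == self.current_epoch
    }
}

fn main() {
    let mut cohort = CohortState { current_epoch: 0 };
    let old_share = KeyShare { epoch: 0 };
    assert!(cohort.is_valid(&old_share));
    cohort.refresh(); // new epoch: previous-epoch shares are effectively revoked
    assert!(!cohort.is_valid(&old_share));
}
```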

Note that the Operator Replacement approach we choose for the first few versions of CBD will constrain how frequently cohorts can replace members, which in turn constrains the external prompts summarized above.

For example, if (a) is the only viable choice, then it may be expedient to further align T subsidy allocation with long-term operator commitment to the network, to decelerate the (potential) expansion of collusion surface and buy time for superior approaches – like (c) – to be developed and upgraded to production.

Whereas if (b) emerges as the best choice, then smart contract execution costs may also severely constrain the frequency of operator replacement.