Cohort Refresh – research (PSS) & product implications + validation

arjunhassard commented 2 years ago

Discussion/Update Nov 30th

Dynamic Committees

In general members leaving is not a problem, except for the eventual deterioration of the cohort. The tricky part is member replacement.
Many Ferveo-compatible models are aiming for higher security than needed given the lowest common denominator of honest majority assumptions elsewhere
May be unavoidable to return to stateful nodes
- I.e. storing secrets for yourself and other cohort members
Onboarding a new node needs a new protocol beyond simple PSS
- Reconstruct whole polynomial for a new point
- Forces threshold to be more than half
- A failing subset becomes a problem not only for service delivery, but because at a certain size it also makes refresh unavailable
Cohort size is independent from cohort refresh, from a cryptographic point of view (not from a redundancy or service quality/optionality point of view)
- Performance cost grows quadratically
- Small cohorts + decentralized refresh may be sufficiently trustless
Current design is based on BFT consensus: secrets are divided into shares, and shares are weighted by stake on the BFT layer
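The "reconstruct whole polynomial for a new point" step above can be sketched with plain Shamir sharing. This is a toy illustration only: the field modulus `P`, the function names and the indices are assumptions made for the example, not Ferveo's API, and a real onboarding protocol would blind each helper's contribution rather than pool shares in the clear.

```python
import random

P = 2**127 - 1  # toy prime field modulus (assumption, not a Ferveo parameter)

def eval_poly(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x, mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def deal(secret, threshold, n):
    """Shamir-share `secret` so that any `threshold` shares reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
    return {i: eval_poly(coeffs, i) for i in range(1, n + 1)}

def interpolate_at(shares, x):
    """Lagrange-interpolate the sharing polynomial at x from {index: value}."""
    total = 0
    for i, y_i in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num = num * (x - j) % P
                den = den * (i - j) % P
        total = (total + y_i * num * pow(den, -1, P)) % P
    return total

shares = deal(secret=1234, threshold=3, n=5)
helpers = {i: shares[i] for i in (1, 2, 4)}  # any 3 members that are online
new_share = interpolate_at(helpers, 6)       # share for a newcomer at index 6
```

The newcomer's share lies on the same polynomial, so it combines with existing shares to recover the same secret. It also shows why onboarding stalls once fewer than `threshold` members are reachable.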

Maximizing collusion friction

The ease of orchestrating unlawful collusion turns on several factors outside cohort size, such as verifiable randomness in sampling
'Proactive cohort refresh' - is there also a reason to carve out forward-compatibility with refresh by fiat of the original "summoner", while keeping the same public key (i.e. no need to download and encrypt again, which may be impossible for complex permissions)?
Same user, new message, new ciphertext, new cohort for maximal security?
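As a rough illustration of sampling driven by verifiable randomness, the sketch below ranks stakers by a hash of a public beacon value and their address. The function name, the beacon format and the use of SHA-256 are assumptions for the example; it ignores stake weighting and is not the network's actual sampling algorithm.

```python
import hashlib

def sample_cohort(beacon, stakers, size):
    """Rank stakers by SHA-256(beacon || address) and keep the lowest `size`.

    Anyone who knows the beacon value can recompute and audit the result.
    """
    if size > len(stakers):
        raise ValueError("cohort size exceeds staker population")
    ranked = sorted(
        stakers,
        key=lambda addr: hashlib.sha256(beacon + addr.encode()).digest(),
    )
    return ranked[:size]

stakers = ["0xa", "0xb", "0xc", "0xd", "0xe"]  # hypothetical addresses
cohort = sample_cohort(b"beacon-round-1", stakers, 3)
```

Because the ranking depends only on the beacon and the addresses, a staker cannot steer itself into a cohort without influencing the beacon.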

Availability

Liveness challenges, and related slashing, are, replete with air quotes, "very easy" as a means for Alice to take the temperature of a cohort
- Commitments to secret as part of a proof of life protocol
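One way to read "commitments to secret as part of a proof of life protocol" is a Schnorr-style proof of knowledge against a public commitment to the node's share. The sketch below is a toy under loud assumptions: the group parameters are far too small for real use, and all names are hypothetical rather than taken from any existing protocol.

```python
import hashlib
import secrets

P = 2039  # toy safe prime, P = 2*Q + 1 (assumption; insecure size)
Q = 1019  # prime order of the subgroup generated by G
G = 4     # generator of the order-Q subgroup

def commit(share):
    """Public commitment to a share: G^share mod P."""
    return pow(G, share % Q, P)

def _challenge(R, commitment, nonce):
    """Fiat-Shamir challenge bound to the challenger's nonce."""
    data = f"{R}|{commitment}|".encode() + nonce
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def prove_alive(share, nonce):
    """Prove knowledge of `share` without revealing it."""
    r = secrets.randbelow(Q)
    R = pow(G, r, P)
    c = _challenge(R, commit(share), nonce)
    z = (r + c * share) % Q
    return R, z

def verify_alive(commitment, nonce, proof):
    """Check G^z == R * commitment^c (mod P)."""
    R, z = proof
    c = _challenge(R, commitment, nonce)
    return pow(G, z, P) == (R * pow(commitment, c, P)) % P

commitment = commit(777)
ok = verify_alive(commitment, b"nonce-1", prove_alive(777, b"nonce-1"))
```

A passing proof only shows that the node still holds its share when challenged; it says nothing about whether the node will actually respond at service time.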
Option to choose only tBTC authorized stakers to piggyback availability
Detecting cohort deterioration is important regardless of slashing

Cohort deterioration
Legitimate exits from the network trigger cohort extension
Idea: prevent full unbonding until you have found a replacement (but could just swap for a sybil with min. stake)
Easy to add to protocol as stakers must request deauthorization from apps
There are two discrete types of cohort deterioration, and we need to be cautious about conflating them: one is from ordinary unbonding, pursuant to the protocol, while the other is the more chaotic and bizarre unlawful disappearance
- there are differences insofar as slashing or another disincentive is needed to reduce incidence of the latter, and that this introduces complexity
- slashing aside, the cryptological machinery needed to replace a member of a cohort appears to be identical
- detection and incentive structure of unlawful disappearance (let's say "truancy"?) is a bigger problem that doesn't need to be solved in very near-term

Cost of cohort refresh

Prohibitive if other nodes keep coming and going (& don’t want to punish good behavior of longer-term commitment)
Reimbursement could be incorporated into fee rebate
- During unbonding period, subsidies dealt out to nodes that remain
- the cost for cohort members who remain in the cohort is dependent on the exact transactions required by the protocol, and it might be possible to have an in-band refund for such members

Long-term commitment to network & overall node population

Kappa selected for long-term committers, given incredible macro uncertainty associated with maximum subsidies
Buried longevity incentive through cohort refresh reimbursement could be valuable
Long-term incentivization can be introduced uniquely to PRE & CBD

Statefulness baggage

Keyring, add stash of secret data for each ritual
Increases liability and security requirements
- Before, only long-term seed phrase
- Now backups required
Bonus security for stateful nodes: can forget old secrets if following protocol, which is also akin to forward secrecy
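The "forget old secrets" point can be illustrated with a textbook proactive refresh, in which every member deals a random polynomial with zero constant term and each member adds what it receives to its own share. This is a minimal sketch under assumptions (toy field, no verification of dealt values, hypothetical names), not the refresh protocol of Ferveo or any specific PSS scheme.

```python
import random

P = 2**127 - 1  # toy prime field modulus (assumption)

def eval_poly(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x, mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def reconstruct(shares):
    """Lagrange-interpolate at x = 0 from {index: value}."""
    total = 0
    for i, y_i in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        total = (total + y_i * num * pow(den, -1, P)) % P
    return total

def refresh(shares, threshold):
    """One refresh round; returns the new shares and the message count."""
    members = sorted(shares)
    new_shares = dict(shares)
    messages = 0
    for dealer in members:
        # Zero constant term: the shared secret itself is unchanged.
        zero_poly = [0] + [random.randrange(P) for _ in range(threshold - 1)]
        for receiver in members:
            delta = eval_poly(zero_poly, receiver)
            new_shares[receiver] = (new_shares[receiver] + delta) % P
            if receiver != dealer:
                messages += 1  # one point-to-point sub-share
    return new_shares, messages

secret, threshold, n = 42, 3, 5
poly = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
old = {i: eval_poly(poly, i) for i in range(1, n + 1)}
new, messages = refresh(old, threshold)
```

Here five members exchange 5 × 4 = 20 sub-shares, which is the quadratic growth noted under Dynamic Committees. Once old shares are deleted, a share leaked from a previous epoch no longer combines with current ones, which is the forward-secrecy-like property.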

Collusion orchestration friction

Introduce more granular 'epochs' for key refreshing
- Disassembling a stream of messages and cohort-refreshing each part individually makes it harder to decrypt a larger set of ciphertexts through collusion
- Cost: public key discovery and management overhead
- This could also be achieved via interactive public key switching for sets of ciphertexts, depending on the use case
- A sort of cryptological (but not cryptographic) variant of forward secrecy
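To make the epoch idea concrete, the sketch below buckets ciphertexts by a time-based epoch and reports what a colluding cohort would expose if it controlled only some epochs. The epoch length, function names and data shape are all hypothetical.

```python
from collections import defaultdict

EPOCH_SECONDS = 7 * 24 * 3600  # hypothetical epoch length: one week

def epoch_of(timestamp):
    """Map a Unix timestamp to its epoch number."""
    return timestamp // EPOCH_SECONDS

def group_by_epoch(messages):
    """messages: iterable of (timestamp, ciphertext_id) -> {epoch: [ids]}."""
    groups = defaultdict(list)
    for timestamp, ciphertext_id in messages:
        groups[epoch_of(timestamp)].append(ciphertext_id)
    return dict(groups)

def exposed(messages, compromised_epochs):
    """Ciphertexts readable by colluders who hold only the given epochs' keys."""
    groups = group_by_epoch(messages)
    return [cid for e in sorted(compromised_epochs) for cid in groups.get(e, [])]

messages = [
    (0, "a"),
    (EPOCH_SECONDS - 1, "b"),
    (EPOCH_SECONDS, "c"),
    (3 * EPOCH_SECONDS + 5, "d"),
]
leak = exposed(messages, {1})
```

Shorter epochs shrink what any single collusion can decrypt, at the price of more public keys to discover and manage.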

@jMyles thanks for your live-noting, I've included lots of your sentences here ^

cygnusv commented 2 years ago

Notes from @jMyles:

Justin Myles Holmes — Today at 3:42 PM

I hear @dnunez point out that the scope directly in front of us with regard to cohort refresh is dynamic DKG participation, and that we are well-advised to avoid pitfalls that arise from broader approaches that fall outside that scope, such as handling very large corrupting events.

@arj points out that our current "stateless" cohort approach is plausibly still viable for legacy use cases.

(Though I want to say, I think that it is confusing to call DKG participants "Ursula")

@dnunez points out that, for cohort refresh that relies on creation of new "shares" through participation of the cohort (I'd call this style of refresh "covenant refresh"), the political design needs to be such that the portion which we rely upon for honesty (eg, the majority) must remain available for the refresh to even be possible. (Perhaps to put it another way: a failing subset becomes a problem not only for service delivery, but because at a certain size it also makes refresh unavailable)

I want to point out: participants bearing the original DKG ritual (what is popularly called "the ceremony") can presumably create additional valid frags if they come back online. We may of course want to make this difficult or impossible, or we may want to make it easy, or we may want to make this a tuneable parameter.

@arj points out that the ease of orchestrating unlawful collusion turns on several factors outside cohort size, such as verifiable randomness in sampling.

"What prompts cohort refresh?" asks @arj.

Among the discussion he sees arising from this:

Of course the core impetus is unbinding.

But is there also a reason to carve forward-compatibility with refresh by fiat of the original "summoner"?

(annnnd @dnunez runs headlong into one of the two reasons that I think "shares" is an undesirable word for fragment material)

@dnunez points out that liveness challenge, and related slashing, is, replete with air quotes, "very easy" as a means for Alice to take the temperature of a cohort.

I'll point out that a malicious node may choose to respond properly to a challenge, but still pretend to be offline at service time.

@dnunez also introduces the idea that challenge failure on-chain can itself prompt the transaction that requests cohort refresh. 💥

@arj has goosebumps, chills down his spine, and a call to run for the door. His qualm: the prospect of an exodus appears to threaten the stability mechanism of cohort refresh, perhaps reminiscent of the FUD re: the Shanghai 32 eth staking protocol.

@Caruso asks about the impact of "staker-by-staker" as a way of counting heads for the purpose of refresh.

@arj points out that such a scheme might encourage Sybil'ing.

@arj asks us to come to the point of this whole thing:

After all, is availability even a near-term concern with respect to rollout? We don't necessarily need to solve that to push this thing out the door.

@dnunez responds by pointing out the terminal nature of cohort deterioration.

@arj points out that there are two discrete types of cohort deterioration, and that we need to be cautious about conflating them: one is from ordinary unbinding, pursuant to the protocol, while the other is the more chaotic and bizarre unlawful disappearance.

@arj asserts that the former is way easier than the latter. (I'm not sure I agree.)

@dnunez also asks what's so different.

@arj points out that there are differences insofar as slashing or another disincentive is needed to reduce incidence of the latter, and that this introduces complexity.

@dnunez points out that, slashing aside, the cryptological machinery needed to replace a member of a cohort appears to be identical.

@arj agrees, and clarifies that his thrust is to point out that the detection and incentive structure of unlawful disappearance (let's say "truancy"?) is a bigger problem that doesn't need to be solved in very near-term.

@dnunez agrees. (I know I'm oversimplifying on account of velocity)

@derek points out that the specific Sybil attack in which one party successfully selects itself to replace its node is much less likely in an environment of robust random sampling.

@arj agrees, and goes further to point out that random beacon utilization in sampling has other benefits, known and unknown.

@derek wonders about the relationship between cohort size and cost for refresh. He carefully tip-toes into the notion that unbinders need to cover the cost of cohort refresh arising from their departure.

@dnunez points out that the cost for cohort members who remain in the covenant (people whom @arj pointed out earlier are at risk of being punished for good behavior) is dependent on the exact transactions required by the protocol, and that it might be possible to have an in-band refund for such members.

(long-form discussion about longevity incentives for individual nodes)

piotr — Today at 4:28 PM

Some of the PSS / DPSS schemes (Ferveo, CHURP, DyCAPS) assume a BFT security model, and I think that simplifies the reasoning about their properties

Justin Myles Holmes — Today at 4:32 PM

@dnunez wonders about action items stemming from this discussion.

@arj: what burdens does state maintenance introduce?

@dnunez: At a minimum, it means changing from mere custody of a secret to actually backing up the results of a runtime on a regular basis, etc.

Justin Myles Holmes — Today at 4:39 PM

@arj muses about whether various tunable parameters with respect to cohort refresh granularity can create sufficiently onerous incentives to make collusion sufficiently difficult to disincentivize it, without needing to add additional cryptographic safeguards. What @arj suggests is a sort of cryptological (but not cryptographic) variant of forward secrecy

arjunhassard commented 1 year ago

Envisaged (unvalidated) prompts for share recovery/refresh and initiation of DKG ceremony:

(1) Orderly/natural staker unbonding. Sub-scenarios:
- (a) Staker unbonds entire stake and completely departs network
- (b) Staker unbonds some number of T tokens which does not affect any of the cohorts they are in
- (c) Staker unbonds some number of T tokens which drops them below a minimum requirement for one or more of their current cohorts
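The three unbonding sub-scenarios can be told apart with a simple check of remaining stake against per-cohort minimums. The function name, the scenario labels and the `cohort_minimums` mapping are hypothetical; this is a sketch of the decision logic, not contract code.

```python
def classify_unbonding(stake_before, amount, cohort_minimums):
    """Classify an unbonding request as scenario 1a, 1b or 1c.

    cohort_minimums: {cohort_id: minimum stake required} for every cohort
    the staker currently belongs to (hypothetical data shape).
    Returns (scenario, cohorts_needing_refresh).
    """
    if amount <= 0 or amount > stake_before:
        raise ValueError("unbonding amount must be in (0, stake_before]")
    remaining = stake_before - amount
    if remaining == 0:
        # (a) full exit: every cohort the staker is in needs a replacement
        return "1a", sorted(cohort_minimums)
    affected = sorted(c for c, m in cohort_minimums.items() if remaining < m)
    if affected:
        # (c) partial exit that drops below some cohort's minimum
        return "1c", affected
    # (b) partial exit with no effect on cohort membership
    return "1b", []

result = classify_unbonding(100, 50, {"A": 40, "B": 60})
```

Only scenarios (a) and (c) would prompt share recovery or refresh, and only for the cohorts listed.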

(2) Inactive node. This could be theoretically detected by:
- (a) a tbtcv2 slashing event (on-chain verifiable)
- (b) absence of some predefined interactive check-in – e.g. developer sets cohort compositional parameter to require a signature from all cohort members at least once every 3 months* (on-chain verifiable)
- (c) challenge protocol – by other members of cohort (off-chain verifiable)

(3) Application-prompted cohort refresh for increased collusion-resistance & redundancy (also to mitigate against the reduced redundancy/diversity of DKG & Decryption cohorts being the same). This can be driven by either:
- (a) A preset schedule – e.g. every month, 20% of the longest-serving nodes are recycled, except for these 'known' 5 nodes.
- (b) Driven by user action and/or business logic – e.g. if NFT sold for greater than 5 ETH, initiate series of DKG ceremonies such that entire cohort has been replaced within 1 month.
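The preset-schedule example in (a) could be expressed as a selection rule like the one below. It assumes the quota is a percentage of the whole cohort, rounded down, and that exempt ("known") nodes are simply skipped; both are interpretations, and the names are hypothetical.

```python
def nodes_to_recycle(joined_at, percent=20, exempt=frozenset()):
    """Pick the longest-serving non-exempt nodes to rotate out this period.

    joined_at: {node_id: join timestamp}; smaller means longer-serving.
    """
    quota = len(joined_at) * percent // 100
    eligible = sorted(
        (node for node in joined_at if node not in exempt),
        key=lambda node: (joined_at[node], node),  # tie-break on id
    )
    return eligible[:quota]

joined_at = {f"n{i}": i for i in range(10)}  # n0 joined first
rotation = nodes_to_recycle(joined_at, percent=20, exempt={"n0"})
```

With ten nodes and `n0` exempt, the two next-oldest nodes are selected.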

(4) Emergency/safety: e.g. if some sub-threshold minority of operators attempt to bribe or coerce the remaining members, the honest threshold could 'vote' to prompt a cohort refresh and remove offending operators from the cohort (needs more thought, particularly with regard to malicious misuse of emergency refresh and the lack of provability of collusion attempt)

Corner cases: (a) Staker bonds some number of T tokens that takes them ABOVE the maximum allowed in a given cohort

*confirmActivity but customizable and potentially incurring a higher fee, useful for cases where downside of non-responsiveness is high

cygnusv commented 1 year ago

Relevant comment in https://github.com/nucypher/ferveo/pull/26#discussion_r1066825928

nucypher / ferveo

Cohort Refresh – research (PSS) & product implications + validation #70