SPEC looks good. I think we can get rid of the SigningManager
trait and put those functions directly on the DKGSigningManager
struct. Here is another way of organizing the code:
```rust
struct OfflineStageManager {
    signers: HashSet<DKGSigners>,
}

impl OfflineStageManager {
    // new, generate_offline_stages, pick_next_completed_offline_stage
}

struct MessagesToBeSigned {
    messages_to_be_signed: VecDeque<UnsignedProposal>,
}

impl MessagesToBeSigned {
    // enqueue_message, dequeue_message
}

struct DKGSigningManager {
    offline_stage_manager: OfflineStageManager,
    messages_to_be_signed: MessagesToBeSigned,
}

impl DKGSigningManager {
    // sign, reset, run
}
```
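For illustration, a minimal sketch of how these pieces might compose in `run` (the exact signatures of `dequeue_message`, `pick_next_completed_offline_stage`, and `sign` are assumptions based on the outline above):

```rust
impl DKGSigningManager {
    /// Hypothetical driver loop: drain queued proposals and sign each one
    /// with a pre-computed offline stage.
    async fn run(&mut self) {
        while let Some(proposal) = self.messages_to_be_signed.dequeue_message() {
            // Take whichever offline stage completed first...
            let offline = self.offline_stage_manager.pick_next_completed_offline_stage().await;
            // ...and use it to run only the fast final stage for this message.
            self.sign(offline, proposal);
        }
    }
}
```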
@shekohex let me know what you think.
Yes, these are good suggestions. I will edit the spec with them. The use of a trait here is just the beginning: we will soon start refactoring the worker into a system of smaller traits that work together, so that we can have different implementations of each part that we can swap in or out to test different strategies.
Linking closed PR for provenance: https://github.com/webb-tools/dkg-substrate/pull/403
```rust
/// Goal: given n peers and a threshold of t, ensure at least t+1 peers are able to completely sign a message.
///
/// Challenge: Peers can be unreliable or even malicious. Networking can be unreliable. Despite these challenges,
/// we want to ensure that the protocol is able to complete in a timely manner.
///
/// Currently, for each "signing round", we generate a list of "signing sets" comprising many permutations of peers
/// drawn from the best authorities. Once one completes, its result gets submitted to the blockchain. So long as one
/// signing set completes, the signing round is considered complete. This is the naive approach, and it suffers from
/// high data volume and redundancy across the network.
///
/// New proposal: For each "signing round", we begin with the first "signing trial". A "signing trial" is defined as
/// an attempt at achieving at least one complete signing set. Each "signing trial" has a value, n_sets, which starts at 2.
/// The value of n_sets determines the number of permutations of peers that will be generated for the signing sets in this
/// signing trial. For example, if n_sets = 2, then we will generate 2 signing sets, each with t+1 peers. If this trial fails,
/// the next trial doubles n_sets to 4, and so on until we either complete a signing set or reach the maximum number of
/// signing set permutations (which is equal to nC(t+1)).
///
/// This approach helps reduce the number of messages sent across the network, and also reduces the degree of redundancy. There
/// is a high chance that the first or second signing trial will succeed, and if it does, we can stop there. If it does not,
/// we continue to double the number of signing sets until we reach the maximum. This maximum number of signing sets is
/// determined by the number of peers in the network.
pub struct SigningManagerV2<P> {
    trial_id: usize,
    n: usize,
    t: usize,
    _pd: PhantomData<P>,
}

impl<P: Get<u32> + Clone> SigningManagerV2<P> {
    pub fn new(n: usize, t: usize) -> Self {
        Self { n, t, trial_id: 0, _pd: Default::default() }
    }
}

pub struct SigningTrial<T> {
    pub sets: Vec<SigningSet>,
    pub trial_id: usize,
    // what's going to be signed
    pub message: T,
}

pub trait TrialBasedSigningManager {
    // The message we are going to sign
    type Message: Clone + Debug + Encode + Decode + Send + Sync + 'static;
    // Generates the next trial
    fn next_trial(&mut self) -> Option<SigningTrial<Self::Message>>;
}

impl<P: Get<u32> + Clone> TrialBasedSigningManager for SigningManagerV2<P> {
    type Message = UnsignedProposal<P>;

    fn next_trial(&mut self) -> Option<SigningTrial<Self::Message>> {
        // n_sets doubles with each trial: 2, 4, 8, ...
        let n_sets = 2usize.pow(self.trial_id as u32 + 1);
        let mut sets = Vec::with_capacity(n_sets);
        // Generate the signing sets
        self.trial_id += 1;
        // ...
    }
}
```
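The elided set generation above could, for instance, enumerate combinations of t+1 authorities in a deterministic order. A minimal sketch, assuming the `itertools` crate and simplifying a signing set to a `Vec` of authority indices:

```rust
use itertools::Itertools;

/// Enumerate up to `n_sets` candidate signing sets of size t+1, in a
/// deterministic order over authority indices 0..n.
fn generate_sets(n: usize, t: usize, n_sets: usize) -> Vec<Vec<usize>> {
    (0..n)
        .combinations(t + 1) // lazily walks all nC(t+1) subsets
        .take(n_sets)        // materialize only the first n_sets of them
        .collect()
}
```

In practice the candidates would be ordered by authority quality rather than raw index, with the trial logic above capping n_sets at nC(t+1).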
(continuing speccing here)
Assumptions
In general, we should make the assumption that there will always be a pool of nodes available that will allow us to completely sign one signing set. This allows us to assume that the TrialBasedSigningManager (TBSM) will always be able to succeed, even if it takes multiple trials.
Ensuring TBSM drives signing stage to success
In order for the TBSM to always succeed, we will need to synchronize the state of the TBSM across the blockchain so that DKG nodes can participate in starting signing rounds in a coordinated fashion. We can add the following definitions and functions (approximately) to the DKG pallet:
```rust
#[derive(Encode, Decode, Clone, Debug)]
pub enum SigningManagerState {
    // only set when we haven't begun signing yet
    NeverStarted,
    // set right before signing is about to begin. New nodes joining will be able to participate
    AboutToBegin { session_id: u64, trial_round: u32 },
    // set while the signing is still in progress. New nodes joining won't be able to participate. Active nodes
    // are expected to retrieve the state after each signing trial, only ending the TBSM if the state is Resolved.
    Active { session_id: u64, trial_round: u32 },
    // set after the signing has finished
    Resolved { session_id: u64, signed_proposal: Proposal },
}

/// The DKG clients send this to the OCSM
#[derive(Encode, Decode, Clone, Debug)]
pub enum DKGSigningUpdate {
    Failed { session_id: u64, trial_round: u32 },
}

/// DKG clients query this to find the state
fn signing_manager_state() -> Option<SigningManagerState>;

/// When a signing set has failed, the DKG clients store the update offchain and have it submitted on-chain
fn contribute_observation(update: DKGSigningUpdate);
```
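For illustration, a client-side sketch of how a DKG node might consume this interface (the `spawn_signing_trial` helper is an assumption, not part of the spec above):

```rust
// Hypothetical client-side reaction to the on-chain state.
fn on_block_finalized() {
    match signing_manager_state() {
        Some(SigningManagerState::AboutToBegin { session_id, trial_round }) => {
            // Safe to join: generate the signing sets and spawn the protocols.
            spawn_signing_trial(session_id, trial_round);
        }
        Some(SigningManagerState::Active { .. }) => {
            // A trial is already running; newly joined nodes must wait, and
            // active nodes keep driving their local TBSM.
        }
        Some(SigningManagerState::Resolved { .. })
        | Some(SigningManagerState::NeverStarted)
        | None => {
            // Nothing to sign right now.
        }
    }
}
```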
The OnChainSigningManager
The chain will have to track the state of the DKG clients. To do this, once the keygen stage is detected as complete by the blockchain, the blockchain triggers an update via the OnChainSigningManager (OCSM), which is really just an abstraction in the DKG pallet. The OCSM's duty is to keep track of the state, allowing DKG clients to asynchronously attempt to participate in signing, regardless of whether they just joined. Yet, the only time DKG clients can truly begin to participate in signing is when the state is `AboutToBegin`. Once clients join, they are expected to participate in the TBSM locally.
Generating the signing sets
We can generate the signing sets in a way similar to how we do currently. With the new method, we will not (directly) need a blockchain to ratify the data, since the permutations will be deterministic and dependent on the set of best authorities. Instead of spawning nC(t+1) permutations of signing sets, we will begin the first signing trial with 2 sets. These 2 sets are considered the most likely to succeed. If k denotes the signing trial index (starting at 0, where we begin with 2 signing sets), then the number of permutations at signing trial k is f(k) = 2^(k+1), as in the worked example below.
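For concreteness, here is that schedule together with the nC(t+1) cap from the doc comment above (this helper is illustrative only):

```rust
/// Number of signing sets attempted in trial k, capped at C(n, t+1).
fn sets_for_trial(k: u32, n: u64, t: u64) -> u64 {
    let max = binomial(n, t + 1); // nC(t+1): total distinct sets of size t+1
    (2u64 << k).min(max)          // f(k) = 2^(k+1), capped at the maximum
}

/// Naive binomial coefficient C(n, r); exact for the small values used here.
fn binomial(n: u64, r: u64) -> u64 {
    (0..r).fold(1, |acc, i| acc * (n - i) / (i + 1))
}

// e.g. n = 10, t = 6: C(10, 7) = 120, so the trials attempt
// 2, 4, 8, 16, 32, 64, and finally 120 sets.
```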
Driving forward the TBSM
After the keygen stage, the pallet will need code to broadcast an update event to the DKG clients such that the DKG clients receive a `SigningManagerState::AboutToBegin` update. Then, the DKG clients will generate the signing sets and spawn the corresponding signing protocols for each. If one of the signing protocols finishes, we will store the signed proposal offchain, where it will later be submitted on-chain (similar to how we handle the public key). Once on-chain, the OCSM will update its state and mark itself as `SigningManagerState::Resolved`. Then, it will emit an event to all DKG clients.
In the case a signing trial fails (meaning all signing sets failed to resolve to completion), the DKG clients will each submit a `DKGSigningUpdate::Failed` to the OCSM. The OCSM will then be responsible for collating the results and determining when to begin the next trial round (once all signing sets have failed). Once the next trial round is needed, the OCSM emits an event to the DKG clients, beginning the next trial round, and the process repeats.
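A rough sketch of the collation logic the OCSM would need (the storage shape and the "enough failure reports" threshold are simplifications of "all signing sets failed"):

```rust
// Hypothetical pallet-side handler for failure reports.
fn on_contribute_observation(
    state: &mut SigningManagerState,
    failure_reports: &mut u32,
    reports_needed: u32,
    update: DKGSigningUpdate,
) {
    let DKGSigningUpdate::Failed { session_id, trial_round } = update;
    if let SigningManagerState::Active { session_id: sid, trial_round: round } = *state {
        // Only count reports that match the trial currently in progress.
        if sid == session_id && round == trial_round {
            *failure_reports += 1;
            if *failure_reports >= reports_needed {
                // The whole trial failed: announce the next trial round.
                *state = SigningManagerState::AboutToBegin { session_id: sid, trial_round: round + 1 };
                *failure_reports = 0;
            }
        }
    }
}
```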
Just some notes about having the state on-chain:
It will be neither easy nor fast to update the state; you will be limited by how fast the blockchain is, which is usually 6s per block plus finality (which could take 12s or more). Why not use gossiping to synchronize the state?
And wouldn't it be possible to prioritize peers in the set that have a direct connection with each other? The Gossip Engine knows which peers we are directly connected to, and whether they are on-chain authorities or not; plus you can map between `KeygenPartyId`, `AuthorityId`, and `PeerId` in either direction. So you could even make use of that to put parties near each other together in the signing set.
If we choose to use the gossiping engine instead for sharing a state, one thing we could do is ensure that each connected peer gets a "greeter" packet that shares the current state of the signing. Also, whenever the gossip engine sends out a packet, we can append to it data that contains information about the state of the signing. The issue with this approach is that, if a malicious client wants to spoof this packet, then it could throw the state off for everyone. This is why using a blockchain could be helpful here. However, if we decide to use the gossiping engine, we could have a weighting algorithm for updates where the best authorities that share their "proposed next signing manager state" are the most influential in setting the signing state for all peers.
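If we went the gossip route, the "greeter" packet might look roughly like this (the message shape and signature scheme are assumptions, not settled design):

```rust
/// Hypothetical gossip payload carrying one peer's view of the signing state.
#[derive(Encode, Decode, Clone, Debug)]
pub struct SigningStateGreeting {
    /// The sender's proposed next signing manager state.
    pub proposed_state: SigningManagerState,
    /// Who claims this view; higher-ranked best authorities would carry
    /// more weight under the weighting algorithm described above.
    pub authority: AuthorityId,
    /// Signature over `proposed_state`, so the packet cannot be spoofed
    /// by a peer that does not control `authority`'s key.
    pub signature: AuthoritySignature,
}
```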
On each finalized block `fN` we do the following:
1. Fetch the DKG public key `pk`
2. Fetch the current `unsignedProposals`
3. For each `unsignedProposal`, do the following:
   1. Compute a seed `s`, where `s` is `keccak256(pk, fN, keccak256(unsignedProposal))`
   2. Seed an RNG with `s` and select a `t+1` signing set from this RNG
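A minimal sketch of that selection, assuming the `sha3`, `rand`, and `rand_chacha` crates (the exact byte encoding fed into the hash is an assumption):

```rust
use rand::seq::SliceRandom;
use rand::SeedableRng;
use rand_chacha::ChaCha20Rng;
use sha3::{Digest, Keccak256};

/// Deterministically pick a t+1 signing set for one unsigned proposal.
/// Every honest node derives the same seed, hence the same set.
fn select_signing_set<A: Clone>(
    pk: &[u8],          // current DKG public key
    block_number: u64,  // the finalized block fN
    proposal: &[u8],    // the encoded unsigned proposal
    authorities: &[A],  // best authorities, in canonical order
    t: usize,
) -> Vec<A> {
    // s = keccak256(pk, fN, keccak256(unsignedProposal))
    let inner = Keccak256::digest(proposal);
    let seed: [u8; 32] = Keccak256::new()
        .chain_update(pk)
        .chain_update(block_number.to_le_bytes())
        .chain_update(&inner)
        .finalize()
        .into();
    let mut rng = ChaCha20Rng::from_seed(seed);
    authorities.choose_multiple(&mut rng, t + 1).cloned().collect()
}
```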
Overview
The current implementation of the signing protocol is not the best we can have: it runs the offline stage and then the voting stage every time we see a new proposal, which is not as fast as we need. Also, this implementation couples the unsigned message to the offline stage, which is not actually necessary at that point; the message is only required in the voting stage, i.e. the last stage of the signing.
Research
A better design here is to have a `SigningManager` that holds a list of pre-computed, ephemeral `CompletedOfflineStage`s, which we will use later on to sign any message (in our case, an `UnsignedProposal`). At the start, the `SigningManager` will be in its initial state; in that state, it will generate different sets of signers, prepare any other state required for creating many offline stages, and return the `SigningManager` instance. Calling `generate_offline_stages(n)` will start many tasks in the background, where each task will start creating `n / k` offline stages, `k` being the number of signing sets we created previously, so that we end up with `n` `CompletedOfflineStage`s. Once we have that list, we can simply use them to sign any message/proposal, which will be much faster.
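For concreteness, a rough sketch of that lifecycle against the interface described above (the constructor arguments and the `next_unsigned_proposal` helper are assumptions based on the names in this issue):

```rust
// Hypothetical end-to-end usage of the proposed SigningManager.
async fn signing_flow(best_authorities: Vec<AuthorityId>, t: usize, n: usize) {
    // Initial state: the different sets of signers are generated here.
    let mut manager = SigningManager::new(best_authorities, t);

    // Spawn k background tasks, each producing n / k CompletedOfflineStages.
    manager.generate_offline_stages(n);

    // Signing a proposal now only needs the fast final stage, consuming
    // one pre-computed offline stage per message.
    while let Some(proposal) = next_unsigned_proposal().await {
        let offline = manager.pick_next_completed_offline_stage().await;
        manager.sign(offline, proposal);
    }
}
```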
Notes
Each new session (TODO: link to the new session spec here), we will reset the `SigningManager` to its initial state and start the process of generating the offline stages again.
Examples
Questions/Issues