mrabino1 opened 4 years ago
As a follow-on: if we only follow a fixed, repeating sequence as outlined above, then over enough time that sequence could be tracked and "broken". So, just as TOTP produces a 6 (or X) digit code from a hash of the current time and a secret, we need to include a constraint on the maximum number of replicas, but otherwise the objective is that the order in which replicas are allowed to talk should be randomized by the TOTP hash, so that the replica sequence is unpredictable. By the time a node has validated, the would-be attacker has no idea which other node in the world will speak next. So even if, over a period of a month (or longer, depending on how many validators a user runs and how many replicas exist), all of the IP addresses of the nodes were determined, the nodes keep quasi-randomly changing who responds, and an attacker won't have the resources to attack all of them all of the time.
following on
TOTP has already been battle-tested. We already have a shared keystore, so we can compute the sha256 of that keystore for uniqueness. That sha256 digest is the TOTP secret.
TOTP generates a 6-digit code that changes every X seconds.
Once we know how many clones / replica validator nodes we will have (and each one has a constant ID), we apply the following to that TOTP code:
round(({TOTP} / 1000000) * {number_of_replicas})
This will uniquely determine which node / clone / replica should talk at any given time.
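As a sketch only (not Prysm code), the path from shared keystore to TOTP secret to "who talks now" could look like this in Go. The keystore path, the use of standard RFC 6238 TOTP with HMAC-SHA1, and the choice of floor instead of round (so the index always stays in range) are assumptions:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"os"
	"time"
)

// totp computes a standard RFC 6238 code (6 digits, HMAC-SHA1) for the given
// secret, time, and period in seconds.
func totp(secret []byte, t time.Time, period uint64) uint32 {
	counter := uint64(t.Unix()) / period
	var msg [8]byte
	binary.BigEndian.PutUint64(msg[:], counter)
	mac := hmac.New(sha1.New, secret)
	mac.Write(msg[:])
	sum := mac.Sum(nil)
	offset := sum[len(sum)-1] & 0x0f
	code := binary.BigEndian.Uint32(sum[offset:offset+4]) & 0x7fffffff
	return code % 1000000
}

// activeReplica maps the 6-digit code onto a replica index in [0, numReplicas).
// Floor is used instead of round so the result can never equal numReplicas.
func activeReplica(code, numReplicas uint32) uint32 {
	return code * numReplicas / 1000000
}

func main() {
	// Hypothetical keystore path; all replicas share identical wallet files.
	keystore, err := os.ReadFile("wallet/keystore.json")
	if err != nil {
		panic(err)
	}
	secret := sha256.Sum256(keystore) // shared TOTP secret: sha256 of the keystore
	code := totp(secret[:], time.Now(), 600)
	fmt.Println("replica that should talk now:", activeReplica(code, 10))
}
```

Because every replica holds the same keystore and the same clock, each one computes the same index independently, with no communication between nodes.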
Prysm proposal
Token hash defense
This is really for advanced node operators that have more than 5 validators running on a node
In this proposal we have four fundamental attributes to add to prysm (sketched as flags below):
- max number of replicas
- replica_id
- time between node rotation
- collision gap between node rotation
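A minimal sketch of how these four attributes might be exposed as configuration; the flag names are hypothetical and are not existing Prysm flags:

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

func main() {
	// Hypothetical flag names for the four proposed attributes.
	maxReplicas := flag.Uint("max-replicas", 1, "total number of replica nodes sharing this wallet")
	replicaID := flag.Uint("replica-id", 1, "this node's ID (1..max-replicas); each ID used exactly once")
	rotation := flag.Duration("rotation-period", 600*time.Second, "time between node rotation")
	gap := flag.Duration("collision-gap", 3*time.Second, "silence at the start and end of each turn")
	flag.Parse()
	fmt.Println(*maxReplicas, *replicaID, *rotation, *gap)
}
```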
Abstract: Given that one attack vector for inhibiting finality is a DDoS that overwhelms the known and announced validators (a node's IP can be determined after a sufficient amount of time), a defense strategy has been developed to rotate validators among different nodes using the same validator keys. An important aspect is that the validator wallet is identical on all nodes, and all nodes and validators are listening while only one talks at a time. So a node might have 30 validators of 32 ETH each, but all of them on one node move at the same time.
The observed DDoS attack vector was able to predict the IP address of the upcoming validator and cause finality breakage because the IP address was predictable. Thus the defense is to have a sufficient distribution of nodes such that a DDoS is multiple factors more expensive and harder to predict.
This "who's turn is it to talk" (and not get slashed requires precise synchronization across nodes that do not have communication between them.
To accomplish this, a modified and inspired version of a token ring technology and TOTP authentication has been developed.
Since each node has the same wallet, they naturally share a common secret (the sha256 digest of the wallet). By definition this secret is automatically shared among those that use the same wallet. Next, each node needs to know how many twins it has out there in the world, along with which replica # it is.
For example, there might be 10 nodes using the exact same keys (which has the wonderful side effect of allowing near-100% redundancy); thus, the laptop is ID #1, the VPS is #2, and so on.
Again, it is important that each replica ID be used by only one node instance.
Another important operational aspect: it is essential to shut down all the validators (in all node locations) before adding new validators or twins, and then spool them back up when ready.
Since node rotation is driven by time, time synchronization is essential. Of course, ETH2 already relies on time sync and roughtime, so this issue is largely mitigated.
Similar to the math of a lottery draw, if there are three nodes with the exact same wallet, the possible orderings are:
1 2 3
1 3 2
2 1 3
2 3 1
3 1 2
3 2 1
Which of the 6 orderings above is used is determined by the sha256 digest of the wallet (the secret); a sketch of this selection follows the statistics below.
From combinatorics (n! orderings of n replicas), we have:
3 replicas - 6 possible sequences
4 replicas - 24 possible sequences
5 replicas - 120 possible sequences
6 replicas - 720 possible sequences
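A minimal sketch of how every node could derive the same ordering from the shared secret without any communication; seeding a deterministic PRNG with the first 8 bytes of the digest is an assumption, not something specified above:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/rand"
)

// sequenceFromSecret returns one ordering of replica IDs 1..n, chosen
// deterministically from the shared secret so every node computes the same result.
func sequenceFromSecret(secret [32]byte, n int) []int {
	seq := make([]int, n)
	for i := range seq {
		seq[i] = i + 1
	}
	// Assumption: seed a deterministic PRNG with the first 8 bytes of the digest.
	seed := int64(binary.BigEndian.Uint64(secret[:8]))
	r := rand.New(rand.NewSource(seed))
	r.Shuffle(n, func(i, j int) { seq[i], seq[j] = seq[j], seq[i] })
	return seq
}

func main() {
	// Placeholder input: in practice this would be the shared wallet keystore bytes.
	secret := sha256.Sum256([]byte("shared wallet keystore"))
	fmt.Println(sequenceFromSecret(secret, 3)) // e.g. [2 1 3]
}
```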
Now comes the third attribute, the time duration between node rotation. Once the validator has adjusted its time with roughtime, we can assume the nodes are all in sync with time (if not, ETH2 wouldn't work anyway). Now we have a defined reference point in time (the Unix epoch, 01 Jan 1970).
The default time between node rotation is 10 minutes (600 seconds). To ensure no propagation collisions, the final attribute, the collision gap between node rotation, is added (with a default of 3 s). It is therefore recommended that the validator drop all validator requests during the first and last X seconds of its "turn to talk". Depending on how long the time between node rotation is, one can predict the loss of node efficiency (a 3 s gap at each end of a 600 s turn costs about 6 s, i.e. roughly 99% efficiency, which should be acceptable given the benefits).
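A minimal sketch of the turn-taking logic under the defaults above (600 s rotation, 3 s collision gap); counting whole rotation periods since the Unix epoch and the function names are assumptions:

```go
package main

import (
	"fmt"
	"time"
)

const (
	rotationPeriod = 600 * time.Second // default time between node rotation
	collisionGap   = 3 * time.Second   // default silence at the start and end of a turn
)

// turnIndex returns which position in the ordering is active, counting whole
// rotation periods since the Unix epoch (the shared reference point).
func turnIndex(now time.Time, numReplicas int) int {
	periods := now.Unix() / int64(rotationPeriod/time.Second)
	return int(periods % int64(numReplicas))
}

// shouldTalk reports whether this replica is the active one and is outside the
// collision gap at the start and end of its turn.
func shouldTalk(now time.Time, ordering []int, replicaID int) bool {
	if ordering[turnIndex(now, len(ordering))] != replicaID {
		return false
	}
	intoTurn := time.Duration(now.Unix()%int64(rotationPeriod/time.Second)) * time.Second
	return intoTurn >= collisionGap && intoTurn <= rotationPeriod-collisionGap
}

func main() {
	ordering := []int{2, 1, 3} // derived from the shared secret
	fmt.Println(shouldTalk(time.Now(), ordering, 1))
}
```

Every replica runs this same check locally; only the one whose ID matches the current slot in the ordering speaks, and it stays silent inside the gap to avoid overlapping with its predecessor or successor.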
The table above for 3 nodes clearly grows much larger (as n!) the more nodes you have. That said, the more nodes, the more robust the redundancy and the harder it is to attack the "right" validator, since predicting the sequence becomes a challenge, especially if the time between node rotation is tight (though you sacrifice node efficiency, which can be "tuned" with the gap timing).
There is virtually no consequence to configuring more node replicas than actually exist; however, the opposite is not true, as it would cause some slashing.
The above is currently a patch on top of the existing architecture and should eventually be integrated deeper into the protocol, not only in the client. Further, the underlying validators (not only the node) could also migrate around independently of the node, but that can be addressed in the future.