Open iurii-ssv opened 3 days ago
Another note: the profitable block QBFT (call it the 2nd QBFT, or QBFT 2) should probably have exactly 1 round (not 2 or more), because otherwise we are not as profitable as we could be.
Alternative solution (pushing the approach from above to its limits)
To take a step back and explain why we do 2 QBFT consensuses: ideally, when proposing Ethereum blocks, we want to have these 2 properties:
1) "proposing operator" rotation (important for preventing any particular operator from conducting funny business, like selling his privilege "to convince validator cluster to propose certain block(s), or kinds of blocks" through a side-channel); note, this is different from QBFT Instance leader rotation that exists to ensure QBFT liveness only, "proposing operator" rotation relies on it kind of deriving this desired property
2) every operator wants to know whether he needs to sign the profitable block or the backup block to successfully finish the duty - this is why we do the 2nd QBFT (on the profitable block in the algo described above), so that upon commit-quorum every operator knows what the other operators are going to do (honest operators, to be precise) and can broadcast the correct block of the 2 for BLS-signing
property 1) is nice to have, but perhaps can be somewhat relaxed if we want to get the most out of MEV; mentioning just for completeness (for further considerations we might have on it), stuff described below doesn't compromise on it
property 2) however is a binary thing (either we have it, or we don't), and not having it means operators will miss proposing an Ethereum block if they couldn't correctly guess & agree on which of backup / profitable to propose; but (again, to get the most out of MEV) we might consider an approach where we do QBFT 1 consensus on the backup block first (like in the approach above), but then we throw away the backup block, take a gamble and propose the profitable block every time without doing full QBFT 2 consensus on it, just hoping that the profitable block will spread to enough operators for them to sign with their validator share - this seems to make sense for 2 reasons:
1) the leader succeeding with QBFT 1 on the backup block means he is likely going to be online and well connected to other peers (and to the beacon node) in the next ~5 seconds, which means he is very likely to succeed with proposing the profitable block; hence we can optimistically/prematurely terminate the 2nd QBFT right after the proposal step (without doing prepare/commit confirmations) and finish much faster as a result (good for MEV); we can't say the same "liveness" heuristic is true for other operators - hence re-using the leader from QBFT 1 is preferable
2) BLS-signing has to happen at Td (or sooner if we got an agreement on the profitable block sooner) either way, and this involves everyone broadcasting their partial signature + receiving everyone else's; this could fail due to p2p networking issues (and nothing else) - which happens to be the only reason for the QBFT 2 proposal phase to fail; hence doing the QBFT 2 proposal phase only + BLS-signing has roughly the same success chance as doing just BLS-signing, so we might as well omit the QBFT 2 prepare/commit phases altogether
we don't really need the backup block at all (hence we throw it away) - what we really need is to "check networking conditions right before we are about to propose the profitable block" (and select the leader who is best suited to do that - the leader proven by QBFT 1 consensus).
This approach relies on a couple of hypotheses that need to be verified against real-world cluster data (preferably production data, which we could gather by implementing a "dry-run" version of this approach on prod cluster(s)). Even though this approach seems riskier (compared to the approach with the backup block there will most likely be more missed blocks), it might yield a higher expected reward for Ethereum validator(s) over time (because of MEV).
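The trade-off in the paragraph above can be written down as a rough expected-value comparison. This is only a sketch: the function names, the probabilities and the reward values are all hypothetical placeholders, and the real numbers would have to come from the production / dry-run data mentioned above.

```python
def expected_reward_with_backup(p_profitable: float, p_backup: float,
                                reward_profitable: float,
                                reward_backup: float) -> float:
    """Two-consensus approach (hypothetical model).

    With probability p_profitable the profitable block is agreed on and signed
    in time; otherwise we fall back to the backup block, which itself gets
    proposed with probability p_backup.
    """
    return (p_profitable * reward_profitable
            + (1.0 - p_profitable) * p_backup * reward_backup)


def expected_reward_gamble(p_signed: float, reward_profitable: float) -> float:
    """Gamble approach (hypothetical model): profitable block or a missed slot."""
    return p_signed * reward_profitable


def break_even_p_signed(p_profitable: float, p_backup: float,
                        reward_profitable: float,
                        reward_backup: float) -> float:
    """Signing success rate the gamble needs in order to match the backup approach."""
    baseline = expected_reward_with_backup(
        p_profitable, p_backup, reward_profitable, reward_backup)
    return baseline / reward_profitable
```

Under this model the gamble only pays off if its measured signing success rate is above the break-even value, which is exactly the kind of number the dry-run would need to produce.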
I'll post the findings from recent Discord discussion(s) here, so we have them documented to revisit later (because it seems important).
The problem
1) It seems relays are blocking any proposals past 4s mark of the current slot. Would it be possible to add a bypass mechanism after 4s, so that a non-MEV block is submitted? (instead of a complete miss)
2) Iurii mentioned our 1 and 2 rounds of proposal duty allocate 4s unevenly / sub-optimally. It might make sense to review this mechanism and timings.
3) another point that's not 100% clear to me is why we are starting proposer duty exactly at the start of targeted slot, I understand the currently written code works this way - but is there some fundamental limitation (perhaps DVT-related) to do it like that (answering myself: probably not) ?
from this article - https://www.blocknative.com/blog/anatomy-of-a-slot - it seems the start of targeted slot is the time where most blocks already get proposed (so they have enough time to spread through Ethereum network)
4s since slot start (call it "soft limit") also seems to be quite late/risky (otherwise the chart below would look different, I think); we might want to limit it to 2.5s instead, for example - and that will render round 2 useless/unnecessary, btw
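To make the "round 2 becomes useless" point concrete, here is a small sketch that splits a deadline into per-round time budgets. The 2s round timeout used in the usage note is an assumed value for illustration only (not taken from the SSV code), and `round_budgets` is a hypothetical helper.

```python
from typing import List


def round_budgets(deadline_s: float, round_timeout_s: float) -> List[float]:
    """Time available to each QBFT round before the deadline.

    Assumes round N+1 only starts once round N has timed out, and that
    every round has the same timeout (a simplification).
    """
    if round_timeout_s <= 0:
        raise ValueError("round_timeout_s must be positive")
    budgets = []
    start = 0.0
    while start < deadline_s:
        # A round gets its full timeout unless the deadline cuts it short.
        budgets.append(min(round_timeout_s, deadline_s - start))
        start += round_timeout_s
    return budgets
```

With an assumed 2s timeout, `round_budgets(4.0, 2.0)` gives two full rounds, while `round_budgets(2.5, 2.0)` leaves round 2 only 0.5s, which is probably too little for it to finish.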
Potential solution(s)
There is a trade-off between picking/broadcasting the proposed block sooner/later:
I'll record another round of thoughts I have here on these 1-3 findings from above, I think we can do something like this:
1) the backup block should be built early enough that we never really have any "delay" issues with it (it doesn't matter how we get it; let's say we expect to get it 4s before target slot start time - call it Tb); we want to achieve SSV QBFT consensus on the backup block as soon as possible, but not sign it (post-consensus phase) just yet, because we don't want to double-sign (and get slashed)
2) the profitable block (it doesn't matter how we get it; let's say we expect to get it 1s after target slot start time - call it Tp); we want to achieve SSV QBFT consensus on the profitable block as soon as possible, but unlike for the backup block there is a high likelihood we won't be able to do it in time (let's call this deadline Td; it could be 2.5s after target slot start time, for example); the profitable block either will or will not get decided upon by QBFT before deadline Td
3) the SSV node either waits until Td passes and enters the post-consensus phase for the backup block (to sign it and broadcast it to Ethereum), or it reaches QBFT consensus on the profitable block before Td and ditches the backup block altogether, moving to the post-consensus phase for the profitable block
4) to configure QBFT well (for the profitable block) we want to gather some production data (perhaps we already have it) on how much time round 1, round 2, ... typically take - and from this data we can estimate the best QBFT configuration (how many rounds we want to do in that short timespan, and what the timeout for each round would be)
Will this work, or am I missing something? If something like this could work, I guess it's not easy to implement straight away, but we can slowly progress towards it. It seems like an important problem to solve. Regarding additional resource (cpu/mem/...) consumption: sure, it will cost some, but I think we won't have to query the Beacon node for 2 blocks at the same time much (if at all), and block proposals seem too rare to matter in that respect anyway?
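The decision rule described in the steps above can be sketched as a small pure function. All names here are hypothetical, times are in seconds relative to target slot start (so the backup block time is negative), and `None` means the corresponding QBFT consensus never finished.

```python
from typing import Optional, Tuple


def block_to_sign(
    backup_decided_at: Optional[float],
    profitable_decided_at: Optional[float],
    td: float,
) -> Optional[Tuple[str, float]]:
    """Return (which block to sign, when post-consensus may start), or None.

    Sketch of the rule above: the profitable block wins only if its
    consensus finished by the deadline Td; otherwise we fall back to the
    backup block, but never start signing it before Td.
    """
    if profitable_decided_at is not None and profitable_decided_at <= td:
        # Decided in time: ditch the backup block and sign right away.
        return ("profitable", profitable_decided_at)
    if backup_decided_at is not None:
        # Deadline passed without a profitable decision: sign the backup block.
        return ("backup", max(td, backup_decided_at))
    # Neither consensus finished: nothing can be signed for this slot.
    return None
```

For example, with Td = 2.5, `block_to_sign(-3.0, 1.8, 2.5)` picks the profitable block at 1.8s, while `block_to_sign(-3.0, None, 2.5)` falls back to the backup block at 2.5s.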
regarding external factors (beacon node, relay):
1) we can set Tb, Tp and Td to fit the most common scenarios (and maybe add some Beacon node request-retries, if not already there)
2) we can tune the Tp and Td values to get the best results
Are we talking about the slashable Ethereum offense known as double signing?
I believe for block proposals (for attestations it's similar but somewhat different) it means Ethereum can punish a validator if he has signed 2 different blocks (headers) for the same slot and both of these blocks were observed by somebody (I think it might be called a Watch-tower, or maybe a fisher), and that somebody created a proof of that and sent it out to Ethereum nodes to verify.
So, if that's how Ethereum slashing for block production works, in the approach I outlined above we actually never sign 2 different blocks; only 1 block will ever be signed. Of course, we'll probably need to add/adjust some logic so that an SSV node only ever signs 1 block in the post-consensus phase, even though it might have 2 blocks at hand after finishing 2 QBFT consensuses prior to that. Thus, the post-consensus quorum needed to reconstruct the validator signature can only be reached for at most 1 of the 2 blocks every SSV node prepared for the target slot.
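The "only ever sign 1 block per slot" logic mentioned above could look roughly like the guard below. This is a minimal in-memory sketch with hypothetical names (`ProposalSignGuard`, `may_sign`); a real node would have to persist this record across restarts, and this is not how the SSV node currently implements it.

```python
from typing import Dict


class ProposalSignGuard:
    """Allows a partial signature on at most one block root per slot."""

    def __init__(self) -> None:
        # slot -> block root we already agreed to sign for that slot
        self._signed_root_by_slot: Dict[int, str] = {}

    def may_sign(self, slot: int, block_root: str) -> bool:
        signed_root = self._signed_root_by_slot.get(slot)
        if signed_root is None:
            # First signing request for this slot: record it and allow.
            self._signed_root_by_slot[slot] = block_root
            return True
        # Re-signing the same block is harmless; a different one is refused.
        return signed_root == block_root
```

If every honest operator applies such a guard, a post-consensus quorum can only form for one of the two blocks prepared for the slot.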
And as for who submits the signed block to the Ethereum network and how, I believe there aren't any issues with broadcasting such a block from multiple different Beacon nodes at the same time (or at different times) - in fact it is probably better if we can do multiple such broadcasts, because then this block will reach all Ethereum nodes sooner (for Ethereum validators to be able to attest to it).