discussion: ways to improve Ethereum block proposer duty flow

I'll post the findings from the recent Discord discussion(s) here so we have them documented and can revisit them later (because this seems important).

The problem

1) It seems relays block any proposal past the 4s mark of the current slot. Would it be possible to add a bypass mechanism after 4s, so that a non-MEV block is submitted (instead of missing the slot completely)?

2) Iurii mentioned that rounds 1 and 2 of our proposal duty split those 4s unevenly / sub-optimally. It might make sense to review this mechanism and its timings.

3) Another point that's not 100% clear to me is why we start the proposer duty exactly at the start of the targeted slot. I understand the currently written code works this way, but is there some fundamental limitation (perhaps DVT-related) that forces it to? (Answering myself: probably not.)

From this article - https://www.blocknative.com/blog/anatomy-of-a-slot - it seems the start of the targeted slot is the time when most blocks already get proposed (so they have enough time to spread through the Ethereum network).

4s after slot start (call it the "soft limit") also seems quite late/risky (otherwise the chart below would look different, I think). We might want to limit it to 2.5s instead, for example, and that would render round 2 useless/unnecessary btw.

Before t = 0 there are safe proposals that ensure propagation happens and so you don't miss out on the slot. There are also a lot of actors that wait until very late, 2.5 seconds in, who are taking their chances with propagation delays. So that's the general distribution of block availability over time relative to that slot boundary. You can call it the t = 0 or t=12.

Potential solution(s)

There is a trade-off between picking/broadcasting the proposed block sooner vs. later:

we do want to propose a block built at the latest possible moment (e.g. one that becomes available 1s after slot start); call it the profitable block
yet we can't risk waiting too long (especially because DVT adds extra time to do its thing, compared to a single-node Ethereum setup), so we want to have a backup block

I'll record another round of thoughts on findings 1-3 from above. I think we can do something like this:

let's forget for a second how hard this is to implement in code and how many additional resources (cpu/memory/bandwidth) it will consume, just to get to the theoretically best solution; then we can improve upon it further, compromising where needed
the backup block should be built early enough that we never really have any "delay" issues with it (it doesn't matter how we get it; let's say we expect to get it 4s before the target slot start time - call it Tb); we want to reach SSV QBFT consensus on the backup block as soon as possible, but not sign it (post-consensus phase) just yet, because we don't want to double-sign (and get slashed)
then comes the time to build the profitable block (it doesn't matter how we get it; let's say we expect to get it 1s after the target slot start time - call it Tp); we want to reach SSV QBFT consensus on the profitable block as soon as possible, but unlike for the backup block there is a high likelihood we won't manage it in time (let's call this deadline Td; it could be 2.5s after the target slot start time, for example); the profitable block either gets decided by QBFT before the deadline Td or it doesn't
so each SSV node either waits until Td passes and enters the post-consensus phase for the backup block (to sign it and broadcast it to Ethereum), or it reaches QBFT consensus on the profitable block before Td, ditches the backup block altogether and moves to the post-consensus phase for the profitable block
additionally, to get the most out of the time the block proposer duty runner has (especially for the profitable block), we want to gather some production data (perhaps we already have it) on how much time round 1, round 2, ... typically take - from this data we can estimate the best QBFT configuration (how many rounds we want to do in that short timespan, and what the timeout for each round should be)

Will this work, or am I missing something? If something like this could work, I guess it's not easy to implement straight away, but we can slowly progress towards it. It seems like an important problem to solve. Regarding additional resource (cpu/mem/...) consumption, sure, it will cost some - but I think we won't have to query the Beacon node for 2 blocks at the same time much (if at all), and block proposals seem too rare to matter in that respect anyway?

regarding external factors (beacon node, relay):

we have no guarantees on how much time we will spend building a block (because we request it from an external source that could delay forever); I guess we just need to pick appropriate values for Tb, Tp and Td to fit the most common scenarios (and maybe add some Beacon node request retries, if we don't have them already)
even if SSV nodes do their job on time, we might get unexpected delay(s) from beacon node(s) when broadcasting the signed block (although it helps that multiple SSV nodes are broadcasting it - 1 or 2 failed beacon nodes might be fine); we can adjust the Tp and Td values to get the best results

@Iurii I think what might be a problem is that an operator doesn't know if others successfully submitted. If they have then it will lead to slashing. And I think there's no reliable way to find this out unless we add another consensus layer on whether they submitted

are we talking about the slashable Ethereum offense known as double signing ?

I believe for block proposals (for attestations it's similar but somewhat different) it means Ethereum can punish a validator if it has signed 2 different blocks (headers) for the same slot and both of these blocks were observed by somebody (I think it might be called a Watch-tower, or fisher maybe), and that somebody created a proof of that and sent it out to Ethereum nodes to verify

so, if that's how Ethereum slashing for block production works - in the approach I outlined above we actually never sign 2 different blocks, only 1 block will ever be signed (ofc we'll probably need to add/adjust some logic so that an SSV node only ever signs 1 block at the post-consensus phase, even though it might have 2 blocks at hand after finishing 2 QBFT consensuses prior to that - thus, the post-consensus quorum needed to reconstruct the validator signature can only be reached for at most 1 of the 2 blocks every SSV node prepared for the target slot)

and as for who submits the signed block to the Ethereum network and how, I believe there aren't any issues with broadcasting such a block from multiple different Beacon nodes at the same time (or at different times) - in fact it is probably better if we can do multiple such broadcasts, because then this block will reach all Ethereum nodes sooner (so Ethereum validators are able to attest to it).

Another note: the profitable block QBFT (call it the 2nd QBFT, or QBFT 2) should probably have exactly 1 round (not 2 or more), because otherwise we are not as profitable as we could be.

Alternative solution (pushing the approach from above to its limits)

To take a step back and explain why we do 2 QBFT consensuses: ideally, when proposing Ethereum blocks we want to have these 2 properties:
1) "proposing operator" rotation (important for preventing any particular operator from conducting funny business, like selling his privilege "to convince the validator cluster to propose certain block(s), or kinds of blocks" through a side-channel); note, this is different from QBFT instance leader rotation, which exists only to ensure QBFT liveness, although "proposing operator" rotation relies on it and more or less derives from it
2) every operator wants to know whether he needs to sign the profitable block or the backup block to successfully finish the duty - this is why we do the 2nd QBFT (on the profitable block, in the algo described above), so that upon commit-quorum every operator knows what the other (honest, to be precise) operators are going to do and can broadcast the correct block of the 2 for BLS-signing

property 1) is nice to have, but perhaps can be somewhat relaxed if we want to get the most out of MEV; mentioning just for completeness (for further considerations we might have on it), stuff described below doesn't compromise on it

property 2) however is a binary thing (either we have it, or we don't) and not having it means operators will miss proposing Ethereum block if they couldn't correctly guess & agree on which of backup / profitable to propose; but (again, to get the most out of MEV) we might consider an approach where we do QBFT 1 consensus on backup block first (like in the approach above), but then we throw away backup block, take a gamble and propose profitable block every time without doing full QBFT 2 consensus on it, but rather just hoping that profitable block will be spread out to enough operators for them to sign with their validator share - this seems to make sense to do for 2 reasons:

if we can arrange it this way, it would be best for the resulting leader of QBFT 1 (which can have several rounds) to also lead round 1 of QBFT 2 (which is its only round) - this is because the same leader getting QBFT consensus on the backup block means he is likely going to be online and well connected to other peers (and to the beacon node) for the ~5 seconds that follow, which means he is very likely to succeed with proposing the profitable block; hence we can optimistically/prematurely terminate the 2nd QBFT right after the proposal step (without doing prepare/commit confirmations) and finish much faster as a result (good for MEV); we can't say the same "liveness" heuristic holds for other operators - hence re-using the leader from QBFT 1 is preferable
we do the final BLS-signing to reconstruct the validator signature at time Td (or sooner if we got an agreement on the profitable block sooner); this involves everyone broadcasting their partial signature and receiving everyone else's; this could fail due to p2p networking issues (and nothing else) - which happens to be the only reason for the QBFT 2 proposal phase to fail; hence doing the QBFT 2 proposal phase only + BLS-signing has a roughly similar chance of success as doing just BLS-signing, so we might as well omit the QBFT 2 prepare/commit phases altogether

we don't really need backup block at all (hence we throw it away) - what we really need is to "check networking conditions right before we are about to propose profitable block" (and select the leader who is best suited to do that - the leader proven by QBFT 1 consensus).
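
The two ingredients of this alternative (re-using the QBFT 1 leader, then relying on partial signatures alone) can be sketched as below. Everything here is an assumption made for illustration: the round-robin leader formula is a placeholder and not necessarily how SSV picks leaders, and the quorum formula assumes a committee of size n = 3f+1.

```go
package main

import "fmt"

// OperatorID identifies an operator in the cluster (hypothetical type).
type OperatorID uint64

// Quorum returns 2f+1 for a committee of size n = 3f+1 (e.g. 3 of 4, 5 of 7).
func Quorum(n int) int {
	f := (n - 1) / 3
	return 2*f + 1
}

// RoundLeader is a placeholder round-robin rule (rounds start at 1).
// The committee must be non-empty.
func RoundLeader(committee []OperatorID, slot, round uint64) OperatorID {
	return committee[(slot+round-1)%uint64(len(committee))]
}

// ProfitableLeader re-uses the leader of the round in which QBFT 1 decided,
// on the theory that this operator was just shown to be online and well connected.
func ProfitableLeader(committee []OperatorID, slot, qbft1DecidedRound uint64) OperatorID {
	return RoundLeader(committee, slot, qbft1DecidedRound)
}

// CanReconstruct reports whether enough partial signatures on the profitable block
// arrived to reconstruct the validator signature.
func CanReconstruct(committeeSize, partialSigs int) bool {
	return partialSigs >= Quorum(committeeSize)
}

func main() {
	committee := []OperatorID{1, 2, 3, 4}

	leader := ProfitableLeader(committee, 100, 2) // QBFT 1 decided in round 2
	fmt.Println("leader for the profitable block:", leader)

	fmt.Println("enough signatures with 2 of 4:", CanReconstruct(len(committee), 2))
	fmt.Println("enough signatures with 3 of 4:", CanReconstruct(len(committee), 3))
}
```

In this flow a failed proposal has no fallback: if fewer than a quorum of operators receive and sign the leader's profitable block by Td, the slot is missed.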

This approach relies on a couple of hypotheses that need to be verified against real-world cluster data (preferably production data, which we can maybe gather by implementing a "dry-run" version of this approach on prod cluster(s)). Even though this approach seems riskier (compared to the approach with a backup block there will most likely be more missed blocks), it might yield a higher expected reward for Ethereum validator(s) over time (because of MEV).

ssvlabs / ssv

discussion: ways to improve Ethereum block proposer duty flow #1829