shutter-network / rolling-shutter

Rolling Shutter is an MEV protection system to be plugged into rollups.
https://twitter.com/project_shutter/
26 stars 7 forks source link

Fix tx pointer age issue #449

Open jannikluhn opened 1 month ago

jannikluhn commented 1 month ago

The tx pointer defines the point in the encrypted transaction queue from which on the next batch of transactions should be executed. Each keyper locally maintains its own value of the tx pointer. It is crucial that the keypers have consensus on the tx pointer as otherwise key generation cannot be successful.

The tx pointer is included in DecryptionKeys messages. Keypers update their local tx pointer value accordingly whenever they see a DecryptionKeys message. The idea is that successfully generating a decryption key requires consensus in the first place, so this value is easy to agree on.

When key generation for a slot fails once, no decryption keys message will be created, the keypers won't update their tx pointer, and as a result just try to do it again next slot. This is good.

If a keyper goes offline for a while, their tx pointer will be outdated. However this will automatically be healed once the next keys are generated, as then the keyper will update their value to the one from the newest keys message.

However, this fails if too many keypers are offline such that key generation has stopped working at all. That means that even if the keypers go online again, they won't have DecryptionKeys messages whose tx pointer they can take on. We need a way to recover from this.

Therefore, what we need is a value that everyone can agree on. The value we take is the current length of the transaction queue in the sequencer. This effectively means that we drop all undecrypted transactions which is not ideal, but at least the system will start working again.

To distinguish between the two cases, we define the tx pointer age as the number of slots since the tx pointer has been last updated. If the age is lower than a certain threshold, the keyper uses its local value, if it is greater, it uses the sequencer queue length.

Finally, the issue: We don't exclude slots in which no key is supposed to be generated (e.g. because their proposer is not registered) from the age. This means that almost all tx pointers will be considered outdated if the number of registered proposers is small, even if everything has been working smoothly. To work around this, we currently reset the age to zero in slots without a registered proposer (see https://github.com/shutter-network/rolling-shutter/blob/gnosis/rolling-shutter/keyperimpl/gnosis/newslot.go#L101-L106). However, this means that the pointer will effectively never be outdated, even if key generation fails.

One way to fix this is to somehow only count slots in which a key should have been generated when calculating the tx pointer age. This is not as easy as it sounds though, because we can't rely on knowing proposers for slots in the past due to consensus layer API restrictions, so calculating the age after a restart properly would be impossible. There might be other solutions.

The fix should not only be in the keyper implementation, but also in the spec.

jannikluhn commented 2 weeks ago

Suggested solution: Store the tx pointer age explicitly in the db. Set it to zero whenever a new key is generated/received. Increment it at every slot for whom a proposer is registered, do not change it at all other slots. Reset it to infinity on restarts (eg by deleting the db row).

Optionally, on restarts, we might want to wait with participating in key generation until we have seen a keys message and therefore updated our pointer, or until a timeout occurs.

konradkonrad commented 2 weeks ago

on restarts, we might want to wait with participating in key generation until we have seen a keys message and therefore updated our pointer, or until a timeout occurs

bootstrapping a new setup sounds like an issue with that approach

fredo commented 2 weeks ago

When we set the tx pointer age to infinity this means the keyper will automatically use the sequencer length until he has been convinced otherwise by successfully created keys, right?

jannikluhn commented 2 weeks ago

bootstrapping a new setup sounds like an issue with that approach

true

When we set the tx pointer age to infinity this means the keyper will automatically use the sequencer length until he has been convinced otherwise by successfully created keys, right?

yes