prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0

Process attestations after processing block #9662

Open potuz opened 3 years ago

potuz commented 3 years ago

🐞 Bug Report

The issue

Prysm does not process early attestations in the current slot, but rather in the next slot.

Description of the problem

Consider the following timeline:

  1. Beacon node receives an attestation A for the current block, 1 second into the slot.
  2. Beacon node does not have the current block, so the attestation is put into a queue.
  3. Beacon node receives the block 3 seconds into the slot and processes it by 3.2 seconds.
  4. Beacon node receives an attestation B for the current block at 4.2 seconds and, since it already has the block, processes it immediately.

Then the late attestation B is processed and gossiped, but the attestation A is not processed until the queue is cleared at the beginning of the next slot.
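
To make the asymmetry concrete, here is a minimal self-contained sketch of the queue-or-process decision that produces this timeline (the types and helper names are hypothetical stand-ins, not Prysm's actual code):

    package main

    import "fmt"

    type attestation struct {
        beaconBlockRoot [32]byte
        slot            uint64
    }

    type node struct {
        knownBlocks map[[32]byte]bool
        pendingAtts []attestation
    }

    // onAttestation mirrors the behavior above: if the attested block is not
    // known yet, the attestation is parked in a queue that is only drained on
    // the next slot tick; otherwise it is processed immediately.
    func (n *node) onAttestation(att attestation) {
        if !n.knownBlocks[att.beaconBlockRoot] {
            n.pendingAtts = append(n.pendingAtts, att)
            return
        }
        fmt.Printf("processed attestation for slot %d\n", att.slot)
    }

    func main() {
        n := &node{knownBlocks: map[[32]byte]bool{}}
        var root [32]byte
        n.onAttestation(attestation{beaconBlockRoot: root, slot: 100}) // A: queued, block unknown
        n.knownBlocks[root] = true                                     // block arrives and is processed
        n.onAttestation(attestation{beaconBlockRoot: root, slot: 100}) // B: processed immediately
        // A is still sitting in n.pendingAtts until the next slot tick.
    }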

Detailed description of the problem:

Blocks in Prysm are handled by onBlock. This function processes the block and the attestations within it, updates the head and the justified and finalized checkpoints, and performs the fork-choice operations related to the block we just synced. However, it does not trigger clearing of the attestation queues for the received slot.

The way attestations are processed is by running a routine that has the following loop in it

    st := slots.NewSlotTicker(s.genesisTime, params.BeaconConfig().SecondsPerSlot)
    for {
        select {
        case <-s.ctx.Done():
            return
        case <-st.C():
            // Continue when there's no fork choice attestation, there's nothing to process and update head.
            // This covers the condition when the node is still initial syncing to the head of the chain.
            if s.cfg.AttPool.ForkchoiceAttestationCount() == 0 {
                continue
            }
            s.processAttestations(s.ctx)
            if err := s.updateHead(s.ctx, s.getJustifiedBalances()); err != nil {
                log.Warnf("Resolving fork due to new attestation: %v", err)
            }
        }
    }

The key points are that 1) this loop runs only once per slot because of the NewSlotTicker, and 2) processAttestations is the call that finally processes the queued attestations, which is where the delay comes from.

Possible solutions

I can see two options here. The optimal solution from the network's perspective, if feasible, would be to call processAttestations right after updating the head; for example, it could be done here.
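
A rough sketch of what this could look like (the hook name and receiver type are hypothetical; only the calls inside are taken from the ticker loop quoted above):

    // Hypothetical hook, called at the end of onBlock after the block has been
    // processed and the head updated. It reuses the same calls the per-slot
    // ticker loop makes, so attestations queued for this block are handled now
    // rather than at the next slot tick.
    func (s *Service) processAttestationsPostBlock(ctx context.Context) {
        if s.cfg.AttPool.ForkchoiceAttestationCount() == 0 {
            return
        }
        s.processAttestations(ctx)
        if err := s.updateHead(ctx, s.getJustifiedBalances()); err != nil {
            log.Warnf("Resolving fork due to new attestation: %v", err)
        }
    }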

Option 2 would be to at least make sure that we process these attestations sometime between 4 and 7.5 seconds into the slot, so that they have at least a minimal chance of being seen by the aggregators of the current slot instead of the next one.

Option 1 needs to be weighed against its performance impact. I favor it because of the following:

Option 2 is simplistic and provides only minimal relief: we can just replace the tick there with params.BeaconConfig().SecondsPerSlot / 2, and we would at least be gossiping 2 seconds before aggregation, which gives those attestations some chance. I tried attesting at that time on Pyrmont and wasn't penalized.
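
Concretely, that change would only touch the ticker construction in the loop quoted above (a sketch; whether NewSlotTicker accepts the halved period unchanged is an assumption):

    // Tick twice per slot instead of once; the body of the loop stays the same.
    st := slots.NewSlotTicker(s.genesisTime, params.BeaconConfig().SecondsPerSlot/2)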

potuz commented 3 years ago

Just adding the info I got from Sproul and Sutton: both Lighthouse and Teku use something like the Option 1 I suggest here.

terencechain commented 3 years ago

Thanks for opening this issue! I agree option 1 is best

potuz commented 3 years ago

I talked to @terencechain and he showed me that we actually already do Option 2, with exactly the same timings (I missed this call); it's here:

    // This defines how often a node cleans up and processes pending attestations in the queue.
    var processPendingAttsPeriod = slots.DivideSlotBy(2 /* twice per slot */)

This also explains why the issue is so subtle to trigger and so hard to catch.
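
For reference, a period like processPendingAttsPeriod typically drives a loop of this shape (a sketch only; the processPendingAtts call and its signature are assumed, not quoted from Prysm):

    // Runs twice per slot: drain attestations that were queued because their
    // block had not been seen yet when they arrived.
    ticker := time.NewTicker(processPendingAttsPeriod)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            s.processPendingAtts(ctx)
        }
    }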