Current behavior
AFAIK it works like this:
BlockApplicator has a data structure to keep a collection of what I will call pending blocks, that is, blocks that have been received from peers but not yet submitted to the chain.
The pending blocks are indexed in a variety of ways (blocksById, blocksByPrevious, blocksByHeight).
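For concreteness, here is a minimal sketch of what such an index structure could look like; the type and field names other than blocksById, blocksByPrevious and blocksByHeight are assumptions for illustration, not the actual implementation's types:

```go
// Illustrative pending-block indexes; not the real types.
type pendingBlock struct {
	id       string // block ID
	previous string // ID of the parent block
	height   uint64
}

type blockApplicator struct {
	libID     string // ID of the last irreversible block (LIB)
	libHeight uint64

	blocksById       map[string]*pendingBlock
	blocksByPrevious map[string][]*pendingBlock // children indexed by parent block ID
	blocksByHeight   map[uint64][]*pendingBlock
}
```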
When LIB advances, BlockApplicator invalidates pending blocks on minority forks that forked behind LIB.
(1) How do we determine which pending blocks are invalidated?
(2) What do we do with those pending blocks?
Current behavior implemented in handleForkHeads() seems to be as follows:
(1) Pending blocks with height less than the LIB height are invalidated, using an inefficient loop that iterates over every pending block's height.
(2) The invalidated pending blocks are rejected with p2perrors.ErrUnknownPreviousBlock, and naughty points are assigned.
Expected behavior
I would expect the following behavior:
(1a) Pending blocks with height less than the LIB height are invalidated, with a more efficient loop.
(1b) Pending blocks with height greater than or equal to the LIB height are invalidated if they no longer connect to LIB.
(2) Pending blocks are rejected in this way as part of normal operation of correctly behaving peers. They may be logged, but no naughty points should be assigned.
Steps to reproduce
During normal sync, spam of p2perrors.ErrUnknownPreviousBlock ("previous block does not exist") errors occurs.
Environment
- OS: Ubuntu 22.04
Anything else?
Here's a more detailed analysis of this problem:
(a) It should be specified in a comment in the code that the pending block cache of BlockApplicator maintains an invariant that it is always a single connected tree rooted at LIB.
(b) If a pending block is submitted to BlockApplicator, it is immediately rejected if it does not connect to LIB or an existing pending block.
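As a rough sketch (reusing the illustrative types above; the method name is hypothetical), the check in (b) could be as simple as:

```go
// connectsToTree reports whether a newly submitted block attaches to the
// pending-block tree: either directly to LIB or to an already-pending block.
// A block failing this check would be rejected immediately on submission.
func (a *blockApplicator) connectsToTree(b *pendingBlock) bool {
	if b.previous == a.libID {
		return true
	}
	_, ok := a.blocksById[b.previous]
	return ok
}
```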
The invariant means we know the pending blocks' starting height (it's the LIB root) and that there are no gaps in height, so the general height-handling logic currently implemented is unnecessary and inefficient; we can just use a for loop:
(c) Delete all blocks with height >= old LIB height and < new LIB height, using a for loop of the form for h := oldLibHeight; h < newLibHeight; h++ { /* Delete all blocks in blocksByHeight[h] */ }
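A sketch of (c), continuing the illustrative types above; removeBlock is a helper invented here to drop a block from all three indexes:

```go
// pruneBehindLib implements (c): because the cache is a single connected tree
// rooted at LIB, pending heights are contiguous from the old LIB, so a plain
// loop over [oldLibHeight, newLibHeight) visits every block behind the new LIB.
func (a *blockApplicator) pruneBehindLib(oldLibHeight, newLibHeight uint64) {
	for h := oldLibHeight; h < newLibHeight; h++ {
		// Copy the bucket first, because removeBlock mutates blocksByHeight[h].
		doomed := append([]*pendingBlock(nil), a.blocksByHeight[h]...)
		for _, pb := range doomed {
			a.removeBlock(pb)
		}
	}
}

// removeBlock drops one pending block from all three indexes.
func (a *blockApplicator) removeBlock(pb *pendingBlock) {
	delete(a.blocksById, pb.id)
	a.blocksByPrevious[pb.previous] = withoutBlock(a.blocksByPrevious[pb.previous], pb.id)
	if len(a.blocksByPrevious[pb.previous]) == 0 {
		delete(a.blocksByPrevious, pb.previous)
	}
	a.blocksByHeight[pb.height] = withoutBlock(a.blocksByHeight[pb.height], pb.id)
	if len(a.blocksByHeight[pb.height]) == 0 {
		delete(a.blocksByHeight, pb.height)
	}
}

// withoutBlock returns blocks with the entry matching id filtered out.
func withoutBlock(blocks []*pendingBlock, id string) []*pendingBlock {
	out := blocks[:0]
	for _, b := range blocks {
		if b.id != id {
			out = append(out, b)
		}
	}
	return out
}
```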
The current code doesn't consider that some blocks ahead of LIB (in terms of height) are no longer reachable after an LIB update. We can find all such blocks by querying for blocks with the same height as LIB, then following them forward in the graph:
(d) Add all non-LIB blocks with height equal to the LIB height to a queue.
(e) Consume elements from the queue until it is empty. When a pending block pb is consumed, delete it and add blocksByPrevious[pb.id] (its children) to the queue.
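And a sketch of (d)-(e), continuing the same illustrative types and the removeBlock helper from the previous sketch; newLibID and newLibHeight identify the block that just became LIB:

```go
// pruneUnreachable implements (d)-(e): after LIB advances, every pending block
// at the new LIB height other than the new LIB itself sits on a dead fork, and
// so does everything reachable forward from it. A simple queue walk removes
// them all.
func (a *blockApplicator) pruneUnreachable(newLibID string, newLibHeight uint64) {
	// (d) Seed the queue with every non-LIB block at the new LIB height.
	var queue []*pendingBlock
	for _, pb := range a.blocksByHeight[newLibHeight] {
		if pb.id != newLibID {
			queue = append(queue, pb)
		}
	}

	// (e) Consume the queue: enqueue each block's children, then delete it.
	for len(queue) > 0 {
		pb := queue[0]
		queue = queue[1:]
		queue = append(queue, a.blocksByPrevious[pb.id]...)
		a.removeBlock(pb)
	}
}
```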
With respect to naughty points:
(f) Naughty points must not be assigned when BlockApplicator deletes a block due to (a)-(e).
(g) BlockApplicator itself generally should not raise errors that assign naughty points. It can, however, pass through an error that occurred when a block was submitted to chain, and we can also assign naughty points for blocks that take too long for chain to process.
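To make (g) concrete, here is a hedged sketch of that error policy, with submitToChain standing in for whatever call actually hands the block to chain (an assumption, not the real API):

```go
import (
	"context"
	"errors"
	"time"
)

// applyWithPolicy sketches (g): the applicator forwards chain errors unchanged,
// and the only situation it flags on its own is a block that exceeds the
// processing deadline; pruning due to (a)-(e) produces no error at all.
func applyWithPolicy(ctx context.Context, submitToChain func(context.Context) error, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	err := submitToChain(ctx)
	if errors.Is(err, context.DeadlineExceeded) {
		// Chain took too long to process the block; the peer handler may
		// choose to assign naughty points for this case.
		return err
	}
	// Any other error (or nil) is simply passed through from chain.
	return err
}
```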
There may be some misbehavior cases where currently BlockApplicator errors result in correct assignment of naughty points to peers. For these cases, I suggest moving the detection logic to the peer handler.
(As a matter of project management, I suggest accepting a PR for this issue even if it disables / deletes correct detection of some genuine misbehaviors. Then have a separate, separately prioritized issue / PR to re-implement detection of those misbehaviors in the peer handler.)
I'm fairly sure the above-mentioned tree trimming logic is far from original, and it already exists (with unit tests!) somewhere in chain or statedb. The most productive way to handle this issue may be to simply port that existing implementation to Go.