smol-dot / smoldot

Lightweight client for Substrate-based chains, such as Polkadot and Kusama.
GNU General Public License v3.0

Excessive `operationInaccessible` events #1804

Closed: josepot closed this 4 months ago

josepot commented 4 months ago

It happens fairly often that one (or some) of the initial storage operations gets stuck because of an `operationInaccessible` event. Polkadot-API does its best to retry later, and it always ends up working eventually, but sometimes it takes several attempts. This means it can take ~15 seconds (~20 attempts, one every ~750ms) for the operation to actually resolve.
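A minimal sketch of the fixed-interval retry described above, in TypeScript. The `Attempt` type, `retryUntilAccessible`, and its defaults (750 ms delay, 20 attempts) are hypothetical and only mirror the numbers quoted in the report; this is not polkadot-api's actual code. With those defaults the worst case spends 19 × 750 ms ≈ 14.25 s sleeping between attempts, which is in line with the ~15 s observed.

```typescript
// Hypothetical result of one storage query: either it resolved, or the
// server reported `operationInaccessible` and the caller should retry.
type Attempt<T> = { kind: "done"; value: T } | { kind: "inaccessible" };

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Retries `attempt` at a fixed interval until it resolves or the budget
// is exhausted. Defaults mirror the numbers in the report (assumption).
async function retryUntilAccessible<T>(
  attempt: () => Promise<Attempt<T>>,
  delayMs = 750,
  maxAttempts = 20,
): Promise<{ value: T; attempts: number }> {
  for (let i = 1; i <= maxAttempts; i++) {
    const result = await attempt();
    if (result.kind === "done") {
      return { value: result.value, attempts: i };
    }
    // Do not sleep after the last failed attempt.
    if (i < maxAttempts) {
      await sleep(delayMs);
    }
  }
  throw new Error(`still inaccessible after ${maxAttempts} attempts`);
}

// Example: a query that is inaccessible twice, then resolves.
let calls = 0;
retryUntilAccessible<number>(async (): Promise<Attempt<number>> =>
  ++calls < 3 ? { kind: "inaccessible" } : { kind: "done", value: 42 },
).then(({ value, attempts }) => console.log(value, attempts)); // 42 3
```

A fixed interval keeps the sketch simple; it also shows why the total latency grows linearly with the number of `operationInaccessible` responses.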

Concretely, what polkadot-api does internally is retry the same storage query against the next finalized/best block (unless the consumer has explicitly requested the storage entry for a particular block). However, sometimes the next finalized/best block also takes a while to arrive, which translates into a badly degraded UX due to very long loading times.
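The decision made on each `operationInaccessible` can be sketched as a small pure function. `Target`, `Decision`, and `onInaccessible` are hypothetical names for illustration, not polkadot-api APIs; the sketch only encodes the rule stated above: a pinned block is retried as-is, while an unpinned query moves to the newest best/finalized block, and has to wait when no newer block has arrived yet, which is the slow path the report describes.

```typescript
// What the consumer asked for (hypothetical shape): a specific block,
// or "whatever the current best/finalized block is".
type Target =
  | { kind: "pinned"; hash: string }
  | { kind: "follow"; which: "best" | "finalized" };

type Decision =
  | { action: "retry"; hash: string } // query again against `hash`
  | { action: "waitForNewBlock" }; // nothing newer yet: must wait

function onInaccessible(
  target: Target,
  failedHash: string,
  latest: { best: string; finalized: string },
): Decision {
  // The consumer chose this block explicitly, so never move away from it.
  if (target.kind === "pinned") {
    return { action: "retry", hash: target.hash };
  }
  const candidate = target.which === "best" ? latest.best : latest.finalized;
  // A newer block is available: retry against it.
  if (candidate !== failedHash) {
    return { action: "retry", hash: candidate };
  }
  // Still on the block that just failed: wait for the next one to arrive.
  return { action: "waitForNewBlock" };
}
```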

It's worth pointing out that this only seems to happen right after "initializing" the chain.

The smoldot logs: sm-logs.txt

The logs of the messages sent over JSON-RPC (I noticed that they are trimmed in the smoldot logs, so just in case): wire-logs.txt

tomaka commented 4 months ago

In the logs there are the following Polkadot blocks, each a child of the next:

On the parachain peer-to-peer network, we see:

The inaccessible block is 0x70a9...


So what I think happens is that, on the parachain peer-to-peer network, we see blocks that are ahead of what the relay chain has marked as best. My guess is that I never noticed this before because we now have asynchronous backing, although I don't fully understand the consequences of asynchronous backing off the top of my head.

Unfortunately, smoldot doesn't parse parachain blocks, because in principle they don't have to be valid headers. Consequently, smoldot doesn't understand that 0x70a9... is a parent of 0x208b.... In theory, any peer that knows 0x208b... also knows 0x70a9..., but smoldot doesn't understand that and thinks that none of the peers knows 0x70a9....
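The ancestry argument can be illustrated with a toy model. `peerKnowsBlock` and the `parentOf` map are hypothetical, not smoldot internals: if the parent of each block were known, a peer announcing `0x208b...` could be inferred to also have `0x70a9...` by walking up the chain; when blocks are treated as opaque (an empty `parentOf`), only an exact match counts, so the request looks unanswerable. The abbreviated hashes are used purely as opaque labels.

```typescript
// Returns true if a peer whose best block is `peerBest` must also know
// `wanted`, using only the parent links we were able to decode.
function peerKnowsBlock(
  peerBest: string,
  wanted: string,
  parentOf: ReadonlyMap<string, string>,
): boolean {
  const seen = new Set<string>(); // guards against malformed, cyclic input
  let current: string | undefined = peerBest;
  while (current !== undefined && !seen.has(current)) {
    if (current === wanted) return true;
    seen.add(current);
    current = parentOf.get(current);
  }
  return false;
}

// Headers not decoded: no parent links, so the descendant does not help.
console.log(peerKnowsBlock("0x208b...", "0x70a9...", new Map())); // false
// Parent link known: the peer announcing the child also has the parent.
console.log(
  peerKnowsBlock("0x208b...", "0x70a9...", new Map([["0x208b...", "0x70a9..."]])),
); // true
```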

At some point the relay chain will mark 0x208b... as best block and things work again.

tomaka commented 4 months ago

I think that, given the way parachains work, a simple and correct fix is to assume by default that all parachain nodes know the block that the relay chain has marked as best. If that turns out not to be the case for a given node, we can ban it.
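A minimal sketch of the proposed default, under assumptions: peers are identified by strings, `announced` records which blocks each peer has announced, and banning is just adding to a set. `peersToAsk` and `onRequestFailed` are hypothetical names; this illustrates the rule, not smoldot's actual implementation.

```typescript
// Picks which parachain peers to ask for `block`. Besides peers that
// announced the block, every non-banned peer is assumed by default to
// know the block the relay chain currently marks as best.
function peersToAsk(
  block: string,
  relayChainBest: string,
  peers: readonly string[],
  announced: ReadonlyMap<string, ReadonlySet<string>>,
  banned: ReadonlySet<string>,
): string[] {
  return peers.filter(
    (peer) =>
      !banned.has(peer) &&
      (block === relayChainBest || announced.get(peer)?.has(block) === true),
  );
}

// If a peer cannot serve the relay-chain best block, the default
// assumption was wrong for that peer, so ban it.
function onRequestFailed(
  peer: string,
  block: string,
  relayChainBest: string,
  banned: Set<string>,
): void {
  if (block === relayChainBest) {
    banned.add(peer);
  }
}
```

The assumption is optimistic: it avoids needing to parse parachain headers at all, and the cost of being wrong is one failed request per misbehaving peer before it is excluded.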