paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.com/
1.88k stars 689 forks source link

JSON-RPC: `chainHead_v1_follow` Subscription Sends `newBlock` events with unannounced `parentBlockHash` #5761

Closed josepot closed 1 month ago

josepot commented 1 month ago

The chainHead_v1_follow subscription is not compliant with the specification because it sometimes sends newBlock events where the parentBlockHash has not been previously announced. According to the specification:

parentBlockHash is guaranteed to be equal either to the current finalized block hash, or to a block reported in an earlier newBlock event.

Issue Details

While testing against the endpoint wss://sys.dotters.network/paseo, we observed that some blocks are missing from the newBlock events sequence. Specifically:

  1. Block #3012893 with hash 0x2b8cbe6887a3e2869d462920d784b604a4683751aea1f2e10f02f97c9333abae is announced.
  2. The next announced block is #3012896 with hash 0x0e95ce16ed5cbf2008bf00c5e1a65736a8cd00c6dcf08da17126d216f79605d9.

The two missing blocks are:

As a result, the parentBlockHash of block #3012896 refers to an unannounced block, which violates the specification's guarantee.

This issue causes the PAPI client to crash.

Logs

I've attached the logs capturing this behavior: pre.log

Steps to Reproduce

  1. Install Bun:

    curl -fsSL https://bun.sh/install | bash
  2. Set up the project:

    mkdir paseo-test
    cd paseo-test
    bun init -y
    bun install polkadot-api
    bun papi add -w wss://sys.dotters.network/paseo pas
  3. Create a file named paseo.ts with the following content:

    import { openSync, writeSync } from "node:fs";
    import { createClient } from "polkadot-api";
    import { getWsProvider } from "polkadot-api/ws-provider/web";
    import { pas } from "@polkadot-api/descriptors";
    import { withLogsRecorder } from "polkadot-api/logs-provider";
    import { fixUnorderedBlocks, parsed } from "polkadot-api/polkadot-sdk-compat";
    
    const pre = openSync("pre-enhancer.log", "w");
    const post = openSync("post-enhancer.log", "w");
    const rpcEnhancer = parsed(fixUnorderedBlocks);
    
    const client = createClient(
     withLogsRecorder(
       (log) => {
         writeSync(post, `${log}\n`);
       },
       rpcEnhancer(
         withLogsRecorder((log) => {
           writeSync(pre, `${log}\n`);
         }, getWsProvider("wss://sys.dotters.network/paseo")),
       ),
     ),
    );
    
    const api = client.getTypedApi(pas);
    
    api.query.System.Account.watchValue(
     "15roJ4ZrgrZam5BQWJgiGHpgp7ShFQBRNLq6qUfiNqXDZjMK",
    ).subscribe(
     (x) => {
       console.log("Account change", x.data);
     },
     console.error,
    );
    
    client.finalizedBlock$.subscribe(console.log, console.error);
  4. Run the script:

    bun run paseo.ts

Impact

This issue disrupts the integrity of the block sequence received via the subscription, making it challenging to process data accurately. It hinders our ability to maintain a consistent view of the blockchain state, as we rely on the guarantee provided by the specification regarding parentBlockHash.

Expected Behavior

Each newBlock event's parentBlockHash should either be:

Actual Behavior

The newBlock event for block #3012896 references a parentBlockHash that was neither finalized nor previously announced via a newBlock event.

Request

Please investigate why the chainHead_v1_follow subscription is emitting newBlock events with parentBlockHash values that have not been previously announced.

cc: @jsdw @lexnv

josepot commented 1 month ago

I was able to reproduce the same issue against wss://westend-rpc.polkadot.io. The logs: pre-enhancer.log

josepot commented 1 month ago

I recently reviewed new logs that provide additional insight into this issue. Specifically, in the logs, when block 0xb3ff4aa8b402e64414611090c6903f0a1859519121f1d512a0590a91568494ac is announced, it references a parentBlockHash (0x552f18a597f1cd37a5fd6c102259714f17786deeb6f672784c23ff73ea775a0d) that had not yet been announced. Later, a fork of 2 blocks that reference 0xb3ff4aa8b402e64414611090c6903f0a1859519121f1d512a0590a91568494ac are announced, but the missing block remains unreported.

It seems that at this point, the server detects that its state is corrupted and triggers a stop event.

This is not an isolated case; I observed a similar pattern yesterday. After the server sent blocks with missing parentBlockHash values, it followed up with a stop event.

One notable point is that after the stop event, the client immediately re-subscribes, and in the next initialized event, the final block in the list of finalizedBlockHashes is the block that was missing from the previous subscription.

EDIT: in those logs you will see other instances of the "stop" event. However, please know that the instances that have an extra property like "eventType": "internal" are events generated by the library due to an abrupt disconnection from the WebSocket connection. We have a middleware that handles these kinds of situations so that the client can keep on working (and it does).

lexnv commented 1 month ago

Thanks @josepot for the detailed report here! And apologies again for the inconvenience 🙏

Looking briefly at the code, the importing of NewBlocks only checks at the moment if the NewBlock hash was pinned previously. However, it does not check whether the notification.parent_hash is known.

https://github.com/paritytech/polkadot-sdk/blob/4ee1d7f10515c03c5a0ffd5b86aebd8bb6890aec/substrate/client/rpc-spec-v2/src/chain_head/chain_head_follow.rs#L412-L414

In those cases, we'd need to extend the state corruption detection to take this into account:

B0 -> B1 -> B2
      B1 -> (unknown number of blocks ..) -> B8

In this example, block B8 is reported as NewBlock with parent B7. The issue is that [B2..B7] are not announced yet.

Polkadot-Forum commented 1 month ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/polkadot-api-updates-thread/7685/15