smol-dot / smoldot

Lightweight client for Substrate-based chains, such as Polkadot and Kusama.
GNU General Public License v3.0

Unable to run certain runtime-apis on the Kilt parachain #1914

Open josepot opened 1 month ago

josepot commented 1 month ago

We (the Polkadot-API team) have been advocating for teams within the ecosystem to avoid creating their own RPC endpoints and instead utilize custom runtime APIs. However, the Kilt parachain is experiencing issues with some of their runtime APIs when using Smoldot. Specifically, Smoldot consistently triggers an operationInaccessible event for some runtime APIs.

One possible explanation is that the Kilt parachain node was built using an older version of the Polkadot-SDK (version 1.0.0). They are currently in the process of upgrading their nodes to version 1.7.0. We are hopeful that this upgrade will resolve the issue.

Despite this, we decided to report the bug in case the problem is not related to the version of their node.

Using polkadot-api in a Bun/Node.js project, the following code:

import { kilt } from "@polkadot-api/descriptors";
import { Binary, createClient } from "polkadot-api";
import { getSmProvider } from "polkadot-api/sm-provider";
import { chainSpec } from "polkadot-api/chains/polkadot";
import { start } from "polkadot-api/smoldot";
import kiltChainSpec from "./spiritnet";
import { withLogsRecorder } from "polkadot-api/logs-provider";
import { appendFileSync } from "fs";
import { getTickDate } from "./tick-date";

const appendSmlog = (level: number, target: string, message: string) => {
  appendFileSync(
    "./smoldot-logs.txt",
    `${getTickDate()} (${level})${target}\n${message}\n\n`,
  );
};
const smoldot = start({ maxLogLevel: 8, logCallback: appendSmlog });

const client = createClient(
  withLogsRecorder(
    (x) => {
      appendFileSync("wire.txt", x + "\n");
    },
    getSmProvider(
      smoldot.addChain({ chainSpec }).then((relayChain) =>
        smoldot.addChain({
          chainSpec: kiltChainSpec,
          potentialRelayChains: [relayChain],
        }),
      ),
    ),
  ),
);
const kiltApi = client.getTypedApi(kilt);

console.log("waiting for the first finalized block");
await client.getFinalizedBlock();

console.log("done waiting, starting runtime call...");
const result = await kiltApi.apis.Did.query_by_web3_name(
  Binary.fromText("ingo"),
);

console.log(result);

results in the kiltApi.apis.Did.query_by_web3_name request never resolving.
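To make a hang like this visible in a reproduction script instead of waiting indefinitely, the call can be raced against a timer. This is a minimal sketch: `withTimeout` is a hypothetical helper, not part of polkadot-api or smoldot, and the timeout values are arbitrary.

```typescript
// Hypothetical helper (not part of polkadot-api): rejects if `promise`
// has not settled within `ms` milliseconds.
function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  label: string,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not settle within ${ms} ms`)),
      ms,
    );
  });
  // Whichever promise settles first wins; the timer is cleared either way
  // so that it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Demo with a promise that never settles, standing in for the hanging call.
const neverSettles = new Promise<string>(() => {});
withTimeout(neverSettles, 50, "demo call").catch((e: Error) =>
  console.log(e.message),
);
```

In the script above, the runtime call could be wrapped as `withTimeout(kiltApi.apis.Did.query_by_web3_name(Binary.fromText("ingo")), 60_000, "Did.query_by_web3_name")`. This only reports the hang; it does not cancel the underlying operation.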

These are the logs of what's happening over the wire: wire.txt

These are the logs from smoldot: smoldot-logs.txt

The chainspec, in case it's useful: spiritnet.json

cc @rflechtner

tomaka commented 1 month ago

The only thing that the logs show is that the Substrate node rejects connections from smoldot. To figure out why, we would need the Substrate sub-libp2p logs while smoldot is connecting.

rflechtner commented 1 month ago

Are you sure that's the issue? I see call proof requests for Did_query_by_web3_name going out and succeeding, but the response seems to be rejected by smoldot, leading to a MissingProofEntry error and the peer in question being banned. This is propagated to RPC clients as an operationInaccessible event. E.g.:

2024-07-19T15:09:12.738Z (5)runtime-kilt
foreground-runtime-call-request-start; block_hash=0x0ab3…f112, function_name=Did_query_by_web3_name, parameters_vectored=0x1069…676f, call_proof_target=12D3KooWDJzJ7TRNKvE2DWXMSSsoKR5TgxsnNy3W1eCBPveX6g9i

2024-07-19T15:09:12.738Z (4)network
call-proof-request-started; chain=kilt, target=12D3KooWDJzJ7TRNKvE2DWXMSSsoKR5TgxsnNy3W1eCBPveX6g9i, block_hash=0x0ab3…f112, function=Did_query_by_web3_name

[...]

2024-07-19T15:09:12.851Z (4)network
call-proof-request-success; chain=kilt, target=12D3KooWDJzJ7TRNKvE2DWXMSSsoKR5TgxsnNy3W1eCBPveX6g9i, total_size=9.0 kiB

2024-07-19T15:09:12.851Z (4)runtime-kilt
foreground-runtime-call-progress-invalid-call-proof; block_hash=0x0ab3…f112, function_name=Did_query_by_web3_name, parameters_vectored=0x1069…676f, remaining_attempts=0, error=MissingProofEntry, virtual_machine_call_duration=560µs, proof_access_duration=38µs

2024-07-19T15:09:12.851Z (4)json-rpc-kilt
json-rpc-response-yielded; response={"jsonrpc":"2.0","method":"chainHead_v1_followEvent","params":{"subscription":"FUpadyDMcqpcBW9N3VezAebbLi67D7MbmHyHDyGHWZhW","result":{"event":"operationInaccessible","operationId":"ArK1WsA98kGpmVnPKW3HHKLPk9BfniKn8EbJ51xwSJfv"}}}
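For anyone scanning the wire traffic for this event, the notification above can be picked out programmatically. The following is a rough sketch that relies only on the message shape visible in that log line; `parseFollowEvent` and `FollowEvent` are hypothetical names, not part of polkadot-api or smoldot.

```typescript
// Hypothetical shape, modelled on the notification in the log excerpt.
interface FollowEvent {
  subscription: string;
  event: string;
  operationId?: string;
}

// Returns the event carried by a chainHead_v1_followEvent notification,
// or null if `raw` is not such a notification.
function parseFollowEvent(raw: string): FollowEvent | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof msg !== "object" || msg === null) return null;
  const { method, params } = msg as { method?: unknown; params?: unknown };
  if (method !== "chainHead_v1_followEvent") return null;
  if (typeof params !== "object" || params === null) return null;
  const { subscription, result } = params as {
    subscription?: unknown;
    result?: unknown;
  };
  if (typeof subscription !== "string") return null;
  if (typeof result !== "object" || result === null) return null;
  const { event, operationId } = result as {
    event?: unknown;
    operationId?: unknown;
  };
  if (typeof event !== "string") return null;
  return {
    subscription,
    event,
    operationId: typeof operationId === "string" ? operationId : undefined,
  };
}

// The response taken from the log line above.
const sampleResponse =
  '{"jsonrpc":"2.0","method":"chainHead_v1_followEvent","params":{"subscription":"FUpadyDMcqpcBW9N3VezAebbLi67D7MbmHyHDyGHWZhW","result":{"event":"operationInaccessible","operationId":"ArK1WsA98kGpmVnPKW3HHKLPk9BfniKn8EbJ51xwSJfv"}}}';

const parsedEvent = parseFollowEvent(sampleResponse);
if (parsedEvent?.event === "operationInaccessible") {
  console.log(`operation ${parsedEvent.operationId} was inaccessible`);
}
```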

The Kilt runtime and node are still on a fairly old v1 of the polkadot-sdk (~1.1, I think), and we hope these issues will disappear with the next upgrade. Without knowing what the issue is, though, this is hard to predict.

tomaka commented 1 month ago

Yeah, it's not clear what's happening, given that the logs are very dense. I didn't notice the call proof failing to be decoded on the smoldot side, but towards the bottom of the logs the Substrate nodes do force-reset the connections for smoldot. I don't know if both issues have the same underlying cause.
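Since the smoldot logs are dense, a small filter may help isolate entries such as the invalid call proof. The sketch below assumes each log message has the shape `event-name; key=value, key=value` seen in the excerpts in this thread, and that values never contain the sequence `, `. Both are assumptions drawn from those excerpts, not a documented format, and `parseSmoldotLogMessage` and `SmoldotLogEntry` are hypothetical names.

```typescript
// Hypothetical representation of one smoldot log message.
interface SmoldotLogEntry {
  event: string;
  fields: Record<string, string>;
}

// Splits "event-name; k1=v1, k2=v2" into an event name and a field map.
// Assumes values never contain ", " (true for the excerpts in this thread).
function parseSmoldotLogMessage(message: string): SmoldotLogEntry {
  const sep = message.indexOf("; ");
  if (sep === -1) return { event: message.trim(), fields: {} };
  const event = message.slice(0, sep).trim();
  const fields: Record<string, string> = {};
  for (const pair of message.slice(sep + 2).split(", ")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // skip fragments without a key
    fields[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
  }
  return { event, fields };
}

// The invalid-call-proof message quoted earlier in this thread.
const sampleLogMessage =
  "foreground-runtime-call-progress-invalid-call-proof; block_hash=0x0ab3…f112, function_name=Did_query_by_web3_name, parameters_vectored=0x1069…676f, remaining_attempts=0, error=MissingProofEntry, virtual_machine_call_duration=560µs, proof_access_duration=38µs";

const sampleEntry = parseSmoldotLogMessage(sampleLogMessage);
if (sampleEntry.event.endsWith("invalid-call-proof")) {
  console.log(sampleEntry.fields.function_name, sampleEntry.fields.error);
}
```

Passing each `message` received by the `logCallback` through such a parser would allow filtering on `event` or on `fields.error` before writing to disk.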