paritytech / smoldot

Alternative client for Substrate-based chains.
GNU General Public License v3.0

Add unstable JSON-RPC function that exports recent networking events #2245

Open tomaka opened 2 years ago

tomaka commented 2 years ago

We have a big problem in substrate-connect right now: the behavior of smoldot when it comes to the peer-to-peer networking is extremely opaque, and if smoldot doesn't manage to connect to other peers on the chain it is very difficult to understand why.

To solve this problem, I propose adding a new unstable, subscription-style JSON-RPC function that reports every networking event as it happens, and that also replays to a new subscriber the events that happened recently, that concern an item that is still alive, or that concern an item that died recently.

Showing the state of the networking on a UI in an understandable way is a complicated problem. By giving the UI an exhaustive list of the networking events that have happened recently, we move the problem of visualizing the state of the networking to substrate-connect (or to alternative libraries built on top of smoldot). This will make it easy to experiment with different visualizations.

In detail, the notifications would look like this:

{
    "type": "startConnect",
    "when": ... # A unix timestamp
    "connectionId": ... # Opaque ids allocated by smoldot
    "multiaddr": ...
}
{
    "type": "connected",
    "when": ... # A unix timestamp
    "connectionId": ...
}
{
    "type": "handshakeFinished",
    "when": ... # A unix timestamp
    "connectionId": ...
    "peerId": ...
}
{
    "type": "stop",   # Can mean either "disconnected" or "stopped trying to connect"
    "when": ... # A unix timestamp
    "connectionId": ...
    "reason": ...
}
{
    "type": "outSlotAssign",
    "when": ... # A unix timestamp
    "peerId": ...
}
{
    "type": "outSlotUnassign",
    "when": ... # A unix timestamp
    "peerId": ...
}
{
    "type": "inSlotAssign",
    "when": ... # A unix timestamp
    "peerId": ...
}
{
    "type": "inSlotUnassign",
    "when": ... # A unix timestamp
    "peerId": ...
}
{
    "type": "inSlotToOutSlot",   # Atomically unassigns an in slot and assigns an out slot
    "when": ... # A unix timestamp
    "peerId": ...
}
{
    "type": "substreamOutOpen",
    "when": ... # A unix timestamp
    "connectionId": ...
    "substreamId": ...
    "protocolName": ...
}
{
    "type": "substreamOutAccept",
    "when": ... # A unix timestamp
    "substreamId": ...
}
{
    "type": "substreamOutStop",   # Either decline or force-close
    "when": ... # A unix timestamp
    "substreamId": ...
    "reason": ...
}
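
For a client written in TypeScript, the notification formats above could be modelled as a discriminated union on the `type` field. This is only a sketch: the proposal does not specify the concrete types of `when`, the ids, `multiaddr`, `protocolName` or `reason`, so the `number` and `string` types below are assumptions, and `subjectOf` is a hypothetical helper rather than part of smoldot.

```typescript
// Assumption: `when` is a number (unix timestamp) and every id is a string.
// The proposal leaves these types unspecified.
type ConnectionId = string;
type SubstreamId = string;
type PeerId = string;

type NetworkEvent =
  | { type: "startConnect"; when: number; connectionId: ConnectionId; multiaddr: string }
  | { type: "connected"; when: number; connectionId: ConnectionId }
  | { type: "handshakeFinished"; when: number; connectionId: ConnectionId; peerId: PeerId }
  | { type: "stop"; when: number; connectionId: ConnectionId; reason: string }
  | { type: "outSlotAssign"; when: number; peerId: PeerId }
  | { type: "outSlotUnassign"; when: number; peerId: PeerId }
  | { type: "inSlotAssign"; when: number; peerId: PeerId }
  | { type: "inSlotUnassign"; when: number; peerId: PeerId }
  | { type: "inSlotToOutSlot"; when: number; peerId: PeerId }
  | { type: "substreamOutOpen"; when: number; connectionId: ConnectionId; substreamId: SubstreamId; protocolName: string }
  | { type: "substreamOutAccept"; when: number; substreamId: SubstreamId }
  | { type: "substreamOutStop"; when: number; substreamId: SubstreamId; reason: string };

type Subject = { kind: "connection" | "peer" | "substream"; id: string };

// Hypothetical helper: tells which item (connection, peer slot or substream)
// an event is about, so that a UI can group events per item.
function subjectOf(event: NetworkEvent): Subject {
  switch (event.type) {
    case "startConnect":
    case "connected":
    case "handshakeFinished":
    case "stop":
      return { kind: "connection", id: event.connectionId };
    case "outSlotAssign":
    case "outSlotUnassign":
    case "inSlotAssign":
    case "inSlotUnassign":
    case "inSlotToOutSlot":
      return { kind: "peer", id: event.peerId };
    case "substreamOutOpen":
    case "substreamOutAccept":
    case "substreamOutStop":
      return { kind: "substream", id: event.substreamId };
  }
}
```

Because the switch is exhaustive, adding a new event type to the union makes the compiler flag every place that needs updating, which is useful for a format that is expected to change between smoldot versions.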

In order for a JSON-RPC client to be able to subscribe at any time and obtain the current state of the networking, smoldot would have to keep in memory events that concern connections/slots/substreams that are still alive, and events that have happened recently (say, the last 5 seconds or so).
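
The retention rule described above can be sketched as follows. This is not smoldot's implementation (smoldot is written in Rust); it is a simplified illustration in TypeScript. The `LoggedEvent` shape, with its `item` and `ends` fields, and the `EventLog` class are hypothetical. Slots are also simplified here: every item is assumed to end exactly once, whereas a peer can be assigned and unassigned a slot several times.

```typescript
// Hypothetical, simplified event shape. `item` identifies the connection,
// slot or substream concerned; `ends` is true for events such as "stop"
// or "substreamOutStop" that mark the end of the item's life.
interface LoggedEvent {
  type: string;
  when: number; // unix timestamp, in seconds
  item: string;
  ends: boolean;
}

// "the last 5 seconds or so" from the proposal.
const RETENTION_SECONDS = 5;

class EventLog {
  // Events grouped by the item they concern, in arrival order.
  private byItem = new Map<string, LoggedEvent[]>();

  record(event: LoggedEvent): void {
    const list: LoggedEvent[] = this.byItem.get(event.item) ?? [];
    list.push(event);
    this.byItem.set(event.item, list);
  }

  // Returns the events that a client subscribing at time `now` should
  // receive, and forgets items that have been dead for too long.
  replay(now: number): LoggedEvent[] {
    const out: LoggedEvent[] = [];
    this.byItem.forEach((events, item) => {
      const last = events[events.length - 1];
      if (last === undefined) return;
      const alive = !last.ends;
      if (alive || now - last.when <= RETENTION_SECONDS) {
        // Keep the full history of the item, however old it is.
        out.push(...events);
      } else {
        this.byItem.delete(item);
      }
    });
    return out.sort((a, b) => a.when - b.when);
  }
}
```

The memory used by such a log is bounded by the number of items currently alive plus whatever died during the retention window, not by how long the node has been running.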

For example, if a connection is established at 3pm and a JSON-RPC client subscribes at 4pm, smoldot should send the startConnect, connected, and handshakeFinished events that concern this connection, even though they actually occurred one hour ago. This lets the JSON-RPC client know that there is a connection alive and that it has been alive for one hour. If the connection had instead been closed at 3:58pm, smoldot would still send all these events, followed by the stop event, so that the JSON-RPC client can indicate that a connection that was alive for 58 minutes was closed recently.
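
On the client side, the example above amounts to folding the replayed events into one summary per connection. The sketch below assumes that `when` is a unix timestamp in seconds, that `connectionId` is a string, and that a connection counts as established from its `connected` event; none of this is fixed by the proposal, and `summarize` and `connectedSeconds` are hypothetical helper names.

```typescript
// Only the connection-related events and fields needed for this example.
interface ConnEvent {
  type: "startConnect" | "connected" | "handshakeFinished" | "stop";
  when: number; // assumed: unix timestamp, in seconds
  connectionId: string; // assumed: opaque string id
}

interface ConnSummary {
  connectedAt: number | undefined;
  stoppedAt: number | undefined;
}

// Folds a list of replayed events into one summary per connection.
function summarize(events: ConnEvent[]): Map<string, ConnSummary> {
  const summaries = new Map<string, ConnSummary>();
  events.forEach((ev) => {
    const s: ConnSummary =
      summaries.get(ev.connectionId) ?? { connectedAt: undefined, stoppedAt: undefined };
    if (ev.type === "connected") s.connectedAt = ev.when;
    if (ev.type === "stop") s.stoppedAt = ev.when;
    summaries.set(ev.connectionId, s);
  });
  return summaries;
}

// How long the connection has been (or was) established, as seen at `now`.
// Returns undefined if the connection never got past `startConnect`.
function connectedSeconds(s: ConnSummary, now: number): number | undefined {
  if (s.connectedAt === undefined) return undefined;
  return (s.stoppedAt ?? now) - s.connectedAt;
}

// The scenario from the text, with times as seconds since midnight.
const threePm = 15 * 3600;
const fourPm = 16 * 3600;
const replayed: ConnEvent[] = [
  { type: "startConnect", when: threePm, connectionId: "a" },
  { type: "connected", when: threePm, connectionId: "a" },
  { type: "handshakeFinished", when: threePm, connectionId: "a" },
];
const summaryA = summarize(replayed).get("a");
// Still alive when the client subscribes at 4pm: 3600 seconds, i.e. one hour.
const aliveFor = summaryA ? connectedSeconds(summaryA, fourPm) : undefined;
```

If the replay also contained a `stop` event at 3:58pm, `connectedSeconds` would return 3480 seconds (58 minutes) regardless of `now`, and the presence of `stoppedAt` would tell the UI that the connection is closed.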

All these events are fairly low level, and this is intentional. Trying to move to a higher level means that we have to become opinionated, which we don't want to be at the moment. While they look difficult to use if you don't know how the networking works, I think that if you know how the networking works then they're fairly easy to analyze.

This function would intentionally remain unstable, at least for a long time, in the sense that its format can break from one version of smoldot to another. Of course, changes to the format would be documented in the CHANGELOG.

melekes commented 2 years ago

What are the benefits of this approach compared to providing a JSON-RPC endpoint that returns the current networking state (i.e. a Prometheus-style pull model)?

tomaka commented 2 years ago

The argument of instantaneous UI updates seems important enough to me to justify it.