paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/
1.63k stars 573 forks source link

Remove support for legacy `ProtocolId`-based protocol names #504

Open dmitry-markin opened 1 year ago

dmitry-markin commented 1 year ago

It's been quite a while since the genesis-hash based protocol naming convention was introduced (see issue https://github.com/paritytech/substrate/issues/7746 & PRs https://github.com/paritytech/substrate/pull/11938, https://github.com/paritytech/polkadot/pull/5870, https://github.com/paritytech/polkadot/pull/5876).

It should be safe to remove old-style legacy ProtocolId-based protocol names like "/dot/sync/1" now.

altonen commented 1 year ago

We'd also need to modify the Polkadot specification to deprecate these. I don't know how laborious process that is going to be. Eventually we should remove them but right now they aren't doing much harm either and we probably want to keep the alias support at least on code level in case it's needed in the future.

Is there a reason they should be removed in the first place?

dmitry-markin commented 1 year ago

If I remember correctly, the idea was that a genesis-hash in protocol names safeguards us from connecting to the peers of a different chain at the protocol negotiation stage. This is likely not that important if nobody remembered to finish this during the last 10 months, so this is just being a little annoying to have all this redundant ProtocolId-based names in the codebase.

dmitry-markin commented 1 year ago

... and we probably want to keep the alias support at least on code level in case it's needed in the future.

For sure. This issue is only about removing the hardcoded legacy protocol names, not the fallback protocol name mechanism itself.

tomaka commented 1 year ago

This is likely not that important if nobody remembered to finish this during the last 10 months

It is not important for Polkadot, but it was important for third-party chains, as they would often not set any value for protocolId, which would lead to the protocol name of multiple different chains colliding.

As a result, if a single person of a chain A connects to a node of chain B (where A and B have the same protocol names), the peer-to-peer networks of the two chains become intertwined and everything works less well.

Why nobody complained in 10 months is because builders tend to just believe that "things are complicated" and not realizing that the problems they face can be fixed.

dmitry-markin commented 11 months ago

As DHT-crawling by @BulatSaif shows, we have some protocol names not respecting even ProtocolId-based convention.

Here are the protocol names for Polkadot chain:

INFO[0000] Agent    value="Parity Polkadot/v1.0.0-1ed6e2e50a4 (polkadot-connect-0)"
INFO[0000]                                              
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/state/2
INFO[0000] Protocol value=/ipfs/ping/1.0.0
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/block-announces/1
INFO[0000] Protocol value=/polkadot/req_chunk/1
INFO[0000] Protocol value=/polkadot/send_dispute/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/req_pov/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/req_collation/1
INFO[0000] Protocol value=/polkadot/validation/1
INFO[0000] Protocol value=/polkadot/req_collation/1
INFO[0000] Protocol value=/dot/state/2
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/grandpa/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/light/2
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/sync/2
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/sync/warp
INFO[0000] Protocol value=/dot/sync/2
INFO[0000] Protocol value=/dot/block-announces/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/collation/1
INFO[0000] Protocol value=/ipfs/id/push/1.0.0
INFO[0000] Protocol value=/dot/transactions/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/req_statement/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/req_chunk/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/req_available_data/1
INFO[0000] Protocol value=/polkadot/req_available_data/1
INFO[0000] Protocol value=/dot/kad
INFO[0000] Protocol value=/dot/light/2
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/kad
INFO[0000] Protocol value=/polkadot/req_statement/1
INFO[0000] Protocol value=/ipfs/id/1.0.0
INFO[0000] Protocol value=/dot/sync/warp
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/validation/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/send_dispute/1
INFO[0000] Protocol value=/polkadot/req_pov/1
INFO[0000] Protocol value=/paritytech/grandpa/1
INFO[0000] Protocol value=/polkadot/collation/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/transactions/1

And for Kusama:

INFO[0000] Agent    value="Parity Polkadot/v1.0.0-1ed6e2e50a4 (kusama-connect-0)"
INFO[0000]                                              
INFO[0000] Protocol value=/polkadot/collation/1
INFO[0000] Protocol value=/polkadot/req_collation/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/req_collation/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/sync/2
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/kad
INFO[0000] Protocol value=/ksmcc3/sync/2
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/state/2
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/validation/1
INFO[0000] Protocol value=/ipfs/id/1.0.0
INFO[0000] Protocol value=/ipfs/id/push/1.0.0
INFO[0000] Protocol value=/polkadot/req_available_data/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/req_pov/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/req_chunk/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/collation/1
INFO[0000] Protocol value=/ksmcc3/light/2
INFO[0000] Protocol value=/ksmcc3/block-announces/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/req_statement/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/send_dispute/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/block-announces/1
INFO[0000] Protocol value=/paritytech/grandpa/1
INFO[0000] Protocol value=/ksmcc3/kad
INFO[0000] Protocol value=/polkadot/send_dispute/1
INFO[0000] Protocol value=/ksmcc3/transactions/1
INFO[0000] Protocol value=/polkadot/req_statement/1
INFO[0000] Protocol value=/polkadot/req_pov/1
INFO[0000] Protocol value=/ipfs/ping/1.0.0
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/req_available_data/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/light/2
INFO[0000] Protocol value=/ksmcc3/sync/warp
INFO[0000] Protocol value=/polkadot/validation/1
INFO[0000] Protocol value=/ksmcc3/state/2
INFO[0000] Protocol value=/polkadot/req_chunk/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/sync/warp
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/transactions/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/grandpa/1

As seen, protocol names like /polkadot/collation/1 and /paritytech/grandpa/1, etc. are common for both chains, what might cause confusion if the nodes happen to connect to the clients of a wrong chain.

BulatSaif commented 10 months ago

It is worth mentioning that this may be causing issues on parachains. Each /genesis-hash/<protocol> is duplicated by /sub/<protocol>. For example /sup/kad is present in most of the parachains causing all chains to join in a single DHT table (testnets and production network). When I run DHT crawler for parachains I see more than 16k nodes even for small parachains which should have only 10 nodes.

Here list of the protocols reported by parachains:

Click to see... ### Kusama statemine parachain ``` INFO[0000] Agent count=1 value="Polkadot parachain/v0.9.430-9cb14fe3cee (statemine-connect-ew1-0)" INFO[0000] INFO[0000] Protocol count=1 value=/48239ef607d7928874027a43a67689209727dfb3d3dc5e5b03a39bdc2eda771a/block-announces/1 INFO[0000] Protocol count=1 value=/48239ef607d7928874027a43a67689209727dfb3d3dc5e5b03a39bdc2eda771a/kad INFO[0000] Protocol count=1 value=/48239ef607d7928874027a43a67689209727dfb3d3dc5e5b03a39bdc2eda771a/light/2 INFO[0000] Protocol count=1 value=/48239ef607d7928874027a43a67689209727dfb3d3dc5e5b03a39bdc2eda771a/state/2 INFO[0000] Protocol count=1 value=/48239ef607d7928874027a43a67689209727dfb3d3dc5e5b03a39bdc2eda771a/sync/2 INFO[0000] Protocol count=1 value=/48239ef607d7928874027a43a67689209727dfb3d3dc5e5b03a39bdc2eda771a/transactions/1 INFO[0000] Protocol count=1 value=/ipfs/id/1.0.0 INFO[0000] Protocol count=1 value=/ipfs/id/push/1.0.0 INFO[0000] Protocol count=1 value=/ipfs/ping/1.0.0 INFO[0000] Protocol count=1 value=/sup/block-announces/1 INFO[0000] Protocol count=1 value=/sup/kad INFO[0000] Protocol count=1 value=/sup/light/2 INFO[0000] Protocol count=1 value=/sup/state/2 INFO[0000] Protocol count=1 value=/sup/sync/2 INFO[0000] Protocol count=1 value=/sup/transactions/1 ``` ### Polkadot collective parachain ``` INFO[0000] Agent count=1 value="Polkadot parachain/v1.0.0-2306bfb7565 (polkadot-collectives-connect-ew6-0)" INFO[0000] INFO[0000] Protocol count=1 value=/46ee89aa2eedd13e988962630ec9fb7565964cf5023bb351f2b6b25c1b68b0b2/block-announces/1 INFO[0000] Protocol count=1 value=/46ee89aa2eedd13e988962630ec9fb7565964cf5023bb351f2b6b25c1b68b0b2/kad INFO[0000] Protocol count=1 value=/46ee89aa2eedd13e988962630ec9fb7565964cf5023bb351f2b6b25c1b68b0b2/light/2 INFO[0000] Protocol count=1 value=/46ee89aa2eedd13e988962630ec9fb7565964cf5023bb351f2b6b25c1b68b0b2/state/2 INFO[0000] Protocol count=1 value=/46ee89aa2eedd13e988962630ec9fb7565964cf5023bb351f2b6b25c1b68b0b2/sync/2 INFO[0000] Protocol count=1 value=/46ee89aa2eedd13e988962630ec9fb7565964cf5023bb351f2b6b25c1b68b0b2/transactions/1 INFO[0000] Protocol count=1 value=/ipfs/id/1.0.0 INFO[0000] Protocol count=1 value=/ipfs/id/push/1.0.0 INFO[0000] Protocol count=1 value=/ipfs/ping/1.0.0 INFO[0000] Protocol count=1 value=/sup/block-announces/1 INFO[0000] Protocol count=1 value=/sup/kad INFO[0000] Protocol count=1 value=/sup/light/2 INFO[0000] Protocol count=1 value=/sup/state/2 INFO[0000] Protocol count=1 value=/sup/sync/2 INFO[0000] Protocol count=1 value=/sup/transactions/1 ``` ### Rococo rockmine ``` INFO[0000] Agent count=1 value="Polkadot parachain/v1.0.0-2306bfb7565 (rococo-rockmine-collator-node-2)" INFO[0000] INFO[0000] Protocol count=1 value=/7c34d42fc815d392057c78b49f2755c753440ccd38bcb0405b3bcfb601d08734/block-announces/1 INFO[0000] Protocol count=1 value=/7c34d42fc815d392057c78b49f2755c753440ccd38bcb0405b3bcfb601d08734/kad INFO[0000] Protocol count=1 value=/7c34d42fc815d392057c78b49f2755c753440ccd38bcb0405b3bcfb601d08734/light/2 INFO[0000] Protocol count=1 value=/7c34d42fc815d392057c78b49f2755c753440ccd38bcb0405b3bcfb601d08734/state/2 INFO[0000] Protocol count=1 value=/7c34d42fc815d392057c78b49f2755c753440ccd38bcb0405b3bcfb601d08734/sync/2 INFO[0000] Protocol count=1 value=/7c34d42fc815d392057c78b49f2755c753440ccd38bcb0405b3bcfb601d08734/transactions/1 INFO[0000] Protocol count=1 value=/ipfs/id/1.0.0 INFO[0000] Protocol count=1 value=/ipfs/id/push/1.0.0 INFO[0000] Protocol count=1 value=/ipfs/ping/1.0.0 INFO[0000] Protocol count=1 value=/sup/block-announces/1 INFO[0000] Protocol count=1 value=/sup/kad INFO[0000] Protocol count=1 value=/sup/light/2 INFO[0000] Protocol count=1 value=/sup/state/2 INFO[0000] Protocol count=1 value=/sup/sync/2 INFO[0000] Protocol count=1 value=/sup/transactions/1 ``` ### Moonbeam ``` INFO[0001] Agent count=1 value="Moonbeam Parachain Collator/v0.32.2-d3172714146 (jpg3-moonbeam-boot-0)" INFO[0001] INFO[0001] Protocol count=1 value=/fe58ea77779b7abda7da4ec526d14db9b1e9cd40a217c34892af80a9b332b76d/block-announces/1 INFO[0001] Protocol count=1 value=/fe58ea77779b7abda7da4ec526d14db9b1e9cd40a217c34892af80a9b332b76d/kad INFO[0001] Protocol count=1 value=/fe58ea77779b7abda7da4ec526d14db9b1e9cd40a217c34892af80a9b332b76d/light/2 INFO[0001] Protocol count=1 value=/fe58ea77779b7abda7da4ec526d14db9b1e9cd40a217c34892af80a9b332b76d/state/2 INFO[0001] Protocol count=1 value=/fe58ea77779b7abda7da4ec526d14db9b1e9cd40a217c34892af80a9b332b76d/sync/2 INFO[0001] Protocol count=1 value=/fe58ea77779b7abda7da4ec526d14db9b1e9cd40a217c34892af80a9b332b76d/transactions/1 INFO[0001] Protocol count=1 value=/ipfs/id/1.0.0 INFO[0001] Protocol count=1 value=/ipfs/id/push/1.0.0 INFO[0001] Protocol count=1 value=/ipfs/ping/1.0.0 INFO[0001] Protocol count=1 value=/sup/block-announces/1 INFO[0001] Protocol count=1 value=/sup/kad INFO[0001] Protocol count=1 value=/sup/light/2 INFO[0001] Protocol count=1 value=/sup/state/2 INFO[0001] Protocol count=1 value=/sup/sync/2 INFO[0001] Protocol count=1 value=/sup/transactions/1 ```
dmitry-markin commented 3 months ago

Bringing this up once again, we'd need to be careful when removing /sup/kad legacy protocol name for parachains. Because right now all the parachains participate in a shared DHT, disabling this legacy protocol can lead to DHT fragmentation that the network will not be able to handle. Practically, this meant that the node won't be able to find peers, and @altonen already seen this when he tried to disable /sup/kad locally.

As was proposed by @altonen, we could drop /sup/kad support in two steps:

  1. Still use /sup/kad protocol for discovery, but stop adding peers to the local routing table (how?)
  2. Completely drop /sup/kad support.

@altonen Could you comment if I'm confusing something?

altonen commented 3 months ago

I think that approach could be less intrusive than just removing the legacy protocol in one swoop but I have no evidence to back it up. Maybe worth simulating locally how both approaches (gradual vs non-gradual phase out) work where it's easy to inspect routing tables of all nodes and see how they behave.

Here you check which protocols the remote supports. If it only supports /sup/kad, it probably doesn't belong to the same network and shouldn't be added to the routing table whereas if it supports both /our-genesis/kad and /sup/kad then it could be added.

dmitry-markin commented 3 months ago

Thanks, then I confused it. I thought we somehow continue using /sup/kad for DHT lookups without adding the peers to the routing table :exploding_head:

altonen commented 3 months ago

It would still use /sup/kad to find peers because while your own routing table would only contain, e.g., Kusama AssetHub nodes, when you do a random walk, some of the nodes you discover would be Kusama AssetHub nodes and some of them would be random nodes from other networks. Kademlia would connect to some of those random nodes, based on whose closest to the target, and they would return more random nodes but possibly also some Kusama AssetHub nodes. Eventually the routing tables would be in a state where it would only find Kusama AssetHub nodes and /sup/kad could be removed. It's not a perfect system because GetClosestPeersOk doesn't reveal which protocols the found peers support but during the recursive queries you'd connect to the peers and they would send their Identify information, allowing you to add them to the routing table selectively. That's how I envisioned it working but it would need to be tested first.

I don't know what the Polkadot specification has to say about this (letter and spirit of the law etc.) so if you decide to go this route, you may have to specify this as part of the removal of legacy protocols. It is worth the effort, I don't know.

FlorianFranzen commented 1 week ago

Why nobody complained in 10 months is because builders tend to just believe that "things are complicated" and not realizing that the problems they face can be fixed.

@tomaka To be honest, the warning message in smoldot made me remove our protocol id, which lead to exactly what you are describing during testing, so we added it back for now. :wink:

Personally, I would prefer if it was at least optional now, so when it is not specified, the legacy protocol is just not run. The sub solution is the worst of both world. I will gladly provide a patch, if this solution would be accepted.

lexnv commented 1 week ago

For some context here, the following PRs where merged as a first step: