Open dmitry-markin opened 1 year ago
We'd also need to modify the Polkadot specification to deprecate these. I don't know how laborious process that is going to be. Eventually we should remove them but right now they aren't doing much harm either and we probably want to keep the alias support at least on code level in case it's needed in the future.
Is there a reason they should be removed in the first place?
If I remember correctly, the idea was that a genesis-hash in protocol names safeguards us from connecting to the peers of a different chain at the protocol negotiation stage. This is likely not that important if nobody remembered to finish this during the last 10 months, so this is just being a little annoying to have all this redundant ProtocolId
-based names in the codebase.
... and we probably want to keep the alias support at least on code level in case it's needed in the future.
For sure. This issue is only about removing the hardcoded legacy protocol names, not the fallback protocol name mechanism itself.
This is likely not that important if nobody remembered to finish this during the last 10 months
It is not important for Polkadot, but it was important for third-party chains, as they would often not set any value for protocolId
, which would lead to the protocol name of multiple different chains colliding.
As a result, if a single person of a chain A connects to a node of chain B (where A and B have the same protocol names), the peer-to-peer networks of the two chains become intertwined and everything works less well.
Why nobody complained in 10 months is because builders tend to just believe that "things are complicated" and not realizing that the problems they face can be fixed.
As DHT-crawling by @BulatSaif shows, we have some protocol names not respecting even ProtocolId
-based convention.
Here are the protocol names for Polkadot chain:
INFO[0000] Agent value="Parity Polkadot/v1.0.0-1ed6e2e50a4 (polkadot-connect-0)"
INFO[0000]
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/state/2
INFO[0000] Protocol value=/ipfs/ping/1.0.0
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/block-announces/1
INFO[0000] Protocol value=/polkadot/req_chunk/1
INFO[0000] Protocol value=/polkadot/send_dispute/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/req_pov/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/req_collation/1
INFO[0000] Protocol value=/polkadot/validation/1
INFO[0000] Protocol value=/polkadot/req_collation/1
INFO[0000] Protocol value=/dot/state/2
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/grandpa/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/light/2
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/sync/2
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/sync/warp
INFO[0000] Protocol value=/dot/sync/2
INFO[0000] Protocol value=/dot/block-announces/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/collation/1
INFO[0000] Protocol value=/ipfs/id/push/1.0.0
INFO[0000] Protocol value=/dot/transactions/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/req_statement/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/req_chunk/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/req_available_data/1
INFO[0000] Protocol value=/polkadot/req_available_data/1
INFO[0000] Protocol value=/dot/kad
INFO[0000] Protocol value=/dot/light/2
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/kad
INFO[0000] Protocol value=/polkadot/req_statement/1
INFO[0000] Protocol value=/ipfs/id/1.0.0
INFO[0000] Protocol value=/dot/sync/warp
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/validation/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/send_dispute/1
INFO[0000] Protocol value=/polkadot/req_pov/1
INFO[0000] Protocol value=/paritytech/grandpa/1
INFO[0000] Protocol value=/polkadot/collation/1
INFO[0000] Protocol value=/91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3/transactions/1
And for Kusama:
INFO[0000] Agent value="Parity Polkadot/v1.0.0-1ed6e2e50a4 (kusama-connect-0)"
INFO[0000]
INFO[0000] Protocol value=/polkadot/collation/1
INFO[0000] Protocol value=/polkadot/req_collation/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/req_collation/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/sync/2
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/kad
INFO[0000] Protocol value=/ksmcc3/sync/2
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/state/2
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/validation/1
INFO[0000] Protocol value=/ipfs/id/1.0.0
INFO[0000] Protocol value=/ipfs/id/push/1.0.0
INFO[0000] Protocol value=/polkadot/req_available_data/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/req_pov/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/req_chunk/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/collation/1
INFO[0000] Protocol value=/ksmcc3/light/2
INFO[0000] Protocol value=/ksmcc3/block-announces/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/req_statement/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/send_dispute/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/block-announces/1
INFO[0000] Protocol value=/paritytech/grandpa/1
INFO[0000] Protocol value=/ksmcc3/kad
INFO[0000] Protocol value=/polkadot/send_dispute/1
INFO[0000] Protocol value=/ksmcc3/transactions/1
INFO[0000] Protocol value=/polkadot/req_statement/1
INFO[0000] Protocol value=/polkadot/req_pov/1
INFO[0000] Protocol value=/ipfs/ping/1.0.0
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/req_available_data/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/light/2
INFO[0000] Protocol value=/ksmcc3/sync/warp
INFO[0000] Protocol value=/polkadot/validation/1
INFO[0000] Protocol value=/ksmcc3/state/2
INFO[0000] Protocol value=/polkadot/req_chunk/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/sync/warp
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/transactions/1
INFO[0000] Protocol value=/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/grandpa/1
As seen, protocol names like /polkadot/collation/1
and /paritytech/grandpa/1
, etc. are common for both chains, what might cause confusion if the nodes happen to connect to the clients of a wrong chain.
It is worth mentioning that this may be causing issues on parachains. Each /genesis-hash/<protocol>
is duplicated by /sub/<protocol>
. For example /sup/kad
is present in most of the parachains causing all chains to join in a single DHT table (testnets and production network).
When I run DHT crawler for parachains I see more than 16k nodes even for small parachains which should have only 10 nodes.
Here list of the protocols reported by parachains:
Bringing this up once again, we'd need to be careful when removing /sup/kad
legacy protocol name for parachains. Because right now all the parachains participate in a shared DHT, disabling this legacy protocol can lead to DHT fragmentation that the network will not be able to handle. Practically, this meant that the node won't be able to find peers, and @altonen already seen this when he tried to disable /sup/kad
locally.
As was proposed by @altonen, we could drop /sup/kad
support in two steps:
/sup/kad
protocol for discovery, but stop adding peers to the local routing table (how?)/sup/kad
support.@altonen Could you comment if I'm confusing something?
I think that approach could be less intrusive than just removing the legacy protocol in one swoop but I have no evidence to back it up. Maybe worth simulating locally how both approaches (gradual vs non-gradual phase out) work where it's easy to inspect routing tables of all nodes and see how they behave.
Here you check which protocols the remote supports. If it only supports /sup/kad
, it probably doesn't belong to the same network and shouldn't be added to the routing table whereas if it supports both /our-genesis/kad
and /sup/kad
then it could be added.
Thanks, then I confused it. I thought we somehow continue using /sup/kad
for DHT lookups without adding the peers to the routing table :exploding_head:
It would still use /sup/kad
to find peers because while your own routing table would only contain, e.g., Kusama AssetHub nodes, when you do a random walk, some of the nodes you discover would be Kusama AssetHub nodes and some of them would be random nodes from other networks. Kademlia would connect to some of those random nodes, based on whose closest to the target, and they would return more random nodes but possibly also some Kusama AssetHub nodes. Eventually the routing tables would be in a state where it would only find Kusama AssetHub nodes and /sup/kad
could be removed. It's not a perfect system because GetClosestPeersOk
doesn't reveal which protocols the found peers support but during the recursive queries you'd connect to the peers and they would send their Identify information, allowing you to add them to the routing table selectively. That's how I envisioned it working but it would need to be tested first.
I don't know what the Polkadot specification has to say about this (letter and spirit of the law etc.) so if you decide to go this route, you may have to specify this as part of the removal of legacy protocols. It is worth the effort, I don't know.
Why nobody complained in 10 months is because builders tend to just believe that "things are complicated" and not realizing that the problems they face can be fixed.
@tomaka To be honest, the warning message in smoldot made me remove our protocol id, which lead to exactly what you are describing during testing, so we added it back for now. :wink:
Personally, I would prefer if it was at least optional now, so when it is not specified, the legacy protocol is just not run. The sub
solution is the worst of both world. I will gladly provide a patch, if this solution would be accepted.
For some context here, the following PRs where merged as a first step:
It's been quite a while since the genesis-hash based protocol naming convention was introduced (see issue https://github.com/paritytech/substrate/issues/7746 & PRs https://github.com/paritytech/substrate/pull/11938, https://github.com/paritytech/polkadot/pull/5870, https://github.com/paritytech/polkadot/pull/5876).
It should be safe to remove old-style legacy
ProtocolId
-based protocol names like"/dot/sync/1"
now.