paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

network/litep2p: Validators do not seem to be fully connected when using litep2p backend #5019

Open alexggh opened 1 month ago

alexggh commented 1 month ago

Reproducing steps:

Run a local zombienet with 10 or more validators and litep2p enabled, e.g.:

zombienet -c 2 test -p native ./polkadot/zombienet_tests/functional/0009-approval-voting-coalescing.zndsl

Observed behaviour

The Connectivity Report shows that validators are not connected to all of their peers, and on some occasions a node also logs "Connectivity seems low, we are only connected to ...% of available validators". See the logs below.

Expected behaviour

Validators should be able to connect to all of their peers, which is what happens when running with the default libp2p backend.

2024-07-15 16:31:40.193 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
2024-07-15 16:41:40.194 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
alice-10.log
2024-07-15 16:32:00.714 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity seems low, we are only connected to 83% of available validators (see debug logs for details)
2024-07-15 16:32:00.714 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=83 absolute_connected=10 absolute_resolved=12 unconnected_authorities=
2024-07-15 16:42:00.715 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
alice-11.log
2024-07-15 16:32:03.201 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=91 absolute_connected=11 absolute_resolved=12 unconnected_authorities=
alice-12.log
2024-07-15 16:32:03.104 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
alice-1.log
2024-07-15 16:31:46.055 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
2024-07-15 16:41:46.055 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
alice-2.log
2024-07-15 16:31:46.130 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
2024-07-15 16:41:46.130 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
alice-3.log
2024-07-15 16:31:50.144 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
2024-07-15 16:41:50.144 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
alice-4.log
2024-07-15 16:31:50.116 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
2024-07-15 16:41:50.116 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
alice-5.log
2024-07-15 16:31:53.256 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=91 absolute_connected=11 absolute_resolved=12 unconnected_authorities=
2024-07-15 16:41:53.256 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=91 absolute_connected=11 absolute_resolved=12 unconnected_authorities=
alice-6.log
2024-07-15 16:31:53.277 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity seems low, we are only connected to 83% of available validators (see debug logs for details)
2024-07-15 16:31:53.277 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=83 absolute_connected=10 absolute_resolved=12 unconnected_authorities=
2024-07-15 16:41:53.278 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
alice-7.log
2024-07-15 16:31:56.257 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
2024-07-15 16:41:56.257 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=91 absolute_connected=11 absolute_resolved=12 unconnected_authorities=
alice-8.log
2024-07-15 16:31:56.271 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=100 absolute_connected=12 absolute_resolved=12 unconnected_authorities=None
2024-07-15 16:41:56.271 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity seems low, we are only connected to 83% of available validators (see debug logs for details)
2024-07-15 16:41:56.271 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=83 absolute_connected=10 absolute_resolved=12 unconnected_authorities=
alice-9.log
2024-07-15 16:32:00.800 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=91 absolute_connected=11 absolute_resolved=12 unconnected_authorities=
2024-07-15 16:42:00.801 DEBUG tokio-runtime-worker parachain::gossip-support: Connectivity Report connected_ratio=91 absolute_connected=11 absolute_resolved=12 unconnected_authorities=
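
To make reports like the ones above easier to scan, here is a minimal Python sketch that pulls the numeric fields out of each "Connectivity Report" line and lists the reports below full connectivity. The helper names (`parse_report`, `under_connected`) are made up for illustration and are not part of polkadot-sdk; the regular expression assumes the exact field order shown in these logs. As a side note, the ratios in the logs are consistent with integer division (100 * 10 // 12 = 83, 100 * 11 // 12 = 91), but that is an observation from the output, not a statement about the implementation.

```python
import re

# Matches the numeric fields of a "Connectivity Report" debug line.
# Assumes the field order seen in the logs of this issue.
REPORT_RE = re.compile(
    r"Connectivity Report connected_ratio=(\d+) "
    r"absolute_connected=(\d+) absolute_resolved=(\d+)"
)


def parse_report(line):
    """Return (ratio, connected, resolved), or None if the line is not a report."""
    match = REPORT_RE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def under_connected(lines):
    """Yield (timestamp, connected, resolved) for reports below full connectivity."""
    for line in lines:
        parsed = parse_report(line)
        if parsed is None:
            continue
        _ratio, connected, resolved = parsed
        if connected < resolved:
            # The first two whitespace-separated fields are the date and time.
            timestamp = " ".join(line.split()[:2])
            yield timestamp, connected, resolved


if __name__ == "__main__":
    # Two lines copied from the logs in this issue.
    sample = [
        "2024-07-15 16:32:00.714 DEBUG tokio-runtime-worker parachain::gossip-support: "
        "Connectivity Report connected_ratio=83 absolute_connected=10 "
        "absolute_resolved=12 unconnected_authorities=",
        "2024-07-15 16:42:00.715 DEBUG tokio-runtime-worker parachain::gossip-support: "
        "Connectivity Report connected_ratio=100 absolute_connected=12 "
        "absolute_resolved=12 unconnected_authorities=None",
    ]
    for timestamp, connected, resolved in under_connected(sample):
        print(f"{timestamp}: {connected}/{resolved} authorities connected")
```

Feeding each node's log file through `under_connected` (for example via `open(path)`) gives a quick per-node list of the reports worth looking at.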

FYI: @paritytech/networking

dmitry-markin commented 1 month ago

I have seen a similar issue (with much worse numbers, like 30% connected) on Versi when authority-discovery was not working with litep2p: DHT records were failing validation because some parts were missing on the litep2p side. Some validators were still connected, my guess is because they were in the DHT routing table.

So I would start debugging by examining the authority-discovery logs to see whether validators are being discovered properly.