paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/
1.76k stars 632 forks source link

network/litep2p: Publishes empty addresses at startup #5147

Closed alexggh closed 1 month ago

alexggh commented 1 month ago

https://grafana.teleport.parity.io/goto/V-MQGUXSR?orgId=1

When starting a node with litep2p enabled I see in the logs that it publishes on the DHT addresses which are not valid

Publishing authority DHT record peer_id='12D3KooWSogJN9vDNkkpNFMAdDAB5XGUGaM5T8tDsB1F4x8vV2de' addresses='[]'
Publishing authority DHT record peer_id='12D3KooWKKotwpDtVqi7rZyKEovpVdDqiV9LuwVu1b3KTZgbXZw4' addresses='[]'

Nodes publish a few seconds later valid addresses, so this errors might be harmless, but they still need to be investigated.

cc: @paritytech/networking

lexnv commented 1 month ago

It looks like we rely on the Identify protocol to obtain the observed address.

When we receive a response via the identify info, we get the observed address and populate the external_addresses:

https://github.com/paritytech/polkadot-sdk/blob/18db502172bdf438f086cd5964c646b318b8ad37/substrate/client/network/src/litep2p/mod.rs#L919-L924

We only report back addresses that have been confirmed at least 5 times, although I believe the confirmation can come from the same peer multiple times:

https://github.com/paritytech/polkadot-sdk/blob/18db502172bdf438f086cd5964c646b318b8ad37/substrate/client/network/src/litep2p/discovery.rs#L445-L458

Definetly something to look into, this might be the reason why some validators are failing to connect. I've also observed that libp2p can discover addresses much faster than litep2p, I think it relies on other mechanisms that we can look into.

dmitry-markin commented 1 month ago

I've also observed that libp2p can discover addresses much faster than litep2p, I think it relies on other mechanisms that we can look into.

libp2p uses address translation to "discover" external addresses, that boils down to replacing the port in the discovered external address by the port used in the listen address. While being one-shot procedure and working for simple 1:1 port forwarding cases, I think it's incorrect in the general case and inferior than heuristic of comparing addresses returned by different peers.

May be we can decrease the number of confirmations to 3, ensuring they are coming from different peers.

As for authority-discovery, would deferring publishing until we have at least one external address detected fix the issue?

lexnv commented 1 month ago

Yep, it sounds good to me, exactly my plan (although I picked 2 confirmations instead of 3) :D

alexggh commented 1 month ago

Whatever solution we come up with, can we also make sure it fixes this case as well ?