paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.com/

network: Some network error occurred when fetching erasure chunk #6587

Open lexnv opened 22 hours ago

lexnv commented 22 hours ago

A Kusama litep2p node starts spamming the following warnings after a while:

WARN tokio-runtime-worker parachain::availability-distribution: Some network error occurred when fetching erasure chunk origin=Public(a85bf9402a1fc5d3732952744ccb9ff0990e8c101d0da3cd01efdff98e26f420 (GP4tF1if...)) relay_parent=0x01f4bee6dd8c1d2397265722fcdc3ce7946ec33188e7df2d208686bf0ca9a7f9 group_index=GroupIndex(32) session_index=43676 chunk_index=ValidatorIndex(273) candidate_hash=0x6db57a3a09f7caaa617b60db14433895ca6cd9a14b3b3e475ee53f2e1abb9779 err=Network(DialFailure) traceID=145828136324240632105239557157853280405

Offhand, it looks like the node is not able to dial peers (err=Network(DialFailure)). Still pending confirmation on a libp2p node.

Version deployed: version 1.16.1-ca8beaed148

Grafana link: https://grafana.teleport.parity.io/goto/K-F2Od7NR?orgId=1

cc @paritytech/sdk-node

lexnv commented 20 hours ago

Reproduces on libp2p as well, not isolated to litep2p:

2024-11-21 16:16:27.030  INFO main sub-libp2p: Running libp2p network backend
...

2024-11-21 16:16:30.556  WARN tokio-runtime-worker parachain::availability-distribution: Some network error occurred when fetching erasure chunk origin=Public(500cc2cfa106832f103b44c90db4fe4f57dc30ef81073e0bf212c4d0574d355f (EPH935R1...)) relay_parent=0x72fd5cfacc625b24eee0db33cfff3415bb6806d0e2301b6c42db910e082ed409 group_index=GroupIndex(53) session_index=43678 chunk_index=ValidatorIndex(121) candidate_hash=0x68b93303dd6899341aaed21d4de493e8dd56472be8aeb64774fbdf02eeec7bb8 err=Network(DialFailure) traceID=139201321189557106610241167805705917416
...

2024-11-21 16:16:36.595  WARN tokio-runtime-worker parachain::availability-distribution: Some network error occurred when fetching erasure chunk origin=Public(a6fbda06023bce68011ad74b4a520be902f18db526a27f794facf69aedca455d (GMGHKdBn...)) relay_parent=0x72fd5cfacc625b24eee0db33cfff3415bb6806d0e2301b6c42db910e082ed409 group_index=GroupIndex(78) session_index=43678 chunk_index=ValidatorIndex(121) candidate_hash=0xce2ba85e7e1cff23887357dd6ae304692e0187ff24a1b51f3a0556237bac8b81 err=Network(DialFailure) traceID=274047650827900523247050184326805390441
alindima commented 19 hours ago

Are all the dial failures happening when dialing the same peer? Maybe there are some bad validators that are not responsive.
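One way to check this is to group the warnings by their origin field and count them. The snippet below is a rough sketch, not part of polkadot-sdk: it assumes the log lines have the same layout as the ones quoted above (an `origin=Public(<hex> (<short>...))` field and an `err=...` field), and the function name `count_dial_failures` is made up for illustration.

```python
import re
from collections import Counter

# Field patterns based only on the warnings quoted in this issue;
# other log layouts are not handled.
ORIGIN_RE = re.compile(r"origin=Public\((?P<key>[0-9a-f]+) \((?P<short>[^)]*)\)\)")
ERR_RE = re.compile(r"err=(?P<err>\S+)")


def count_dial_failures(lines):
    """Count DialFailure warnings per origin public key.

    `lines` is any iterable of log lines (a list, an open file, ...).
    Returns a Counter mapping the hex public key to the number of
    "fetching erasure chunk" warnings that ended in Network(DialFailure).
    """
    counts = Counter()
    for line in lines:
        # Only look at the availability-distribution chunk fetch warnings.
        if "fetching erasure chunk" not in line:
            continue
        origin = ORIGIN_RE.search(line)
        err = ERR_RE.search(line)
        if origin and err and err.group("err") == "Network(DialFailure)":
            counts[origin.group("key")] += 1
    return counts
```

Calling `count_dial_failures(open("node.log"))` (the file name is a placeholder) and then `.most_common(10)` on the result would show whether a handful of origins account for most of the failures or whether they are spread across many validators. In the three warnings quoted above, the origins are all different.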