Open lexnv opened 8 months ago
Keep track of how many records are successfully and unsuccessfully published
- If failure rate > 30%: publish again at 10-minute intervals
What do you mean by 30% failure rate? The quorum currently is 100% of the replication factor: https://github.com/paritytech/polkadot-sdk/blob/e88d1cb79315792a3dbccb6bdef2543093ecaf5b/substrate/client/network/src/discovery.rs#L404
Once every hour (or when keys change)
- publish the records 3 times, at 1 minute, 2 minutes, and 4 minutes
If we successfully published a record, i.e., the 20 closest peers were reached, I don't think there is any point in publishing it again.
Missed that, I was under the impression we get one notification per peer, thanks for the info 🙏
The authority-discovery will publish the DHT records (containing IP addresses) in the following manner:
Every 1h the authority records are republished
Every 1 minute the authority keys are checked, and if they changed, the records are republished
At the moment, this does not handle DHT failures at all. There has also been a problem with resetting the DHT timers: the republish timer (1h) was always advanced on success, causing the DHT records not to be republished: https://github.com/paritytech/polkadot-sdk/pull/3764.
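To make the current schedule concrete, here is a minimal sketch of the two timers described above. The `PublishState` type and `on_key_check` method are hypothetical names for illustration, not the actual authority-discovery worker, and keys are simplified to a byte vector. Note that nothing in it reacts to a failed publish.

```rust
use std::time::Duration;

/// Interval between unconditional republishes (1h, as described above).
const REPUBLISH_INTERVAL: Duration = Duration::from_secs(60 * 60);
/// Interval between authority key checks (1 minute, as described above).
const KEY_CHECK_INTERVAL: Duration = Duration::from_secs(60);

/// Hypothetical state for illustration; not the real worker type.
struct PublishState {
    /// Time elapsed since the records were last published.
    since_last_publish: Duration,
    /// Keys seen at the last publish (simplified to raw bytes).
    last_keys: Vec<u8>,
}

impl PublishState {
    /// Called once per key-check tick. Returns `true` if the records
    /// should be published now: either the keys changed or 1h has passed.
    fn on_key_check(&mut self, current_keys: &[u8]) -> bool {
        self.since_last_publish += KEY_CHECK_INTERVAL;
        let keys_changed = self.last_keys.as_slice() != current_keys;
        let due = self.since_last_publish >= REPUBLISH_INTERVAL;
        if keys_changed || due {
            self.last_keys = current_keys.to_vec();
            self.since_last_publish = Duration::ZERO;
            true
        } else {
            false
        }
    }
}
```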
The proposed strategy for publishing records sooner:
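Keep track of how many records are successfully and unsuccessfully published
- If failure rate > 30%: publish again at 10-minute intervals
Once every hour (or when keys change)
- publish the records 3 times, at 1 minute, 2 minutes, and 4 minutes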
This strategy aims to advertise DHT records more aggressively and to handle DHT failures gracefully, retrying early only once the failure threshold is exceeded.
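A rough sketch of how the proposed bookkeeping could look. It rests on several assumptions: `PublishStats` and its methods are hypothetical names, "failure rate" is counted per publish outcome (the exact unit is still open in this discussion), the threshold comparison is strict, and the 1/2/4 minute values are read as offsets from the start of each hourly (or key-change) round.

```rust
use std::time::Duration;

/// Failure-rate threshold from the proposal, in percent.
const FAILURE_THRESHOLD_PERCENT: u64 = 30;
/// Retry interval once the threshold is exceeded (10 minutes).
const RETRY_INTERVAL: Duration = Duration::from_secs(10 * 60);
/// Regular republish interval (1 hour).
const REPUBLISH_INTERVAL: Duration = Duration::from_secs(60 * 60);
/// Assumed offsets of the three publishes within one round.
const BURST_OFFSETS: [Duration; 3] = [
    Duration::from_secs(60),
    Duration::from_secs(2 * 60),
    Duration::from_secs(4 * 60),
];

/// Hypothetical counter of publish outcomes.
#[derive(Default)]
struct PublishStats {
    succeeded: u64,
    failed: u64,
}

impl PublishStats {
    /// Record the outcome of one publish attempt.
    fn record(&mut self, ok: bool) {
        if ok {
            self.succeeded += 1;
        } else {
            self.failed += 1;
        }
    }

    /// True if strictly more than 30% of recorded attempts failed.
    /// Integer arithmetic avoids floating-point edge cases at exactly 30%.
    fn exceeds_threshold(&self) -> bool {
        let total = self.succeeded + self.failed;
        self.failed * 100 > total * FAILURE_THRESHOLD_PERCENT
    }

    /// Delay until the next round: 10 minutes above the threshold, else 1h.
    fn next_round_delay(&self) -> Duration {
        if self.exceeds_threshold() {
            RETRY_INTERVAL
        } else {
            REPUBLISH_INTERVAL
        }
    }
}
```

With 7 successes and 3 failures the rate is exactly 30%, so the next round stays on the 1h timer; one more failure switches it to the 10-minute retry.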
Would love to hear your thoughts on this 🙏
cc @paritytech/networking @bkchr @alexggh