opentensor / subtensor

Bittensor Blockchain Layer
The Unlicense

Add exponential backoff config to AURA #458

Closed sam0x17 closed 1 month ago

sam0x17 commented 1 month ago

Right now, major migrations in subtensor are perilous. Under AURA consensus, the validators use a round-robin system to pick one validator at a time to try to complete the migration. If that validator fails to do so within the 12 second time limit, a new validator is selected and the process continues. For migrations like the recent 1.0 upgrade, where the migration itself almost always takes more than 12 seconds, this causes huge delays: the validators have to partially complete pieces of the migration, gossip those blocks to each other, and eventually cobble together, more or less at random, a complete version of the migration before finalization can continue. This can take hours.

Instead, it would be much better if the 12 second time limit were increased by some scaling factor, such as 1.2x, with each successive failed round, so that the limit eventually becomes long enough to complete any migration.
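
To make the proposed growth concrete, here is a minimal sketch of the arithmetic, assuming the limit is simply multiplied by the scaling factor after every failed round. This is not AURA's API: `time_limit_ms`, `rounds_until_sufficient`, and the `num`/`den` representation of the factor (6/5 for 1.2x) are hypothetical names used only for illustration.

```rust
/// Base time limit in milliseconds (the 12 second limit described above).
const BASE_LIMIT_MS: u64 = 12_000;

/// Hypothetical helper, not part of AURA: the time limit after
/// `failed_rounds` consecutive failed rounds, growing by `num / den`
/// per round. Integer math avoids floating-point drift between nodes.
/// `den` must be non-zero.
fn time_limit_ms(failed_rounds: u32, num: u64, den: u64) -> u64 {
    let mut limit = BASE_LIMIT_MS;
    for _ in 0..failed_rounds {
        limit = limit.saturating_mul(num) / den;
    }
    limit
}

/// Hypothetical helper: how many failed rounds it takes before the limit
/// is at least `required_ms`. Returns `None` if the factor never grows
/// the limit (for example `num <= den`) or if `den` is zero.
fn rounds_until_sufficient(required_ms: u64, num: u64, den: u64) -> Option<u32> {
    if den == 0 {
        return None;
    }
    let mut limit = BASE_LIMIT_MS;
    let mut rounds = 0u32;
    while limit < required_ms {
        let next = limit.saturating_mul(num) / den;
        if next <= limit {
            return None; // the limit stopped growing, so it will never suffice
        }
        limit = next;
        rounds += 1;
    }
    Some(rounds)
}
```

Under these assumptions, a 1.2x factor gives limits of 12 s, 14.4 s, 17.28 s, and so on, and `rounds_until_sufficient(60_000, 6, 5)` shows that a migration needing 60 seconds would fit after 9 failed rounds. A larger factor shortens that wait at the cost of longer stalls when a validator is genuinely offline.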

Presumably AURA already has a backoff setting, which may or may not do what I describe above, that simply needs to be turned on. If so, we should definitely turn it on.

AC:

distributedstatemachine commented 1 month ago

completed in #480