Currently on subnet 1 there are 1024 UIDs, out of which 128 are validator permit nodes. There is also an immunity period of 24 hours.
By default, a validator considers a random sample of 50 UIDs per step, which on average takes ~10 seconds to query. Running the reward model, at least on my modest hardware, takes another ~30 seconds, so in total a single step covering 50 UIDs takes ~40 seconds.
With 896 UIDs to hit, it takes about 896 / 50 * 40 ≈ 720 seconds, or ~12 minutes, to cover the whole network. This is not entirely accurate, since the validator code in subnet 1 does not draw from a shuffled reservoir but takes independent samples each step, but for napkin math it should still hold.
A UID is therefore hit about 24 * 60 / 12 = 120 times per day by a validator, which means that on average its EMA score gets updated 120 times.
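As a quick sanity check, here is the same napkin math in Python (purely illustrative; the constants are the estimates above, not values read from the validator code):

```python
# Illustrative napkin math only; constants are the estimates from above.
uids_total = 1024
validator_uids = 128
miner_uids = uids_total - validator_uids   # 896 UIDs to query
sample_size = 50                           # UIDs considered per step
step_seconds = 40                          # ~10 s querying + ~30 s reward model

sweep_minutes = miner_uids / sample_size * step_seconds / 60
hits_per_day = 24 * 60 / sweep_minutes

print(sweep_minutes)   # ~12 minutes per full sweep
print(hits_per_day)    # ~120 EMA updates per UID per day
```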
For an alpha of 0.01, after 24 hours the validator will have reached only ~70% of the final score for a newly registered UID:
```python
>>> alpha = 0.01
>>> s = 0
>>> for _ in range(120):
...     s = s * (1 - alpha) + 100 * alpha
...
>>> s
70.06196086876682
```
If we increase alpha to 0.05, we get ~99.8%:
```python
>>> alpha = 0.05
>>> s = 0
>>> for _ in range(120):
...     s = s * (1 - alpha) + 100 * alpha
...
>>> s
99.78775736213011
```
Even if the network is increased to 2048 UIDs, so that a full sweep takes ~25 minutes and a single UID is hit only ~60 times per day, with an alpha of 0.05 we still reach ~95% of the steady-state score:
```python
>>> alpha = 0.05
>>> s = 0
>>> for _ in range(60):
...     s = s * (1 - alpha) + 100 * alpha
...
>>> s
95.39302010130474
```
My proposal is to increase alpha to 0.05, and in the longer term to consider turning this into a subnet parameter. The value currently lives only in the validator code, and most operators won't edit it to work around problems like this. In subnet 3, alpha is 0.1 and UIDs reach their steady-state score reasonably fast.
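For context, the update in question is just a standard EMA step applied to whichever UIDs were sampled in a given round. A minimal sketch, assuming the scores are kept in a tensor indexed by UID (the function name and shapes are illustrative, not the actual openvalidators API):

```python
import torch

def update_moving_scores(scores: torch.Tensor, rewards: torch.Tensor,
                         sampled_uids: torch.Tensor,
                         alpha: float = 0.05) -> torch.Tensor:
    # EMA step applied only to the UIDs queried in this round;
    # everyone else keeps their previous moving-average score.
    updated = scores.clone()
    updated[sampled_uids] = alpha * rewards + (1 - alpha) * scores[sampled_uids]
    return updated
```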
cc @opentaco for thoughts on this.
If anyone has a closed-form formula for the EMA, please let me know.
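For what it's worth, assuming the score starts at 0 and the incoming reward is a constant 100, the recurrence does have a closed form, which reproduces the loop results above and can be inverted to pick alpha for a target convergence speed (a sketch under those assumptions):

```python
import math

# Closed form for the EMA above, assuming s starts at 0 and the
# incoming reward is a constant 100:
#   s_n = 100 * (1 - (1 - alpha) ** n)
def ema_after(alpha: float, n: int) -> float:
    return 100 * (1 - (1 - alpha) ** n)

print(ema_after(0.01, 120))  # ~70.06, matches the first loop
print(ema_after(0.05, 120))  # ~99.79, matches the second loop
print(ema_after(0.05, 60))   # ~95.39, matches the third loop

# Inverting it gives the number of updates needed to reach a given
# fraction of the steady-state score:
def updates_needed(alpha: float, fraction: float) -> float:
    return math.log(1 - fraction) / math.log(1 - alpha)

print(updates_needed(0.01, 0.95))  # ~298 updates
print(updates_needed(0.05, 0.95))  # ~58 updates
```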
> @adriansmares appreciate the great analysis, we'll set the alpha to 0.05 for the openvalidators release.
> This might need to change as we increase the number of UIDs. Let's put in a ticket and do deeper analysis once the network is stable.
_Originally posted by @Eugene-hu in https://github.com/opentensor/bittensor/pull/1304#discussion_r1179385369_
This issue is causing validators that deleted their old model to lose trust (I am currently stuck at 0.81%), while the old validators are stuck at a "status quo" that is not healthy for the network.
Can someone fix this? This issue is now already a month old.