Earlz opened this issue 5 years ago
The Qtum retargeting algorithm, which sets the difficulty for staking each new Proof of Stake block, is wrong. This causes two problems: 1) occasional long spacings between blocks (30 to 40+ minutes), and 2) the whole blockchain running about 12% slow (roughly 144 seconds average block spacing versus the 128 seconds by design), which means 12% lower TPS, delayed block rewards and halvings, etc. Because the difficulty is a consensus parameter, fixing it will take a hard fork upgrade.
The long spacings between blocks are caused by a sequence of fast blocks (< 128 seconds) pushing the difficulty too high. A short-block sequence does not deterministically produce a long block, but historically it has done so about once a month or more. A recent example is block 303,994 on January 20, 2019, at 16:54:24 UTC, which had a spacing of 46 minutes and 8 seconds.
This table shows the lead-up to this slow block, with the short-block sequence starting at block 303,983:
For the 200 blocks bracketing this slow block the average difficulty was 3.26 million. Block 303,994 had a difficulty about 3.33 times that average.
Difficulties for blocks 303,890 to 304,089 are shown in the chart below, with the average as a dashed green line. Notice the sequence of short blocks starting around block 303,983 pushing up the difficulty.
Reviewing the slow Mainnet blocks in the range 6,000 to 307,800 gives these results:
This means, on average, each month there are 20 blocks with spacing greater than 20 minutes.
The solution is to fix the retargeting algorithm, which adjusts the target for every new block. The calculation is made at pow.cpp lines 92–93, essentially

```
newTarget = previousTarget * (targetMultiplier + actualSpacing + actualSpacing) / (targetMultiplier + 256)
```

where targetMultiplier is (nInterval - 1) * nTargetSpacing, which works out to 832 in the current code. The actual code is

```cpp
bnNew *= ((nInterval - 1) * nTargetSpacing + nActualSpacing + nActualSpacing);
bnNew /= ((nInterval + 1) * nTargetSpacing);
```
This gives an exponential moving average: the numerator adds twice the actual block spacing, while the denominator adds twice the target block spacing (256 seconds). If the block spacing is exactly 128 seconds the adjustment ratio is 1.0 and the target is unchanged. If the block spacing is < 128 seconds (the block was faster than the desired 128-second spacing), the ratio is < 1.0 and the new target is lower, making it, on average, harder (and thus longer) to find the next block.
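For illustration, here is a minimal sketch of this linear rule (not the consensus code; the function name is mine), with the target multiplier exposed as a parameter so the current 832 value and the proposed 20,000 can be compared directly:

```cpp
// Minimal sketch of the current linear retarget rule. targetMultiplier
// corresponds to (nInterval - 1) * nTargetSpacing, i.e. 832 in the code above.
#include <cstdint>
#include <cstdio>

double RetargetRatio(int64_t actualSpacing,
                     int64_t targetMultiplier = 832,
                     int64_t targetSpacing = 128) {
    // Numerator adds 2x the actual spacing, denominator 2x the target
    // spacing; the ratio is 1.0 exactly when actualSpacing == 128.
    return double(targetMultiplier + 2 * actualSpacing) /
           double(targetMultiplier + 2 * targetSpacing);
}

int main() {
    // Compare the current 832 multiplier against the proposed 20,000
    // (2768 s is the 46-minute spacing of block 303,994).
    for (int64_t t : {32, 128, 512, 2768}) {
        std::printf("spacing %4lld s -> ratio %.4f (832), %.4f (20000)\n",
                    (long long)t, RetargetRatio(t), RetargetRatio(t, 20000));
    }
    return 0;
}
```

For a 32-second block, the ratio is about 0.824 with the 832 multiplier (an 18% cut in the target) but about 0.990 with 20,000 (under a 1% cut), which is the gentler adjustment proposed below.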
The retargeting formula is easy to calculate but produces the wrong average block spacing. Academic papers on discrete-time queueing show curved response graphs, while this formula gives a straight-line response. By introducing an additional multiplier of 1.067 for spacings greater than 128 seconds (essentially bending the response curve for slower blocks), the average block spacing with the 832 multiplier can be placed at 128 seconds:
I do not propose this solution because the 832 multiplier is still too aggressive: it leads to runs of faster-than-target blocks pushing the target way down, which causes slow blocks as shown above.
Instead, the solution is a much higher target multiplier of 20,000, which when used in the numerator and denominator of the retargeting algorithm gives a much gentler adjustment of the target. With a 20,000 target multiplier, the feedback curve looks like:
I chose the 20,000 value for the new target multiplier after many simulations. Using the retargeting calculation (without any additional factor for slower blocks) and varying the target multiplier from 400 to 80,000 gives this chart of average spacing:
The 20,000 blocks in the chart above represent about one month of real time but take several days to simulate. We can get better averages with more blocks (and fewer wallets; statistically the results should be the same), here 3 wallets for 5 million blocks (about 20 years of real time):
Finally, here is a chart of simulation results that includes a tally of slow blocks (> 640 seconds) and the slowest block (Max Seconds) as the target multiplier is varied from 400 to 80,000 (5,000 wallets, 20,000 blocks):
The second fix is to remove unnecessary variation in the network weight calculation. Network weight is a short-term simple moving average (SMA) of recent difficulties divided by an SMA of recent block spacings:
Network weight is used only to calculate the time to expected reward. Using a sum of block spacings in the denominator (as opposed to a fixed denominator) provides no useful information over this interval, given the randomness in the retargeting algorithm, and causes large jumps in network weight when a long spacing enters (or leaves) the average.
For example, a block spacing of 20 minutes (which will still happen with the changes in step 1) causes an instant drop of about 10% in network weight, followed 72 blocks later by a 10% jump when the long spacing leaves the moving average. (With a 72-block average, replacing one typical 128-second spacing with a 1,200-second one inflates the denominator's sum by about 12%, cutting the quotient by roughly 10%.)
The chart below shows simulated results (5,000 wallets, 2,000 blocks, retarget multiplier 20,000, true network weight 19.2 million) for the current Network Weight and the New Network Weight (fixed denominator and corrected scaling factor):
There will always be variation in network weight due to random block spacing, but using a fixed denominator reduces the variation spikes.
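To make the proposed change concrete, here is a minimal sketch of the two denominator choices, assuming the 72-block window mentioned above; the function names are mine and the real wallet code differs in detail (a constant scaling factor is omitted):

```cpp
// Hypothetical illustration of the network weight fix. Both versions use
// an SMA of difficulty over the window; they differ only in the denominator.
#include <numeric>
#include <vector>

double NetWeightCurrent(const std::vector<double>& difficulty,
                        const std::vector<double>& spacing) {
    // Sum of actual spacings: one 20-minute block distorts this until it
    // ages out of the window 72 blocks later.
    double d = std::accumulate(difficulty.begin(), difficulty.end(), 0.0);
    double s = std::accumulate(spacing.begin(), spacing.end(), 0.0);
    return d / s;
}

double NetWeightProposed(const std::vector<double>& difficulty,
                         double targetSpacing = 128.0) {
    // Fixed denominator: window size times the 128-second design spacing,
    // so a single long block cannot spike the quotient.
    double d = std::accumulate(difficulty.begin(), difficulty.end(), 0.0);
    return d / (difficulty.size() * targetSpacing);
}
```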
It's better to use exponential adjustment instead of linear adjustment.
The key part of the difficulty adjustment algorithm is src/pow.cpp#L91-L93; we can write it in a clearer form. With $K = (\text{nInterval} - 1) \cdot \text{nTargetSpacing}$,

$$T_{n+1} = T_n \cdot \frac{K + 2\,t_n}{K + 256}$$

where $T_n$ is the target of block $n$ and $t_n$ is the interval of block $n$.
We introduce a new difficulty adjustment algorithm of the form

$$T_{n+1} = T_n \cdot e^{(t_n - 128)/K}$$
An exponential is used because the per-block factors multiply together cleanly. Telescoping the product and taking logarithms gives

$$\frac{1}{n}\sum_{i=1}^{n} t_i = 128 + \frac{K}{n}\ln\frac{T_{n+1}}{T_1}$$

The left term is the average block interval. Since the target stays bounded, the correction term on the right vanishes as $n$ grows, so when $n$ is big enough the average block interval equals 128 seconds.
Here is a comparison between the two algorithms for different values of K:
K | Average Spacing (Old, s) | Slow Blocks (Old) | Average Spacing (New, s) | Slow Blocks (New)
---|---|---|---|---
200 | 191.49 | 5.82% | 128.1648 | 2.62% |
300 | 161.30 | 3.51% | 128.0262 | 1.96% |
400 | 150.69 | 2.58% | 128.0064 | 1.62% |
500 | 145.24 | 2.11% | 128.0023 | 1.43% |
600 | 141.85 | 1.82% | 128.0010 | 1.30% |
800 | 137.99 | 1.49% | 128.0003 | 1.14% |
1000 | 135.84 | 1.32% | 128.0001 | 1.05% |
1500 | 133.07 | 1.11% | 128.0000 | 0.93% |
2000 | 131.74 | 0.99% | 128.0000 | 0.88% |
2500 | 130.96 | 0.93% | 128.0000 | 0.85% |
3000 | 130.46 | 0.89% | 128.0000 | 0.82% |
4000 | 129.84 | 0.86% | 128.0000 | 0.79% |
5000 | 129.46 | 0.83% | 128.0000 | 0.78% |
6000 | 129.22 | 0.80% | 128.0000 | 0.77% |
7000 | 129.04 | 0.79% | 128.0000 | 0.76% |
8000 | 128.91 | 0.78% | 128.0000 | 0.75% |
Yes, the exponential algorithm provides a more accurate solution compared to simply tweaking the parameters of the previous algorithm.
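For readers who want to reproduce the flavor of the comparison above, here is a rough Monte Carlo sketch. It is my own simplified model (constant network weight, so expected spacing scales inversely with the current target, with exponentially distributed intervals), not the simulator used for the table:

```cpp
// Rough Monte Carlo comparison of the linear and exponential update rules.
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double tau = 128.0;  // design block spacing, seconds
    const double K   = 2000.0; // smoothing constant under test
    const int blocks = 5000000;
    std::mt19937_64 rng(42);
    std::exponential_distribution<double> expo(1.0);

    double oldT = 1.0, newT = 1.0;  // targets relative to equilibrium
    double oldSum = 0.0, newSum = 0.0;
    for (int i = 0; i < blocks; ++i) {
        // Expected spacing scales as 1/target when network weight is constant.
        double t1 = expo(rng) * tau / oldT;  // interval under linear rule
        double t2 = expo(rng) * tau / newT;  // interval under exponential rule
        oldSum += t1;
        newSum += t2;
        oldT *= (K + 2.0 * t1) / (K + 2.0 * tau);  // linear update
        newT *= std::exp((t2 - tau) / K);          // exponential update
    }
    std::printf("average spacing  linear: %.2f s  exponential: %.2f s\n",
                oldSum / blocks, newSum / blocks);
    return 0;
}
```

In this model the linear rule's average drifts above 128 seconds (the update is fair in the ratio but, by Jensen's inequality, not in its logarithm), while the exponential rule telescopes to 128 on average, matching the pattern in the table.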
The new exponential multiplier function exp_mul() is called from CalculateNextWorkRequired():
https://github.com/qtumproject/qtum/blob/733061a2b54d4d50365b07e9b48a9a8d455eadd6/src/pow.cpp#L121
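For intuition only, a floating-point sketch of the factor this applies (the helper name is mine; consensus code must be deterministic across platforms, so the real exp_mul() at the link above presumably computes an equivalent factor in integer arithmetic on the full-width target):

```cpp
// Hypothetical floating-point illustration of the exponential retarget.
#include <cmath>
#include <cstdint>

uint64_t NextTargetSketch(uint64_t prevTarget,
                          int64_t actualSpacing,
                          int64_t targetSpacing = 128,
                          double K = 2000.0) {
    // Slow blocks (spacing > 128 s) raise the target (easier); fast blocks
    // lower it (harder), by the factor e^((t - 128) / K).
    double factor = std::exp(double(actualSpacing - targetSpacing) / K);
    return (uint64_t)((double)prevTarget * factor);
}
```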
Mainnet hard forked at block 466,600 on October 17, 2019, implementing this new code, with the following results.
Difficulty swings were reduced:
Long block spacings were reduced:
The average block spacing dropped to 128 seconds:
Right now the block time of the Qtum network has been observed to have a cyclic "sway" of long and short blocks. JB has done extensive analysis of this problem, complete with network simulation, and has found the rather complex cause as well as a solution. This QIP proposes to use his solution. This would require a hard fork.