zawy12 / difficulty-algorithms

See the Issues for difficulty algorithms

A correction to BTC's difficulty algorithm #39

Open zawy12 opened 5 years ago

zawy12 commented 5 years ago

This discusses an accuracy error in fixed-window difficulty algorithms that was brought to my attention by Pieter Wuille's twitter poll and its result. Meni Rosenfeld wrote a paper about it in 2016. It also shows that coins and pools are not calculating hashrate precisely correctly, which makes a big difference if you're estimating hashrate from a small sample.

Large error due to not taking increasing hashrate into account

BTC difficulty adjustments have historically come almost a day before the scheduled 2-week target (coin emission is now about 7 months ahead of schedule) because hashrate usually increases (an average of about 6% per two weeks) before the next adjustment is made, so emission has been about 6% faster every two weeks. This could have been corrected by looking at the past 4 weeks and adding an adjustment for the slope, thereby taking into account the "acceleration" in hashrate instead of just its most recent "velocity".

Commonly known 2015/2016 error

It is well known that, setting aside the above "ramp error", the BTC code has an error that would have caused the 2016-block adjustment to take 2 weeks and 10 minutes. This comes from the code being like this:

next_target = previous_target * sum_2015_solvetimes / target_time_2016_times_600

This is the same as:

next_target = previous_target * avg(2015 solvetimes) / 600 * 2015/2016

This 2015/2016 factor is the cause of the lower-than-intended target (higher difficulty).
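A quick arithmetic check of that ~10-minute figure (a back-of-the-envelope sketch with my own variable names, not output from any node software):

```python
# Effect of the 2015/2016 off-by-one by itself (ignoring hashrate growth and
# the separate N/(N-1) small-sample effect discussed below).
T = 600                                      # intended block time in seconds
N = 2016                                     # blocks per adjustment window
expected_solvetime = T * N / (N - 1)         # ~600.3 s instead of 600 s
extra_seconds_per_window = (expected_solvetime - T) * N
print(expected_solvetime)                              # ~600.298
print(extra_seconds_per_window / 60, "extra minutes")  # ~10 minutes per 2016 blocks
```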

There is another 10-minute error in the same direction as BTC's known code error. It applies to fixed-window difficulty algorithms. The error is a lot smaller in simple moving averages. I can't figure out how to analytically determine the adjustment for simple moving averages, but the correction I've experimentally determined to be accurate down to N=4 is adjusted_target = target / (1 - 3/N^2).

Satoshi and pretty much everyone else thought for years that the next target (~1/difficulty) should be determined by:

T = desired block time
t = observed solvetimes
HR = hashrate
nextTarget = priorTarget * avg(t) / T = 2^256/(T*HR)

The above basis of all existing difficulty algorithms is wrong. In estimating HR to adjust the target, you have N samples of previous solvetimes (t), but fast solvetimes are more probable than slow ones (which is only offset in an N=infinite average by some solvetimes being substantially longer). So when you take a smallish sample of N of them, you're more likely than not to have more fast blocks. This causes the difficulty calculation to overestimate the hashrate and thereby set the target too low (difficulty too high), which makes the resulting avg solvetime higher than your goal.

The correction turns out to be simple, and appears to be directly related to the memoryless property (see the "A Similarly Unexpected Effect" section below):

nextD = nextD * (N-1)/N
nextTarget = nextTarget * N/(N-1)

I tested it all the way down to N=2 with no discernible error. It has a problem at N=1.
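One way to see the bias and the correction is a Monte Carlo sketch like the one below (my own toy code with made-up parameters, not from any coin): draw N exponential solvetimes at a known constant hashrate, form the naive estimate from avg(t), and compare it with the (N-1)/N-corrected estimate.

```python
import random

# Small-sample bias in hashrate estimation from N exponential solvetimes.
# True hashrate is normalized to 1, so the naive estimate is T/avg(t).
T = 600.0        # mean solvetime at the true hashrate
N = 10           # solvetimes per estimate
TRIALS = 200_000

naive_total = 0.0
corrected_total = 0.0
for _ in range(TRIALS):
    avg_t = sum(random.expovariate(1.0 / T) for _ in range(N)) / N
    naive = T / avg_t                    # overestimates hashrate on average
    naive_total += naive
    corrected_total += naive * (N - 1) / N

print("naive mean estimate:    ", naive_total / TRIALS,
      "(theory: N/(N-1) =", N / (N - 1), ")")
print("corrected mean estimate:", corrected_total / TRIALS, "(theory: 1)")
```

Running the same sketch with N=2 still comes out unbiased, while N=1 diverges because E[1/t] is infinite for a single exponential sample.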

Derivations

My derivation below is based on what I needed in order to understand it. Others have a much more efficient way to explain it in a language I don't grasp. Meni Rosenfeld's paper derives it. Pieter Wuille did a poll and has an explanation. Pieter says it comes from E[1/X] where X ~ Erlang(N, 1).

Code to prove the error

Here is code Pieter wrote to show the error occurring in BTC. Note that line 34 is the sum of the past 2015 solvetimes, skipping the first solvetime in the 2016-block array of solvetimes, just like BTC, so the avg solvetime shows the old and new errors combined: 600 * 2016/2015 * 2015/(2015-1) = 600.6 seconds. https://gist.github.com/sipa/0523b510fcb7576511f0f670e0c6c0a5

A Similarly Unexpected Effect

The above N/(N-1) adjustment seems to be related to a surprising fact that 91% of Pieter Wuille's followers missed. Given a mean time to occurrence of T = 1/λ, when you take a random sample of time 2*T you see on average only 1 occurrence. Another way to say it: if you pick a random point in time, it will be T to the next block by the memoryless property. But memoryless is history-less and therefore works backwards in time as well, so the time back to the previous block is also T. Yet another way to say it: some solvetimes are long enough to offset the more frequent short solvetimes, so a random sample point often falls somewhere inside a long solvetime. See also Russell O'Connor's article. This extends to longer periods of time: if you take a random sample of time N*T, you will see on average only N-1 blocks.

T = average solvetime = 1/λ
average(N blocktimes) =  T
average(random time sample N*T) = (N-1)*T
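Here is a simulation sketch of that counting fact (my own toy code; "blocks seen" is read as complete solvetimes that both start and end inside the window):

```python
import bisect
import random

# Count complete solvetimes (blocks whose full solve interval lies inside a
# randomly placed window of length N*T). The expectation is ~N-1, not N,
# because the window usually lands in the middle of a long straddling solvetime.
T = 600.0
N = 10
WINDOW = N * T
NUM_BLOCKS = 500_000
TRIALS = 50_000

# one long Poisson block sequence
arrivals, t = [], 0.0
for _ in range(NUM_BLOCKS):
    t += random.expovariate(1.0 / T)
    arrivals.append(t)

total = 0
for _ in range(TRIALS):
    start = random.uniform(0, arrivals[-1] - WINDOW)
    lo = bisect.bisect_left(arrivals, start)
    hi = bisect.bisect_right(arrivals, start + WINDOW)
    # complete solvetimes = gaps between consecutive block arrivals in the window
    total += max(0, (hi - lo) - 1)

print("average complete solvetimes per window:", total / TRIALS,
      "(expected ~N-1 =", N - 1, ")")
```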

Deriving Exponential Distribution for PoW solvetimes

If the target is 1/1000th of 2^256, then each hash has a 1/1000 chance of being below the target. The probability of a hash not being that low is 1 - 1/1000, which means the probability of not succeeding in a sequence of n hashes is (1 - 1/1000)^n. This can be rewritten as

p = (1 - 1/1000 * n/n)^n   (1)

For large n, the following is true:

p = (1 - c/n)^n = e^(-c)

We can use c = t * HR * target / 2^256 because the 1/1000 per-hash chance is target/2^256 and n = number of hashes = HR * t. So:

p = e^(- HR * t * target/2^256)   (2)
Let λ = HR * target/2^256   
p = e^(-t*λ)
CDF = 1- p = 1 - e^(-t*λ)
PDF = d(CDF)/dt =  λ*e^(-t*λ)  

p is the probability that a block will arrive after time t, so 1 - p = CDF is the probability it will arrive by t. This CDF is the exponential distribution. The form 1 - e^(-t*λ) is commonly seen in engineering and nature because it is the fraction of something that is blocked in time or space for a blocking rate of λ per unit distance or time (in the same units as t). A larger hashrate makes long solvetimes less likely.
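For anyone who wants to check the derivation numerically, here is a toy sketch (made-up small numbers for the per-hash probability and hashrate, nothing like real 2^256 targets) that models each hash as an independent Bernoulli trial and compares the resulting solvetimes against the exponential form above.

```python
import math
import random

# Per-hash Bernoulli trials vs. the exponential approximation.
# p_hash stands in for target/2^256 and HASHRATE for HR; both are toy values.
p_hash = 1.0 / 1000          # chance one hash is below the target
HASHRATE = 10.0              # hashes per second
lam = HASHRATE * p_hash      # lambda = HR * target / 2^256
TRIALS = 20_000

solvetimes = []
for _ in range(TRIALS):
    hashes = 0
    while random.random() >= p_hash:   # keep hashing until one succeeds
        hashes += 1
    solvetimes.append((hashes + 1) / HASHRATE)

print("measured mean solvetime:", sum(solvetimes) / TRIALS,
      " 1/lambda:", 1.0 / lam)
for x in (50.0, 100.0, 200.0):
    survival = sum(1 for t in solvetimes if t > x) / TRIALS
    print(f"P(t > {x}): measured {survival:.3f}, e^(-lambda*x) {math.exp(-lam * x):.3f}")
```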

[update in 2022: The following is difficult even for me to follow. I'm sure there's a much better way to present it.]

Derivation of a Difficulty Algorithm

A common way to express the difficulty algorithm needed to get avg solvetime = T is to skip most of the following and say

next_target = previous_target * λ_desired / λ_observed

or

next_target = previous_target * HR_expected / HR_observed

which is the same as eq 3 below when replacing HR with λ_observed * 2^256 / previous_target.

To derive a difficulty algorithm, our goal is to set a target that will result in an avg solvetime of T. We do this by applying the expected-value integral to the PDF and setting it equal to T. This gives λ = 1/T = HR*target/2^256. Rearranging shows how we should set the target:

next_target = 2^256 / (HR*T)   (3)

We need to estimate HR from previous target(s) and solvetime(s), but we need to be careful because the expected-value equation is for all possible solvetimes t, which we will see only in an infinite number of samples. We can't simply say HR = 2^256 / t / prev_target for a 1-block estimate; it's technically impossible to estimate HR from 1 sample (see below). More generally, we can't use HR = 2^256 / avg(t) / avg(target) because there is a finite number of samples. It's approximately correct for a large number of samples, if the target is not changing.

In a difficulty algorithm like this:

next_target = prev_target * λ_desired / λ_measured = prev_target * avg(t) / T

the avg(t) will be too small if our sample size is small. To correct for the error, use:

1/λ_measured = avg(t) * N/(N-1)

The math shows an infinity at N=1, but if no solvetimes are < 1 and solvetimes are integers with T=600, then 1/λ_measured = avg(t) * 6.5. For N=2, the correction factor is 2. See the "A Similarly Unexpected Effect" section above for how this seems to be related to the memoryless property. That is, picking the start of a block to start a timer is the same as picking a random point in time. If you pick a random point in time, the current block began T in the past and the next block will occur in T. This means a randomly chosen block will take on average 2*T, due to slow solves canceling the effect of fast solves.

The solvetime for a block is determined by setting rand() = 1 - CDF = x and solving for t in the following:

1 - CDF = x = e^(-t*HR*target/2^256)
t = -ln(x) * 2^256 / (HR * target)   (5)

This entire issue is based on avg(1/y) != 1/avg(y). When we apply an HR adjustment to the target many times in a row, we're applying avg_many( 1 / avg_N(-ln(x)) ), which equals N/(N-1). A simple way to show this is correct for N=2 is to run the experiment, or to do the definite double integral of 1/[(-ln(x) - ln(y))/2] for x and y both from 0 to 1. So BTC-type difficulty algorithms (a fixed target for N blocks, adjusting only once per N blocks) require a modification to eq 3:

next_target = prev_target * timespan/N / T * N/(N-1)   (8)
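Here is a sketch of what eq 8 does in practice (my own generic fixed-window simulation at constant hashrate, not Bitcoin's actual retarget code): each solvetime is exponential with mean inversely proportional to the current target, the target is held fixed for N blocks, and the long-run average solvetime is compared with and without the N/(N-1) factor.

```python
import random

# Fixed-window retarget at constant true hashrate. Units are chosen so that
# target = 1 gives a mean solvetime of exactly T, i.e. mean solvetime = T/target.
def average_solvetime(N, T=600.0, windows=100_000, corrected=False):
    target = 1.0
    total_time = 0.0
    for _ in range(windows):
        timespan = sum(random.expovariate(target / T) for _ in range(N))
        total_time += timespan
        adjust = (timespan / N) / T          # eq 3 applied to the whole window
        if corrected:
            adjust *= N / (N - 1)            # the eq 8 correction
        target *= adjust
    return total_time / (windows * N)

N = 10
print("without N/(N-1):", average_solvetime(N))                    # ~ T*N/(N-1) = 666.7
print("with N/(N-1):   ", average_solvetime(N, corrected=True))    # ~ 600
```

With N = 2016 the same uncorrected bias works out to the extra ~10 minutes per window described above.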

BTW, more complex difficulty algorithms can change the target during the block based on the timestamp in the header. The correction factor can again be determined by using eq 5, where HR and target can be functions of t: solve for t again to isolate the ln(x) term, then integrate over x from 0 to 1. This is the same as using the expectation integral on t*PDF from 0 to infinity, because x is uniformly distributed on 0 to 1.

Calculating Hashrate in General

As discussed above, if difficulty or target is constant:

HR = priorD / avg(t) * (N-1)/N
HR = 2^256/priorTarget / avg(t) * (N-1)/N

If D or target changes every block, the following is more accurate for determining hashrate, but the (N-1)/N correction should not be used inside simple-moving-average difficulty algorithms because the N samples overlap when you're adjusting every block, so they're not independent samples. I tried [(N-1)/N]^(1/N) to account for this but it was not accurate for smallish N.

HR = harmonicMeanD / avg(t) * (N-1)/N
HR = 2^256/avgTarget / avg(t) * (N-1)/N

The harmonic mean of D is used because of another problem caused by avg(1/x) != 1/avg(x). D = 2^256/target, but avgD != 2^256/avgTarget. You can show that harmonicMeanD = 2^256/avgTarget.
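For concreteness, here is how those estimators look as code (a sketch; the difficulty and solvetime inputs are illustrative placeholders, with D defined as 2^256/target as in the text):

```python
# Hashrate estimates from the last N blocks, following the formulas above.
# `difficulties` and `solvetimes` are lists with one entry per block.

def hashrate_constant_D(prior_difficulty, solvetimes):
    """Constant-difficulty case: HR = priorD / avg(t) * (N-1)/N."""
    n = len(solvetimes)
    avg_t = sum(solvetimes) / n
    return prior_difficulty / avg_t * (n - 1) / n

def hashrate_varying_D(difficulties, solvetimes):
    """Per-block-varying difficulty: HR = harmonicMeanD / avg(t) * (N-1)/N,
    where harmonicMeanD = 2^256 / avgTarget."""
    n = len(solvetimes)
    harmonic_mean_d = n / sum(1.0 / d for d in difficulties)
    avg_t = sum(solvetimes) / n
    return harmonic_mean_d / avg_t * (n - 1) / n
```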

bobbieltd commented 5 years ago

Even though I don't understand much. Thank you a lot for your time and effort to research and improve diff algo.

cryptozeny commented 5 years ago

nice report. thanks!

CGrunspan commented 5 years ago

Concerning Pieter Wuille's post, here is the answer. Today, the parameter updating the difficulty is equal to n0*tau0 / S_{n0-1}, with n0 = 2016, tau0 = 600 and S_n = time to mine n blocks. Let tau be the interblock time. Then 1/S_n follows an inverse-gamma distribution with mean value 1/((n-1)*tau).

See: https://en.m.wikipedia.org/wiki/Inverse-gamma_distribution. Taking the average, we see that the mean value of the parameter updating the difficulty is n0*tau0 / ((n0-2)*tau).

This quantity should be equal to 1. Otherwise, the difficulty would go to infinity or 0 almost surely by the strong law of large numbers.

So, tau = (n0*tau0) /(n0-2) = tau0 + 2*tau0 /(n0-2).

Finally, the time to mine n0 blocks is: n0*tau = n0*tau0 + 2*tau0 + 4*tau0/(n0-2)

The exact time to mine 2016 blocks in Bitcoin is: 2 weeks, 20 minutes and 1.19 seconds.
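A quick Monte Carlo check of that figure (a minimal sketch, assuming constant hashrate and honest mining as in the derivation above):

```python
import random

# Check E[ n0*tau0 / S_{n0-1} ] = n0/(n0-2) when the true interblock time is tau0.
# Setting this expectation to 1 gives tau = n0*tau0/(n0-2), i.e. an epoch of
# two weeks plus about 20 minutes.
n0, tau0 = 2016, 600.0
TRIALS = 20_000

total = 0.0
for _ in range(TRIALS):
    s = sum(random.expovariate(1.0 / tau0) for _ in range(n0 - 1))
    total += n0 * tau0 / s

print("mean update factor:", total / TRIALS, " theory:", n0 / (n0 - 2))
print("implied epoch:", n0 * tau0 * n0 / (n0 - 2) / 86400, "days")   # ~14.014
```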

Note that if you want to get 2 weeks exactly, it is enough to take n0*tau0 / (S_{n0+1}) for the formula updating the difficulty.

However, this computation has been done assuming that all miners are honest (no selfish miner).

In fact, there is a flaw in the difficulty adjustment formula, but it is not this one (replacing S_{n0-1} by S_{n0+1}). The bug is that the difficulty parameter underestimates the true total hashing power of the network in the presence of an attacker. In my opinion, it is the biggest theoretical flaw in Bitcoin. We must account for orphan blocks. See our article where we explain this. Someone could write a BIP after reading it.

https://arxiv.org/abs/1805.08281


zawy12 commented 5 years ago

I found section 8 of the link to your research at the bottom very interesting (for others: his section 8 says miners should include orphan headers and difficulty should be adjusted accordingly. The orphans show there is more network hashrate than a normal difficulty calculation sees, so normal difficulty algorithms have an error.)

> Otherwise, the difficulty would go to infinity or 0 almost surely by the strong law of large numbers.

Contrary to this statement, the law indicates it should only tend to the N/(N-2) error as we both conclude. Also, instead of taking too long, BTC is issuing blocks 6% too fast and LTC which does not have the 2016/2015 error is issuing blocks 2% too fast. I presume both are too fast because this is how much their hashrate is increasing per difficulty change, meaning the large error is due to the slow difficulty response.

Although I can't confirm your 2 eqs in the middle are algebraically correct, your initial equation and final statements are the same as my post.

CGrunspan commented 5 years ago

I wrote: "This quantity should be equal to 1. Otherwise, the difficulty would go to infinity or 0 almost surely by the strong law of large numbers." You write: "Contrary to this statement..." --> You have not understood my statement, or maybe it's my English (sorry!). I never claimed that the difficulty will go to 0 or to infinity. On the contrary, it was a reductio ad absurdum.

Let d_n (resp. delta_n) be the n-th difficulty parameter (resp. difficulty adjustment parameter).

We have d_n/d_{n-1} = delta_n. So, if delta = E[delta_n] < 1 (reductio ad absurdum), then by the inequality of arithmetic and geometric means (see https://en.m.wikipedia.org/wiki/Inequality_of_arithmetic_and_geometric_means) we have:

d_n = d_n/d_0 = \prod_{i=1}^n delta_i < ( \sum_{i=1}^n delta_i / n )^n   (taking d_0 = 1)

We have \sum_{i=1}^n delta_i / n -> delta < 1 by the strong law of large numbers. So d_n -> 0.

This was just to justify E[delta_n] =1 used in the proof.

You write : "Although I can't confirm your 2 eqs in the middle are algebraically correct, your initial equation and final statements are the same as my post." --> Is it the following equation that you don't understand? tau = (n0*tau0)/(n0-2)

Here tau is the true mean interblock time. The equation comes from the fact that E[1/S_{k}] = 1/((k-1)*tau) since 1/S_k follows an inverse Gamma distribution with parameter (k, tau). See the link to Wikipedia I gave above. This implies :

E[n0*tau0 / S_{n0-1}] = n0*tau0 * E[1/S_{n0-1}] = n0*tau0 / ((n0-2)*tau), which leads to tau = (n0*tau0)/(n0-2).

MeniRosenfeld commented 5 years ago

This isn't a new error. It was known as early as 2016 (see https://arxiv.org/pdf/1708.05185.pdf).

zawy12 commented 5 years ago

Thanks for the correction. I'll update my writing.