zawy12 / difficulty-algorithms

See the Issues for difficulty algorithms

Introduction to Difficulty Algorithms #12


CryptoNote coins do not normally have a bits field or a "maxTarget" (aka "powLimit", aka "leading zeros"). This means difficulty = the average number of hashes needed to find a solution:

CN difficulty = 2^256 / target
CN expected hashes needed to win = difficulty
CN hashrate needed to win in the target solvetime (on average) = difficulty / targetSolvetime

Bitcoin / Zcash / etc.: In order for a 256-bit hash to be valid (i.e. the miner finds a winning block), it must be less than the "target" value encoded in nBits. A 256-bit hash has equal likelihood of being any of the 2^256 possible values. Miners change the nonce and compute the hash until a hash less than the target is found. If the target is 2^216 =~ 1.05E65 (the 217th bit from the right (least significant) is 1 and the rest of the bits are zero), the miner is expected to need 2^256 / 2^216 = 2^40 =~ 1.1 trillion hashes to encounter a hash lower than the target. This 2^256/target expected number of hashes is the actual difficulty, which is what I refer to in all my articles.

The reported difficulty of BTC, Zcash, etc. scales the actual difficulty value down with a powLimit factor (aka maxTarget) that replaces the 2^256 in the numerator of the actual difficulty. I don't know of any useful purpose this scaling down serves, but the code does not allow difficulty to get easier than powLimit. In BTC and Zcash this value is:

BTC: powLimit = uint256S("00000000ffffffffffffffffffffffffffffffffffffffffffffffffffffffff");
ZEC: powLimit = uint256S("0007ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff");

This is hex, so each digit represents a nibble (4 bits). BTC's powLimit starts with 8 zero nibbles, or 8*4 = 32 "leading zeros" of bits. Zcash's "7" nibble is 0111 in bits, so Zcash has 3*4 + 1 = 13 bits of leading zeros. All this means that when BTC and ZEC have an actual difficulty of 2^256/target, they report the difficulty powLimit / target, which is 2^(256-x)/target where x is the number of leading zeros. In other words, the actual difficulty (number of hashes expected) is 2^x times the reported difficulty.

But these are not exactly the reported difficulty (there's a 1/65,500 difference in BTC and 1/500,000 in ZEC) because powLimit is converted to the "nBits" encoding format before the powLimit/target calculation is made. The target is also converted (compressed) to this format after the difficulty algorithm calculates a value. The encoding truncates all but the most significant bits of targets and powLimit. The compressed powLimit values in the nBits format are:

BTC: nBits-encoded powLimit = uint256S("00000000ffff0000000000000000000000000000000000000000000000000000");
ZEC: nBits-encoded powLimit = uint256S("0007ffff00000000000000000000000000000000000000000000000000000000");

Instead of spending a lot of time explaining how the nBits encoding converts 256-bit values to 4 bytes (like here, here, and here), I'll just show the math. In the following, the left-most hexadecimal is in the nBits format. The 1D and 1F are the exponents (number of bytes in the encoded number) and the 00FFFF and 07FFFF are the mantissas.

BTC powLimit: 
1D 00 FF FF = 256^(0x1D - 3) * 0x00FFFF = (2^8)^(1*16+13 - 3) * 65,535 = 2^208 * 65,535
ZEC powLimit:
1F 07 FF FF = 256^(0x1F - 3) * 0x07FFFF = (2^8)^(1*16+15 - 3) * 524,287 = 2^224 * 524,287

The "-3" is because the 1D and 1F exponents include the 3 bytes in the mantissa, so they need to be subtracted when using it as the exponent. The highest bit of the mantissa for the encoding is supposed to be used to indicate sign and this is why BTC's powLimit is not 1C FF FF FF. It's a mess for no reason. Satoshi should have used 2 bytes instead of 4 where the first byte is the exponent indicating number of bits and the 2nd is the mantissa that would have 256 levels of precision for a given exponent.

The target for each block is encoded in the 4-byte nBits aka "bits" field in the block header. Some block explorers simply (stupidly) convert the 4 bytes of the hexadecimal nBits to a decimal "bits" field. It goes up & down with the target in the right direction, but the scale makes no sense; just convert it back to hexadecimal to get it in the nBits format. Here are two nBits target values for BTC & ZEC and how to precisely calculate the reported difficulty that block explorers show with just a calculator:

BTC:
nBits target = 17 5D 97 DC = (2^8)^(0x17-3) * 0x5D97DC =  2^160 * 6133724
difficulty = powLimit/target =  2^208 * 65,535 / 2^160 / 6133724 = 2^48 *65,535/6133724
ZEC:
nBits target = 1C 01 0B 13 = (2^8)^(0x1C-3) * 0x010B13 = 2^200 * 68,371
difficulty = powLimit/target = 2^224 * 524,287 /  2^200 / 68,371 = 2^24 * 524,287 / 68,371

Again, this is not the actual difficulty miners have to solve. They just have to find a hash that is less than the target, which corresponds to an actual difficulty (expected number of hashes) of 2^256 / target.
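To make the nBits math above concrete, here is a minimal Python sketch (not consensus code) that decodes nBits, reproduces the BTC reported-difficulty example, and then computes the actual 2^256/target difficulty:

```python
def nbits_to_target(nbits: int) -> int:
    """Decode the compact 4-byte nBits into a 256-bit target."""
    exponent = nbits >> 24          # number of bytes in the encoded number
    mantissa = nbits & 0x00FFFFFF   # top 3 significant bytes of the target
    return mantissa * 256 ** (exponent - 3)

def reported_difficulty(target_nbits: int, powlimit_nbits: int) -> float:
    """Difficulty as block explorers report it: encoded powLimit / target."""
    return nbits_to_target(powlimit_nbits) / nbits_to_target(target_nbits)

def actual_difficulty(target_nbits: int) -> float:
    """Expected number of hashes to find a block: 2^256 / target."""
    return 2**256 / nbits_to_target(target_nbits)

btc_powlimit_nbits = 0x1D00FFFF   # encodes 65,535 * 2^208
btc_target_nbits   = 0x175D97DC   # encodes 6,133,724 * 2^160

print(reported_difficulty(btc_target_nbits, btc_powlimit_nbits))  # = 2^48 * 65,535 / 6,133,724
print(actual_difficulty(btc_target_nbits))   # ~2^32 times larger (BTC's 32 leading zero bits)
```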

See also bitcoin wiki on difficulty

bdiff = difficulty
pool difficulty = 256/255 * difficulty

As an interesting idea, BTC / Zcash clones could change their powLimit (maxTarget) to 2^256 / block_time so that the reported difficulty equals the hashrate. The primary goal of difficulty algorithms is to estimate the hashrate in order to set the difficulty so that the average solvetime is correct, so the reported difficulty might as well be the best-available estimate of hashrate.
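A quick numeric check of this idea (a sketch with made-up numbers, not real chain parameters):

```python
# If powLimit were 2^256 / block_time, then reported difficulty = powLimit / target
# would equal the hashrate the difficulty was set for.
T = 600                        # block time, seconds
HR = 10**12                    # assumed network hashrate, hashes/second
target = 2**256 // (HR * T)    # target that sets the average solvetime to T for this HR
powLimit = 2**256 // T
print(powLimit // target, HR)  # both ~1e12
```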

Derivation of Basic Difficulty Equation

The goal of a difficulty algorithm is to estimate the current network hashrate (HR) in order to set D so that on average it will take the network the target solvetime (T) to find a solution. [update: if a difficulty algorithm can be made to reduce the switch mining that causes oscillations in solvetimes, the average confirmation time will come down to the avg solvetime, so that is another desirable goal. Another goal could be to make the distribution of solvetimes more spread out than the exponential so that there are fewer orphans, or so that the block time can be reduced. See footnote. ] Difficulty algorithms only seek to set the difficulty according to:

D = HR * T / 2^x

where HR = network hashes per second and x is the number of leading zeros the code uses to scale D down.
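As a tiny worked example of D = HR * T / 2^x (made-up numbers):

```python
HR = 10**12       # network hashrate, hashes per second
T  = 600          # target block time, seconds
x  = 32           # leading zero bits used to scale the reported difficulty (BTC-like)

D = HR * T / 2**x
print(D)          # ~139,700 in reported-difficulty units; 2^x * D hashes expected per block
```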

[ update: for a more complete derivation of a fixed-window difficulty algorithm see Correcting BTC's difficulty algorithm ]

[ update # 2: the almost perfectly correct difficulty algorithm is ASERT: next_target = previous_target * e^((t/T - 1)/N) ]
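A minimal sketch of that relative (per-block) ASERT update as written above; real implementations use an absolute form and integer approximations of e^x, so this is only illustrative:

```python
import math

def asert_next_target(prev_target: int, t: float, T: float, N: float) -> int:
    # t = last solvetime, T = target block time, N = smoothing constant in blocks
    return int(prev_target * math.exp((t / T - 1) / N))

# A block that took twice the target time raises the target (lowers difficulty) by ~1/N:
print(asert_next_target(2**220, t=1200, T=600, N=100) / 2**220)   # ~e^(0.01) =~ 1.01
```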

The only way to know the current network HR to solve for D in the above equation is to see how long it takes to solve blocks for previous D's. Rearranging the equation and changing T to an observed solvetime (ST), we can measure the HR of a single solve: HR = 2^x * D / ST, where ST = solvetime. So we get an HR for the previous D and ST and plug it into the previous equation to get the needed D for the next block. There are two big problems with this. First, there is a huge variation in each ST. Second, there is a 50% chance that this single ST will be less than 0.693 of the expected solvetime because the distribution of solvetimes in a Poisson process is skewed low (the median is ln(2) = 0.693 of the mean). So the single-solve HR estimate will constantly jump too high: more than 1/0.693 =~ 1.44 times too high half the time.
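A quick Monte Carlo of that skew (a sketch, assuming ideal exponential solvetimes and a correctly-set D):

```python
import math
import random

random.seed(1)
T = 600
solvetimes = [random.expovariate(1 / T) for _ in range(100_000)]   # mean solvetime T

frac_short = sum(st < math.log(2) * T for st in solvetimes) / len(solvetimes)
print(frac_short)   # ~0.50: half the solvetimes are below 0.693*T, so half the
                    # single-solve HR estimates (2^x * D / ST) are >1.44x too high
```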

Naively, I first tried to determine HR by the Arithmetic Mean: HR = 2^x * avg(Di/STi) = horribly wrong

Closer would be the Harmonic Mean: HR = 1 / avg( STi / (Di * 2^x) ) = 1 / avg(STi) / avg(Target) = good but still wrong

The more accurate method is the Weighted Harmonic Mean or Weighted Arithmetic Mean: HR = 2^x * avg(Di) / avg(STi) = 2^x * sum(Di) / sum(STi) = better but not perfect

For whatever reason, the following needs less of a correction factor to give a more accurate avg solvetime when used to set the difficulty: HR = 2^x * harmonic_mean(Di) / avg(STi) = better but not perfect. Note that harmonic_mean(Di) = 1 / avg(target/2^(256-x)), so algorithms which average target instead of difficulty are calculating harmonic_mean(Di). This seems to give very slightly better results in algorithms like LWMA that depend on individual D/ST values.
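Here is a small simulation comparing these averaging methods (a sketch with x = 0, a constant true HR, and difficulties that vary across the window). At a large sample size both the harmonic and weighted forms land near the true HR while the arithmetic mean of Di/STi blows up; the finer differences described above show up as the small-N adjust factor discussed below:

```python
import random

random.seed(2)
HR, N = 1000.0, 100_000
D  = [HR * 600 * (1 + (i % 10) / 10) for i in range(N)]    # difficulties varying ~1x to 1.9x
ST = [random.expovariate(HR / d) for d in D]               # solvetime mean = D_i / HR

arith    = sum(d / st for d, st in zip(D, ST)) / N         # avg(Di/STi): far too high
harmonic = 1 / (sum(st / d for d, st in zip(D, ST)) / N)   # 1 / avg(STi/Di)
weighted = sum(D) / sum(ST)                                # sum(Di) / sum(STi)

print(round(arith), round(harmonic), round(weighted), HR)
```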

I'll use velocity as an example to show how having to choose the right averaging method is a common problem that's not specific to our Poisson distribution. Let v = a velocity analogous to HR. Let d=distance like difficulty and t = time per distance like solvetime per difficulty.

If you change velocity every fixed interval of time and record the velocity in each fixed t, then your average velocity is:
avg_v = avg(v)    Arithmetic Mean
If you change v every fixed d (like mile markers) and record only velocity, you have to use:
avg_v = 1 / avg(1/v)    Harmonic Mean
If you have velocity but are measuring it in non-equal intervals of either distance or time, you use either the Weighted Arithmetic Mean (if we have a different ti for each vi) or the Weighted Harmonic Mean (a different di for each vi):
avg_v = sum(vi * ti) / sum(ti)    Weighted Arithmetic Mean
avg_v = sum(di) / sum(di/vi)    Weighted Harmonic Mean
But our case with difficulty is that we don't have hashrate (velocity) and a choice between D or ST. We have D and ST. Notice that if you likewise do not have velocity data directly, but have d and t, then the two weighted means are equal:
avg_v = sum(di/ti * ti) / sum(ti) = sum(di) / sum(di/(di/ti)) = sum(di) / sum(ti)
Although we look at it piecemeal when we're gathering the data, we don't need to know which d was associated with each t because velocity = total_d / total_t. This is what we need for difficulty.

So we have: HR = 2^x * avg(D) / avg(ST) = 2^x * sum(D) / sum(ST). For a given rolling window of size N, this is: HR = 2^x * sum(N Ds) / sum(N STs) = 2^x * avg(D) / avg(ST). We call this a simple moving average (SMA) although it is a weighted mean as explained above. Plug this into our 1st equation that describes the goal of a difficulty algorithm, D = HR * T / 2^x, and we get a basic SMA difficulty algorithm:

next_D = T * sum(N Ds) / sum(N STs) * adjust

where "adjust" is needed for small N and I have determined it via experiment with a modeled constant HR and confirmed this equation on live coins from N=17 to N=150: adjust = 1/ (0.9966+.705/N) for 10<N<150 and just leave it off for N>150. Values of adjust at N=15, 30, 60, 120 are 0.958, 0.98, 0.992, 0.998. I do not know if this adjustment factor is needed because of the skewed Poisson, or if it is the result of the equation being recursive: past D's affect future D's. A largish variation in HR requires a slightly larger adjustment (the adjust correction value is smaller), but it can't be calculated unless you do an adjustment based on the StdDev of the D.

There is also the average-target method (harmonic mean of difficulties) to replace the numerator in the above. It does not change the results, except that the adjust factor is closer to 1, which implies this method is more correct.

For small N, there will be a lot of error in this next_D calculation. You can "expect" it (aka 1 StdDev) to vary roughly by +/- 1.3/SQRT(N) and generally the result will be within +/- 2.6/SQRT(N).

Better than SMA

The above SMA algo has a problem. We want to set D based on the current HR, but it gives the average hashrate as it was N/2 blocks in the past. So we want to somehow take the slope into account. We need to look at individual values of D and ST despite the errors above. When averaging discrete values, the simple harmonic mean above is a LOT more accurate for HR than the arithmetic mean because ST varies a LOT more than D; it is as if we are assuming D is fixed, which is the condition under which the harmonic mean should be used. That is what the best algorithms do. An exception is the tempered SMA of Digishield v3. It acts like a better version of an SMA with N=60. It averages only very recent blocks, but tempers how much those recent blocks can change the result. In a weird way it is like taking a slope into account. Its equation (without the Digi v3 problems) is:

next_D = T * sum(N D) / [ 0.75*N*T + 0.2523*sum(N ST) ]

This is a really good algorithm with N=17, if the problems are removed from the code. It usually invites fewer hash attacks from random variation, at a cost of more total blocks with post-attack delays. I changed the 0.25 in the algo to 0.2523 to get the avg ST more precise. Experimentally, and in Zcash and Hush, the avg ST was 0.2% to 0.5% too high.
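A sketch of that tempered-SMA equation, using the 0.2523 constant above:

```python
def tempered_sma_next_D(D: list, ST: list, T: float) -> float:
    N = len(D)
    return T * sum(D) / (0.75 * N * T + 0.2523 * sum(ST))

# N=17 window where blocks averaged half the target solvetime: difficulty rises
# by only ~14% instead of roughly doubling as the plain SMA above would.
print(tempered_sma_next_D([1000.0] * 17, [300.0] * 17, T=600))   # ~1141
```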

The EMA takes slope into account and amazingly needs no correction factor, even if HR varies a lot:

next_D = previous_D * [ T/ST + e^(-2*ST/T/N) * (1 - T/ST) ]
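A sketch of that EMA update:

```python
import math

def ema_next_D(prev_D: float, ST: float, T: float, N: float) -> float:
    return prev_D * (T / ST + math.exp(-2 * ST / T / N) * (1 - T / ST))

# One fast block (ST = T/2) with N = 50 nudges difficulty up about 2%:
print(ema_next_D(1000.0, ST=300, T=600, N=50))   # ~1019.8
```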

Importance of Price/Difficulty ratio

Due to the impossibility of getting the exact current HR, we have to make a choice. To get the most recent estimate, we have to use a smaller N which causes more accidental variation. We have to choose between "stability" and "recency". We end up needing to consider miner profit motives:

Miners seek the highest (coins/block * price + fees/block) / (difficulty * 2^x).

For a given coin, this is approximately proportional to (coin price) / difficulty. If price jumps 25%, about 300% more hash power jumps on, which requires the algorithm to respond quickly. But if you make N too small, your difficulty will exceed 25% changes by itself from random variation, making price/difficulty as attractive to miners as if the price had jumped 25%. This means.....

The selection of N is as important as the algorithm

[This section is outdated.](https://github.com/zawy12/difficulty-algorithms/issues/14) My best guess at the best value for N (as it is used in SMA, WHM, and my version of EMA) is:

T = 600, N = 40
T = 300, N = 60
T = 150, N = 80
T = 75, N = 100

which by an equation is: N = int(50*(600/T)^0.33)

Dealing with Bad Timestamps

Miners choose the block's timestamp, which causes confusing problems for difficulty algorithms. See handling bad timestamps. Miners could send difficulty very high or low, or cause nodes to crash (e.g. divide by zero or computing a negative difficulty), if the algorithm does not handle timestamps correctly. There is no single way for the same code to work in all algorithms: SMAs that subtract the first timestamp from the last have to do something different from SMAs that loop and from the WHM, and the EMA has to use a third method.

Simply assigning solvetime=1 or 0 when a timestamp is out of sequence causes a catastrophic exploit in the EMA, WHM, and SMA's that loop. A >15% hash rate miner can send your difficulty to zero indefinitely. But realistically the low difficulty would attract so much more hash power from other miners who are not doing it, that the exploit is limited.

Difficulty algorithms as PID controllers

If the goal of difficulty algorithms is to find the current hashrate and set difficulty accordingly, then it is different from processes that need a PID controller. The algorithms do not look at an error signal between where the system is and where we want it to be and then set the next D accordingly. We measure HR and set D for that HR. Mining and solvetimes also do not seem to have a "mass" aka momentum, so they can't be modeled and controlled like the typical 2nd-order differential equation of typical physical systems. Miner motivation suddenly increases and decreases based on the price/difficulty ratio, so even if we somehow controlled based on that, it's not necessarily a linear equation that could be correctly controlled with a typical PID controller. Also, miners are active agents seeking profit who will change their behavior based on the controller, so in PID terminology the "plant" changes as soon as you devise an optimal controller for it. But I'm trying to see if the PID control ideas might help.

Useful Equations

The median solvetime of a Poisson process is ln(2) = 0.693 of the mean solvetime. The StdDev of the solvetimes is the same as their mean. To simulate solvetimes:

ST = D * 2^x / HR * (-ln(rand()))

D = (2^256 - 1) / (nBits as target).
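A sketch of that simulation formula (D, x, and HR are made-up values chosen so the mean solvetime is ~600 s):

```python
import math
import random

random.seed(3)
D, x, HR = 139_700.0, 32, 10**12
ST = [D * 2**x / HR * -math.log(random.random()) for _ in range(100_000)]
print(sum(ST) / len(ST))   # ~600 s, the mean solvetime D * 2^x / HR
```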

Probability of a solvetime < t: CDF = 1 - e^(-λ*t), where I assume the difficulty is correct, so λ = 1 block per T, which gives CDF = 1 - e^(-t/T)
Probability of a solvetime > t: e^(-t/T)

The histogram is the derivative of the CDF, aka the PDF or PMF, i.e. λ*e^(−λt). For example, with T=600, 10,000 blocks, and histogram bins of 5 seconds, use the following to generate the histogram: expected block count per 5-second bin = 5*10,000*(1/600)*e^(-t/600)

Use the CDF to predict the orphan rate from an estimate of network delays, assuming there is no 1-block selfish mining and the network can be modeled as two nodes with 50% of the hashrate each and a network delay between them: orphan rate = 1 - e^(-delay*2/T)
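Quick numeric examples of the CDF and the orphan-rate estimate (the delay is a made-up value):

```python
import math

T = 600
print(1 - math.exp(-300 / T))         # P(solvetime < 300 s) =~ 0.39
print(math.exp(-300 / T))             # P(solvetime > 300 s) =~ 0.61

delay = 10                            # assumed network delay, seconds
print(1 - math.exp(-delay * 2 / T))   # orphan rate estimate =~ 3.3%
```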

Probability P of seeing M blocks in X time when the true average is N blocks: P = N^M/M!/e^(N) Repeat the above for values > or < M and sum them all to get the total probability of seeing "M or more" or "M or less".
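A sketch of that probability and the tail sum:

```python
import math

def poisson_p(M: int, N: float) -> float:
    """P of seeing exactly M blocks when the true average is N."""
    return N**M / math.factorial(M) / math.exp(N)

def p_M_or_less(M: int, N: float) -> float:
    return sum(poisson_p(k, N) for k in range(M + 1))

# Example used in the addendum below: expecting N=10 blocks but seeing 2 or fewer
print(p_M_or_less(2, 10))   # ~0.003
```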

Addendum you probably want to ignore: A good approximation, instead of the summing loop, to get P (for M and above or M and below) is to treat the M sample as if it were continuous. If the Poisson were continuous, the M sample would exactly follow a Gaussian, which is why this is close enough for most practical purposes. Start with the basic equation:

M = N +/- StdDev * SQRT(N)    eq1

For example, 95% of the time M will be within N +/- 1.96*SQRT(N). StdDev = 1 is the "expected" deviation (68.2% chance a set of M is within the range indicated). Divide both sides by N and you can say the percent error expected is within +/- 100%/SQRT(N). Since difficulty is mostly 1/avg(ST), this is also approximately the error in difficulty algorithms that act like an SMA of size N. The error is actually a bit larger.

Side Note: I use this to estimate the behavior of difficulty algorithms. I have to multiply the StdDev by a constant k of something like 1.1 to 1.5 depending on the level of probability, the algorithm (SMA, EMA, WHM), and whether I'm looking at the higher or lower limit: D variation estimate = D*(1 +/- k*StdDev/SQRT(N))

Getting back to the goal of finding the P of the found blocks being above or below some M: instead of using a chart to look up probabilities for the StdDev in the above, use StdDev = (2.48*LN(0.69/P))^0.5 - 0.91, where P is the same as above (1-tailed, not 2-tailed as is typical in charts). Rearranging and combining the two equations gives the probability of finding M or more (or M or less) blocks when the true average in that given time period is N:

P = 0.685*e^(-0.81*[ ABS(M'/N - 1)*(N/2)^0.5 + 0.64 ]^2)    eq2

The ABS() makes it work for both > and < M. Use M' = M + 0.5 for M and below, and M' = M - 0.5 for M and above (shift by 0.5 towards the mean in both cases). You can find cases where this has a lot of error, such as small N and unlikely events (e.g., N=10, M=2 gives 0.009 when it should be 0.003), but these are also the cases where the luck of finding 1 extra block or not makes a huge difference.
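Checking eq2 against the exact Poisson tail for that example (N=10, "2 or less", so M' = 2.5):

```python
import math

def eq2(M_prime: float, N: float) -> float:
    return 0.685 * math.exp(-0.81 * (abs(M_prime / N - 1) * (N / 2)**0.5 + 0.64)**2)

exact = sum(10**k / math.factorial(k) for k in range(3)) * math.exp(-10)
print(eq2(2.5, 10), exact)   # ~0.009 vs ~0.003, the discrepancy noted above
```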

It would be nice to calculate the range of the avg of M solvetimes to be expected at a given P, in order to get a general equation for detecting hash attacks based on an M-sample of blocks. But I can't. I have to select specific criteria like "avg of 11 STs > 2xT" and then experimentally test to see how rare this is. To show why I can't do it by theory for a general equation: I start by dividing eq1 on both sides by X to get M/X = 1/avgST on the left, substitute N = X/T expected blocks in X, then invert both sides to get the range for avgST that I wanted. This gives: avgST of M blocks in X time = X/M = T/(1 +/- StdDev*SQRT(T/X)). This is a problem because I need SQRT(T/X) to be a constant and X will vary. This is why it works only in very restricted conditions for estimating variation in D. D is approximately 1/avgST like this was attempting.

So to detect sudden hash rate changes, I have to loop through the blocks to add up their solvetimes, waiting for a certain X to be exceeded so that I have a fixed expected N = X/T and a varying M'. I'll have some required level of P and will know beforehand the M' for unusually slow or fast STs. The M from the loop is converted: M' = M - 1 for M > N and M' = M for M < N.

Footnote: The goal of the difficulty algorithm is to estimate the current hashrate to set the difficulty so that the average solve and confirmation times are as close as possible to the block time, but random enough to prevent orphans.

Assuming we want solve times to follow the exponential distribution, it can almost be simplified to "estimate the current hashrate." That estimate is used to set the difficulty via target = 2^256 / HR / block_time. But the hashrate depends on the exchange rate, fees, and on the difficulty itself, so the first definition is more complete. We have to use past difficulties to estimate what the hashrate was, but the difficulty we choose affects the hashrate. We could know what difficulty is needed more accurately if we knew the exchange rate and fees, but the difficulty algorithm does not have immediate access to fees and has no access to the exchange rate.

If there is 1 tx/s, then the average confirmation time is sum(st_i*(st_i+1)/2) / sum(st_i). This comes out to the average solvetime if difficulty is always set correctly for the current hashrate; otherwise it is higher. One way to see why this is true: the exponential distribution is memoryless, so a random point in time (a random tx, if txs are evenly distributed and sufficiently frequent) will on average see the average solvetime until the next block (and, surprisingly, also the average solvetime back to the previous block, i.e. 2xT between the two surrounding blocks, because long solvetimes are more likely to be sampled).
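A quick simulation of that claim, assuming ideal exponential solvetimes (i.e. difficulty always set correctly):

```python
import random

random.seed(4)
T = 600
st = [random.expovariate(1 / T) for _ in range(200_000)]

avg_conf = sum(s * (s + 1) / 2 for s in st) / sum(st)   # avg confirmation time at ~1 tx/s
print(avg_conf, sum(st) / len(st))                      # both ~600 s
```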