zawy12 / difficulty-algorithms

See the Issues for difficulty algorithms
MIT License
108 stars 25 forks source link

Summary of Difficulty Algorithms #50

Open zawy12 opened 4 years ago

zawy12 commented 4 years ago

Here are all the common difficulty algorithms. All equations are expressed as floating point to make the math clear. Typically, you expand the equations for target's really high value to reduce the error caused by integer division. Floating point is never supposed to be used in financial applications or applications that require every validator to get the exact same value.

Limits on timespans, timestamps, and target changes are not included because they enable attacks

Some algorithms place a limit on how much timespans, timestamps, and/or target can change per block. I haven't included any of those limits because they always allow exploits on the consensus. This is because the limits change the math which is supposed to provide a fair estimate of hashrate. The estimate determines the difficulty and the sum of difficulties (i.e. chain work) determines consensus, so the limits are corrupt the consensus mechanism by preventing the math from providing a fair estimate of hashrate. I've described the 15 different exploits in detail in Timestamp Attacks. An example is the 4x and 1/4 timespan limits in BTC that, in combination with allowing out of order timestamps, allows unlimited blocks in finite time. I'm not referring to BTC's "2015" error aka Zeitgeist aka GeistGeld attack described long ago by Artforz. But there's a limit that's always required: requiring sequential ("monotonic") timestamps is a fundamental requirement of distributed consensus which Lamport derived in 1978. Nakamoto consensus does not get around his derivation which didn't assume any class of algorithms. It only assumed distributed consensus is the goal. This 1978 paper might be 1/4 the reason he got the Turing award. Several BIPs have had to deal with the problems caused by violating this requirement. The BIP patches (such as median of past 11 timestamps) enforce the monotonicity rule in roundabout ways.

A Long Sidebar on Real-Time-Targeting

All the algorithms can be made theoretically more accurate by converting to an "RTT" (Real Time Target) by making the most recent timestamp in the equations tN+1 instead of tN. This means making the difficulty of a block depends on the timestamp the miner assigns to that self-same block. This "timestamp-manipulatable-difficulty" is the problem everyone has with RTTs, but it's not a problem if honest miners keep and enforce accurate timestamps which is a separate article I need to publish that shows how to also stop selfish mining. In short, honest miners ignore a block for 2 block times if the timestamp is more than a few seconds into the future.

Tom Harding developed an RTT and implemented Chain2 as a reference implementation that targets an accurate solvetime for each block. I didn't include in the list but it is target = prior_target C (t/T)^k where C is a constant and k>=1 and a larger k means a tighter distribution of solvetimes around T. "t" in this equation is different from the others. It's the miner's timestamp minus the timestamp of the previous block, not the solvetime of the prior block.

TSA is the only algorithm shown below that is exclusively meant to be an RTT. It is a fast-responding ASERT inside of a normal slow ASERT. To use it, the ASERT value will be the "official" difficulty for the block that is used to calculate chain work and to be the target used for the next block's ASERT calculation. The sub_target is the "actual" difficulty the miner has to solve for the block. It depends on the timestamp the miner assigns to the same block. P in TSA is a value less than M that causes the difficulty to be higher than the ASERT value if the solvetime is too fast, and lower if the solvetime has been taking too long. The TSA can be similarly done in terms of a fast EMA inside a slow EMA (or an LWMA). See TSA article for more detail such as how to handle timestamp manipulation. I helped JL777 implement an RTT like this on steroids for Komodo because some of their small sub chains were getting 1,000,000x increase & decreases in hashrate. It stopped attackers from getting more than 2 blocks cheaply and prevented stuck chains when they left.

Determining the Best Algorithm

ASERT appears to be the only (nearly) perfectly correct difficulty algorithm. See bottom for a full discussion. WTEMA is much simpler and almost exactly equal to it.

ASERT appears to be the best DA in terms of theory and practice. The "relative" form has more error than the "absolute" ASERT due to nBits having up to 2^(-15) error that accumulates linearly with N (a 1% increase in average solvetime for each 330 increase in N if nBits is staying in that "highest rounding error" range). There can also be a "linear with N" error in the e^x calculation because it must be done with integer math and validators have to calculate the exact same value (floating point isn't supposed to be trusted to be accurate on different systems for any consensus or financial application). Any consistent errors can be "fixed" by changing T in the equations. The errors are small for reasonable values of N (N>500 would be an unusually slow adjustment or a DAG). See BCH's ASERT for the code to implement absolute ASERT. I prefer EMA which has the same errors as relative ASERT but it's a lot simpler to code and is the same except for a small e^x = 1+x approximation error. LWMA is about the same as these and does not have their problem of a >50% attack that can get 38% more blocks in a given time period. All the other algorithms should be avoided. An RTT algorithm could be used to give a more even spread of solvetimes and it can be shown to be the best on every metric, but doing it safely is complicated and no one believes me that it's possible to do it safely.

Testing ranks the ability of the algorithms from best to worst like this:

  1. TSA (an RTT) but too complicated
  2. CDF-EMA, EMA, ASERT, LWMA, WT (differences under normal conditions are subtle)
  3. ETH (EMA with hysteresis)
  4. Digishield & Grin (better if MTP-delay is removed & monotonic timestamps enforced)
  5. KGW, BRNDF
  6. Digishield (with default MTP delay)
  7. SMA (awful if on-off mining is bad)
  8. Cryptonote / Monero: SMA with timestamp sort, cut, and lag. (no small coins can survive using it)
  9. BTC/LTC (no small coins can survive using it)

The method of testing first puts them on equal terms by adjusting their primary parameter to have the same Std Dev in difficulty during constant hashrate. This turns out to be the following for the N and M parameters in the equations:

Once the "averaging window" parameter N is selected to give the same level of stability under constant hashrate, I compare them during on-off mining "attack" conditions. I look at how well they prevent attackers from getting lower targets per hash (using time-weighted target averaging) compared to "dedicated" miners (Mark Lundeberg suggested this) and how well the DAs prevent long delays to confirmation. Jonathan Toomin showed me this is not simply how long it takes blocks to solve, but requires pretending there is (for example) 1 txn every second and do the averaging for that. For a given solvetime st, the total waiting time for all the 1 tx/s confirmations for that block is st*(st+1)/2. You sum this over many blocks and divide by the sum of st's (because it is equal to the number of txs at 1 tx/s) which gives a more correct result for delays than averaging the average wait time (averaging st/2 values). Under constant hashrate with a good DA the average wait time is equal to the average average solvetime**, but on-off mining changes the Poisson distribution causing average delays to be higher than average solvetime, and an ideal DA keeps it as low as possible.

** The average wait time for a tx is not 1/2 the average solvetime even in the case of constant hashrate because some blocks come really slow which offsets the lack of delays in the fast solvetimes. The mean time to find a solution for random points in time (like when someone chooses to send a tx) is 1 block time (not 1/2 the block time aka long-term average solvetime) because "a Poisson process has no memory", i.e. long wait times biased the mean higher than you would expect.

Selecting the Averaging Window Size As important as the selecting the difficulty algorithm is selecting the averaging window size. I have historically chosen averaging windows that were too fast due to not having the right metrics in testing and due to seeing all Monero clones blow up from oscillations which I mistakenly thought was due to the long 24 hour averaging period in their SMA, but it was mostly caused by its delay, cut, and lag parameters which were good intentions gone really bad. SMA allows oscillations to continue but delay, cut, and lag (esp delay) force oscillations. As a result, most of the 60 or so coins using my LWMA have N=60 and the few using N=90 to N=144 are performing better. I can't find an error in Jonathan Toomim's work for BCH that indicates N=288 for ASERT/EMA is best (which is like N=576 for LWMA) and this is with T=600. With lower T's like the LWMA coins usually have, even a higher N might be best. BTG has been using LWMA N=45 and it's been one of the best & safest algorithms out there. I've recommended they go to at least N=90 while the raw "science" is saying 576. LWMA N=576 will take 2.5 days with T=600 to fully respond to a 20% increase in hashrate (which might be caused by price change). I prefer 1/2 or even less of what Jonathan's work is showing as best because it's more in my realm of experience and I'm afraid of a coin getting stuck. This gives me ASERT/EMA with M=144 and LWMA/WT with N=288 is probably best in all coins. Coins seeing large daily changes in price or on-off miners who often drive the difficulty a lot higher than makes sense might be better off with 1/2 these values.

Algos in terms of difficulty I chose to express the equations in terms of targets instead of their inverted forms that use difficulty. The inverted forms of these equation (that use difficulty as inputs instead of targets) will give the same result (but inverted) except for the ones that use avg target (SMA, DGW, Digishield, & LWMA). The difficulty forms of these four give different results because 1/avg(targets) is not exactly equal to avg(1/targets). By 1/target I mean difficulty = 2^256/target. To keep them perfectly the same when working in terms of difficulty like CN coins you could use the relation 1/avg(target) = harmonic_mean(difficulty). If targets (or difficulties) are constant then harmonic_mean = mean and it's all the same. Harmonic mean gives a lower value than the mean so algorithms that use avg(D) overshoot D, causing a little more instability and slightly longer solvetime as N gets small, otherwise it's the same.

Simplest Algorithm This is probably the simplest DA that works surprisingly not-too-bad. It simply adjusts up or down 0.5% if solvetime was below or above median expected solvetime. It's based on the exponential distribution having 50% of solvetimes below 0.693T if difficulty is set correctly. if (st < 0.693 T) { next_difficulty = prior_difficulty * 1.005; } else { next_difficulty = prior_difficulty / 1.005;

Algos in terms of target

image

Problems in the uncorrected forms of the algos:

See LWMA below for EMA/ASERT's only problem.

LWMA in terms of difficulty is this: D[h+1] = avg_D[ h to h-N+1] T (N(N+1)/2) / k[h] eq (1) k[h] = linear weighting of solvetimes = 1*st[h-N+1] + 2st[h-N+2] + ... N*st(h]
The (N*(N+1)/2) scales the above linear weightings back down to an "average".

An algebra trick to avoid a loop for D[h+2] appears to be to solve for k[h] in D[h+1] and do this: k[h+1] = k[h] - sum_st[h-N+1 to h ] + N * st[h+1] eq (2)

A loop can be avoided because: sum_st[h-N+1 to h ] = t[h] - t[h-N]
avg_D[ h to h-N+1] = ( CD[h] - CD[h-N] )/ N
where CD = cumulative difficulty.

Substituting and using eq 2 for the next block:: D[h+2] = (CD[h+1] - CD[h-N+1] )/ N T (N*(N+1)/2) / k[h+1]

An initial loop needs calculated very precisely to initialize this.

Using M = (N+1)/2 and rewriting can show why it's somewhat like EMA and therefore ASERT:

D[h+1] = avg_D[h] / (avg_D[h-1]/D[h] + st[h]/T/M - avg_st[h-1]/T/M )

* This is calculated by time-weighted attacker's average target divided by time-weighted avg of all targets. That is, % unfair gains = sum(target_when_attacker_was_active * time_at_attackers_target) / sum(each_target*time_at_each_target/total time) . The specific test for the rankings used a miner motivation equation to model the apparent motivation in BCH's 2019 on-off mining problem. Specifically, it says "Begin 6x avg hashrate mining if difficulty (1/target) is 130% of average difficulty that the 1x hashrate miners would have if there was no on-off mining, and stop when it is 135%. I also ran other tests such as start and stop on 95% and 105%.

Latex for the equations:

\text{t = timestamp, tar = target, T = desired blocktime , h=height}\\
\text{st = solvetime, i.e.}  \ \ st_h =t_{h} - t_{h-1} \\
next\_tar = prior\_target\ *\ \frac{t_N-t_1}{NT} \text{ (BTC)} \\
next\_tar = prior\_target\ *\ \frac{t_N-t_0}{NT}*\frac{N}{N-1} \text{ (BTC with 2 corrections)} \\
tar_{h+1} = \frac{1}{NT}\sum_{i=1}^{N} \left[tar_i*(st_i) \right] * \frac{N}{N-1} 
\text{ (Time-weighted SMA * Erlang correction)} \\
tar_{h+1} = avg\_N\_targets\ *\ \frac{t_N-t_0}{NT} \text{  (SMA, DGW's loop simplified) }  \\
\text{If past i blocks were too fast or too slow, reduce N to i in above SMA.  (KGW, BRNDF)} \\
tar_{h+1} = avg\_N\_targets\ *\ (1\ + \frac{t_{N}-t_{0}}{MTN} - \frac{1}{M}) \text{ (Digishield M=4, Grin M=3) } \\
tar_{h+1} = tar_h*(1+\frac{st_h}{MT} - \frac{1}{M})\text{  (EMA) } \\
tar_{h+1} = tar_h\ *\ (1\ +\ int(\frac{st_h}{T*ln(2)})*\frac{1}{M}\ -\ \frac{1}{M})\text{  (ETH with ln(2) for st accuracy) } \\
tar_{h+1} = avg\_N\_targets\ * \frac{2}{N(N+1)T}\sum_{i=1}^{N} i*st_i  \text{  (LWMA) } \\
tar_{h+1} = tar_h*\left[e^{(t_h-t_{h-1})/T - 1} \right]^\frac{1}{M}  \text{ (relative ASERT) } \\
tar_{h+1} = tar_H*\left[e^{(t_h-t_{H-1})/T - (h-H)} \right]^\frac{1}{M}  \text{  (absolute ASERT, H=beginning block height) } \\
sub\_tar_{h+1} = \text{SlowASERT}*\left[e^{(t_{h+1}-t_h)/T - 1} \right]^\frac{1}{P} \\ \text{ (TSA RTT with SlowAsert * Fast RTT ASERT)}

Discussion of why and in what sense ASERT is the "perfectly correct" difficulty algorithm. The other algorithms are just approximations to what ASERT does. EMA is very close to ASERT as can be seen by using the approximation e^x = 1+ x in ASERT to get EMA which is valid for small x (e.g. M>20). The corrected ETH algorithm is an integer truncation of the EMA that gives surprisingly acceptable results. ASERT was devised by Mark Lundeberg @markblundeberg (he'll publish a public PDF on it sometime). ASERT appears to be the ratio of the expected to the observed solvetime distributions. That is, in terms of targets, it's e^-1 divided by e^(-solvetime/T). There is also a "smoothing" power factor of 1/M to make it more stable (aka respond more slowly). Intuitively, the 1/M appears correct because adjusting the target every block uses multiplication that builds upon past multiplications of the roots of the ratios. ASERT's expected maximum rate of change is a factor of e in M blocks. LWMA rises about 50% faster and falls about 50% more slowly, which can be good and bad.

ASERT is the only algorithm that gives the perfectly correct solvetime no matter how slow or fast it is made to respond by adjusting the M factor, and no matter how much hashrate changes, except it gets behind M blocks for every 2.718x permanent increase in hashrate. All algorithms will have that type of except for a dangerous one that predicts increases in hashrate and thereby trying to adjust ahead of time. BTC/LTC can also get the correct long term average solvetime if a N/(N-1) correction factor in target is applied, but it is not as accurate on a per block basis because it is not changing every block, and there does not appear to be a valid adjustment for N=1 (a division by zero) that ASERT can do. The N/(N-1) is an explicit correction factor. All the algos can similarly get the correct long-term average solvetime if a correction factor based on N and/or solvetime is applied, but this appears to be approximating ASERT. Also, all the algos that use more than just the previous target like ASERT, EMA, and ETH will give a different result if there is an attempt to apply the inverse of the equation directly to difficulty. To get the same result they have to use the harmonic mean of difficulties which gives the mean target. These are my observational and pseudo-theoretical argument for why ASERT is the only mathematically correct difficulty algorithm, assuming we do not make assumptions or predictions about miner motivation.

UPDATE: A new algo "CDF-EMA" is mathematically more pure than ASERT and it may be better in a strict sense. The image below shows an error signal that it uses which is mathematically better than ASERT's "1-t/T". It's better at preventing on-off mining from getting excess rewards over dedicated miners but at a cost of making the average solvetime a little longer during on-off mining. It's mathematically more pure because it takes probabilities into account. It works at a micro level where ASERT works on a macro level. ASERT targets an average solvetime, overshooting the individual estimate of hashrate (and thereby the adjustment) when a solvetime is fast, and undershooting when it's slow. At the micro level, we have 1 sample per block, so we "expect" the median solvetime which is when CDF=0.5 which is a solvetime of t = ln(2) * T (where t= solvetime and T=block time). The CDF (of the exponential function in our case) is a uniform distribution that we can use to measure how much the solvetime was unexpected. Since it's uniform, it's a linear adjustment in the "space" of P. It maps the nonlinear t's that we expect from our previous estimate of hashrate to a linear space, making it an excellent error signal for adjustment. I tested many different ways of using this error signal and the one below is simplest and most stable. The "3" in the exponent is an approximate value that makes its level of filtering about equal to ASERT's (it could be changed to "1"). The "3" shows it needs less filtering (a smaller effective N) to get the same level of precision in average solvetime without even targeting average solvetime over a longer time ("macro") like ASERT does. For small filter values (small N) CDF-EMA has a small error in the median and mean where ASERT has 50% error in the median while targeting the mean perfectly. Given the CDF of any distribution, this equation is a good method of prediction and control. A concerning drawback is that it assumes timestamps follow the exponential distribution. For example of how this is a problem, if every miner selects a timestamps such that t=T, the difficulty gets easier & easier. The consequences of this are complicated but not necessarily a problem.

image

Update 2 I discovered a surprisingly simple algo that works as good as the best. next_target = prior_target * (0.56 * T/t)^(1/N) N is on the order of LWMA's N (see section on how to make the algos have the same stability). Since prior solvetime t is in the denominator, monotonic timestamps have to be enforced to prevent divide by zero or an imaginary result.

A Time-Based Algorithm This is an algorithm based on a time window instead of a block window. It adjusts once every N blocktime. Ideally it would use RTT with tight timestamp rules. D = prior_D (C/(N-1))^(1/M) where C = blocks observed in past N blocktime seconds from a miner's local clock that he uses as the timestamp (in RTT he sets his own difficulty by his timestamp). For example, if N=6 for blocktime=10 minutes it means difficulty changes once every hour on the hour, enabling miners to affect their difficulty within the limits of the 1/M smoothing factor. The unusually wide range of timestamps BTC allows can't be used with this. Instead of 7200 s into the future and about 1 hour in the past (the MTP) like BTC, the allowable timestamps could be +/- 10 s from local time but few believe it (see my article on selfish mining)). An alternatively to the miner using local time (to avoid it being an RTT) is to use the prior timestamp. The prior block would not count as part of N, but its difficulty would be used as the prior_D. The results aren't as good. To partially alleviate this, the adjustment could be done only once per N blocks. The N-1 is because "the poisson is memoryless" both forward & backwards in time (credit Jacob Eliosoff). For example, if you pick any 2 blocktime range at some random point in time, the average number of blocks you'll see in the window is surprisingly 1 (it's because long solve times have a large effect). For 3 blocktime the average number will be 2 blocks. Similarly, pick any random point time and the expected time between the 2 blocks on either side of that time is 2 blocktime.

end updates

cryptozeny commented 4 years ago

good to know! BTW, what about RTT?

zawy12 commented 4 years ago

I see you're keeping up with things. So far I have everything on that in my TSA article that I keep updating. Any of the above can use solvetimes that are shifted forward 1 more block to be Real-Time-Target algos. This article is to give an overview of DA's prior to RTTs.

ghost commented 4 years ago

fwiw, if you're interested in completeness, you may want to include Cryddit/Ray Dillinger's MIDAS: http://dillingers.com/blog/2015/04/21/altcoin-difficulty-adjustment-with-midas/

zawy12 commented 4 years ago

I see Shield and Verge switched from Midas to LWMA. It looks like maybe 1 or 2 coins are using it. There is another one I know of "OSS" that 1 or 2 coins use. I just wanted to cover the big ones and ASERT and TSA as they are the only better options. Midas's motivations and ideas are good but there are at least 6 different parameters it uses (9/10, the 5 to 17 choice, 5/8, 2/3, and 6/5) where all the other algos have only 1 or 2 parameter because their math basis is clear. Digishield has 2 parameters (17 and 4/3) and KGW has 2 (0.7084 and 1.228). It would be nice to reduce it. It uses two ideas I never tried. One is like ASERT in that it makes a more aggressive correction if the long-term average solvetime is incorrect. Unlike ASERT, the method looks like it could be manipulated in on-off mining and result in oscillations. It's also interesting in the way it requires both a long and short term avg solvetime to be a little fast or slow before it makes a correction. But the way it does it could be manipulated in on-off mining, causing oscillations. For example, an on-off mine could be 3 blocks on and 3 block off and get up to 33% more blocks than target without causing any adjustment in difficulty.

ghost commented 4 years ago

Thanks for taking the time to analyze MIDAS' position and status within the current context of development and thanks also for being explicit about the parameter criteria, I found that to be both illuminating and useful.

glasgowm148 commented 3 years ago

https://eprint.iacr.org/2017/731.pdf

mochimodev commented 3 years ago

I think I mentioned previously that in Mochimo (the cryptocoin I designed from scratch and launched in 2018) we solved all the epoch-based difficulty adjustment attack surfaces. We did this using per-block difficulty adjustment complemented by a mid-block difficulty adjustment algorithm.

Our results over that last 300,000+ blocks on the production network have brought a tightly-distributed mean solve time with vastly fewer outliers (lower St.Dev), and rapid recovery from near-total hash power loss. In one early instance the network lost 95% of its hash power and fully recovered within an hour and 40 minutes, resuming the minting of on-target blocks immediately thereafter.

So don't neglect analysis of what we've done.

On May 23, 2021, at 7:48 AM, Mark Glasgow @.***> wrote:



https://eprint.iacr.org/2017/731.pdf

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/zawy12/difficulty-algorithms/issues/50#issuecomment-846549996, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AIDXEI3PSK3XBF4EKRSE7N3TPDTPLANCNFSM4JW6N2MQ.

zawy12 commented 3 years ago

I tried various methods of correcting a simple moving average for the slope which is like applying a least squares, but I never tried a full-blown least squares. I could never get it to work out better than a simple moving average.

How can the least squares method in the paper work without using any timestamps? I can see how it can work for the where the slope is constant, but not if the slope should change over time which will be determined by timestamps which tell us what the hashrate is.