zawy12 / difficulty-algorithms

See the Issues for difficulty algorithms

BCH needs a new DA #47

Open zawy12 opened 4 years ago

zawy12 commented 4 years ago

Summary: This discusses how bad oscillations can develop in simple moving averages. There is a follow-up article. See also these tweets for how the oscillations can be stopped by changing target during the block instead of changing to LWMA.

Update: in this article I mention that the SMA oscillations are primarily due to not taking the slope into account. In testing, taking the slope into account does seem to make the oscillations more chaotic (less like simple oscillations) and it does end the attacks quicker, but if miners do not have a high cost of switching coins, it's a worse algo because it drops to an attractive difficulty a lot more often, decreasing the benefits.

BCH uses a simple moving average (SMA) difficulty algorithm (DA) with N=144, which is a 1-day averaging window for its T=600 seconds. Monero/Cryptonote clones suffered greatly from a similar long average, but that DA also had a "cut" and "lag" that prevented it from considering approximately the most recent 1.5 hours of block timestamps. Other issues in this series show how bad they were.
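For reference, a minimal floating-point sketch of what an SMA target calculation of this kind looks like (illustrative names only, not BCH's actual consensus code, which works on integer targets and selects the window endpoints differently):

```cpp
#include <cstdint>
#include <vector>

// next_target = (average target over window) * actual_timespan / expected_timespan
double sma_next_target(const std::vector<double>&  targets,     // last N block targets
                       const std::vector<int64_t>& timestamps,  // last N+1 timestamps
                       int64_t T = 600)                         // target solvetime, seconds
{
    const size_t N = targets.size();                 // e.g. 144 for BCH
    double avg_target = 0;
    for (double t : targets) avg_target += t / N;    // average target over the window

    // time spanned by the window, first to last timestamp
    int64_t actual_timespan = timestamps.back() - timestamps.front();

    return avg_target * double(actual_timespan) / double(N * T);
}
```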

Forced delays in response cause positive feedback => bad oscillations

Monero/Cryptonote clones were suffering because if there is a sudden increase in hashrate (such as after a price increase, when there is a large source of hashrate external to your coin), a lot of miners jump on and it takes the DA a long time to respond. When they leave because the difficulty has gotten too high, the difficulty is supposed to start dropping right away as solvetimes get longer. But if there is an additional delay in looking at the most recent solvetimes, as in Monero/Cryptonote's DA, difficulty will keep rising, possibly above its pre-attack starting point, causing even your "dedicated" miners to leave. Eventually the difficulty will drop, but because the DA does not see the loss of miners right away, it will overshoot on the downside just as it did on the upside. This brings even more miners back, and the oscillations can "blow up" due to the positive feedback loop.

Although this chart is hard to see, it is a typical Monero/Cryptonote clone "blow up". Every small clone had to abandon this 24 hr SMA that had the "cut" and "lag" delays. The black line is difficulty. Note that it is 1 cycle per 24 hours. Blue are delays, magenta is hashrate increase.

image

Asymmetrical adjustments to difficulty can be harmful (remember EDA?)

This is somewhat off-topic since the new DAA does not have asymmetrical adjustments like the previous EDA. Delays from being slow (not just the CN-type forced delays above) can make oscillations caused by asymmetry in difficulty adjustment much worse. But even if there is no delay, asymmetrical adjustments will result in coin emission that is too fast or too slow, depending on which side of the adjustment has the asymmetry.

Asymmetry can be defined as taking the expected value of the PDF of the algorithm and not getting the target solvetime (credit to Tom Harding @dgenr8 for showing me this), but this may not give reliable results if the algorithm changes itself (the PDF changes) based on an unpredictable hashrate, as the EDA did. Even BTC is "asymmetrical" in this sense, because calculating the expected value to get T=600 assumes the hashrate is approximately constant, or moving up and down around an average, not always increasing as it has been. This has resulted in 6.5% faster blocks on average (there are 2 other very minor errors in it).

As many recall, BCH's previous DA before late 2017 was the "EDA" that tried to drop a lot if the chain appeared to be getting stuck with long solvetimes. It did this because BCH was forking from BTC and expected a large decrease in hashrate. The big drops caused a sizeable influx of hashrate, and because it kept BTC's N=2016 adjustment (not a rolling average), it was a long time before the DA knew it needed to increase difficulty. BTC's DA does not have the same kind of delay described above for default CN coins: it does not simply ignore the most recent timestamps. When it changes, it uses the most recent timestamp, but it has a different type of very long delay in responding. The average solvetime it sees is what it was 2016/2 blocks in the past because it makes no adjustment for slope, causing coin emission to be 6.5% too fast due to hashrate generally increasing every 2 weeks. This made the asymmetry problem in BCH's previous EDA much worse than it needed to be.

BCH's new DA is not asymmetrical but has a mild forced delay

BCH does not have a "forced" delay like the default CN coin, but it seems to have a 3-block delay: if there is a long solvetime, it seems to take 3 blocks for the difficulty to start dropping. It explicitly has only a 1-block delay from using the median of the past 3 block timestamps. This was a neat decision, partly good and partly bad, meant to prevent most out-of-sequence timestamps from indicating a negative solvetime, which could come from a timestamp up to 6 blocks in the past via the MTP(11) setting. The 2nd delay is what almost all coins have: the DA does not use the current solvetime in its calculation, only the previous block's solvetime. My TSA article discusses changing the target during the block. KMD (next fork), Mochimo, Coin2 (research only), and Xchange (defunct) are doing it, are about to, or previously did. I seem to see a 3rd delay that may just be the way block explorers are reporting data, so it's possibly not a delay, or the pool and/or miner code is using the start time of mining as the timestamp on a block instead of the time the block was solved. CN coins do this, and it may be occurring in Zcash coins.
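A sketch of where that 1-block delay comes from: when solvetimes are taken from the median of the 3 most recent timestamps, the median is normally the second-most-recent timestamp, so the newest solvetime enters the calculation one block late, while most single out-of-sequence timestamps can no longer produce a negative solvetime. Helper names are mine, not the actual BCH code:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// median of three timestamps
int64_t median3(int64_t a, int64_t b, int64_t c) {
    int64_t v[3] = {a, b, c};
    std::sort(v, v + 3);
    return v[1];
}

// With in-order timestamps, the median of the last 3 is the middle (previous)
// one, so the solvetime the DA sees lags the real one by about 1 block, but a
// single out-of-order newest timestamp usually cannot make it negative.
int64_t median_solvetime(const int64_t* ts, std::size_t height) {
    int64_t m_now  = median3(ts[height - 2], ts[height - 1], ts[height]);
    int64_t m_prev = median3(ts[height - 3], ts[height - 2], ts[height - 1]);
    return m_now - m_prev;
}
```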

Taking the Slope of Hashrate into Account

In various places in this article I mention the importance of SMAs not taking the slope of the solvetimes into account. Taking the slope into account helps get a current hashrate estimate. Without it, an SMA is estimating what the hashrate was at 1/2 its window in the past, which is 1/2 a day for BCH. All good algorithms are, in one way or another, trying to estimate current hashrate without over-shooting the estimate (which would cause oscillations). I call this "taking the slope into account". "Not overshooting" can be seen as a passive as opposed to an active controller. An active controller should not be used because miners can actively look at difficulty and change hashrate in a non-linear fashion, while a "non-A.I." active controller (one that does not actively learn miner motivation over time and change itself accordingly) has to make fixed assumptions about miner (hashrate) motivation, which miners will promptly deviate from to profit, at least at the expense of other miners if the DA is good at keeping the coin emission rate correct. In summary, a non-smart active controller on a re-active system can blow up, but an estimate of current hashrate that makes no assumptions about miner motivation is safe, needed, and present in all good DAs (not in SMAs like DGW). Due to the 6-block MTP delay in the default Digishield in estimating hashrate and its N=17 part being an SMA average, it is about 6+17/2 = 15 blocks behind, and constantly suffers 15-block attacks with resulting mild oscillations.
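A quick way to see the half-window lag (a sketch, assuming hashrate changes linearly across the window):

```latex
% If hashrate rises linearly, h(t) = h_0 + r t, an SMA over the last W seconds
% of solvetimes effectively measures the average hashrate over [t - W, t]:
\bar{h} = \frac{1}{W}\int_{t-W}^{t} (h_0 + r s)\, ds
        = h_0 + r\left(t - \tfrac{W}{2}\right)
        = h\!\left(t - \tfrac{W}{2}\right)
% i.e. the SMA's estimate equals the true hashrate half a window ago:
% 12 hours for BCH's 24-hour window.
```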

Why does BCH have bad oscillations?

Despite no asymmetry, no sizable forced delay like CN, and being a rolling average, BCH is having bad oscillations. This is because miners have (probably unconsciously, see below) found a beneficial oscillating pattern based on the SMA window width. As the fast solvetimes of the previous hash-attack roll out the back of the averaging window, the difficulty will not change if there is currently another attack producing fast solvetimes: the solvetimes rolling out the back cancel the ones coming in the front. If there were no current attack, difficulty would start dropping even if current solvetimes are correct, because of the fast solvetimes going out the back. In short, a 24-hour SMA difficulty is basically estimating the hashrate as it was 12 hours in the past, so it has a delay that can cause oscillations. In BCH's case, the bad oscillations have a period of about 1/2 a day (72 blocks): an attack lasts 20 to 30 blocks, then stops for a few blocks with long delays while difficulty drops some, then a short attack, then a few slow blocks, then another big attack about 72 blocks after the previous one. That is the pattern seen now in September 2019. Back in July 2019, there were oscillations about half as bad, on a 144-block cycle.

bch2

image

Modelling the Above Attack Oscillation

The following is modelling that simply says "if difficulty is 1% below average, begin mining with 6x the average hashrate and stop when difficulty reaches the average." Notice this simple logic discovers the oscillations, and the attacker gets about 5% lower-than-average difficulty ("sudden" ~10% drops appear when the previous fast attack solvetimes roll out the back of the window), while the miners staying after the attack get about 5% higher-than-average difficulty (for that attack-off time period).

Clipboard06
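For anyone who wants to reproduce this kind of run without my spreadsheet or test_DAs.cpp, here is a minimal sketch of the loop being described (exponential solvetimes, SMA-144, and the "1% below average → 6x hashrate" rule; the tidy structure and the slow reference average are my simplifications, not the original test code):

```cpp
#include <cmath>
#include <cstdio>
#include <deque>
#include <random>

int main() {
    const int    N = 144;                 // SMA window
    const double T = 600;                 // target solvetime, seconds
    std::mt19937_64 rng(1);
    std::exponential_distribution<double> exp1(1.0);

    double D = T;                         // hashrate = 1, so equilibrium difficulty = T
    double ref_D = D;                     // slow reference "average" difficulty
    std::deque<double> solvetimes(N, T), diffs(N, D);
    bool attacking = false;

    for (int h = 0; h < 20000; ++h) {
        // Miner motivation: attack at 6x hashrate when difficulty is 1% below
        // the reference average, stop once it climbs back up to it.
        if (!attacking && D < 0.99 * ref_D) attacking = true;
        if ( attacking && D >= ref_D)       attacking = false;
        double hashrate = attacking ? 6.0 : 1.0;

        double st = (D / hashrate) * exp1(rng);      // exponential solvetime
        solvetimes.push_back(st); solvetimes.pop_front();
        diffs.push_back(D);       diffs.pop_front();

        // SMA: next difficulty = avg difficulty * T / avg solvetime
        double sum_st = 0, sum_D = 0;
        for (int i = 0; i < N; ++i) { sum_st += solvetimes[i]; sum_D += diffs[i]; }
        D = (sum_D / N) * (N * T) / sum_st;
        ref_D = 0.999 * ref_D + 0.001 * D;

        // height, normalized difficulty, solvetime, attack on/off
        std::printf("%d %.4f %.1f %d\n", h, D / T, st, attacking ? 1 : 0);
    }
}
```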

Below is the same miner motivation scenario with an LWMA DA instead of the SMA N=144. LWMA takes the slope into account, so solvetimes rolling out the back of the window do not cancel the ones entering the front, preventing the oscillations. It rises 3x faster, but this is not necessarily a benefit because it drops fast too, so the attacks come much more frequently with LWMA than above. The attackers may be able to get the same profit as before (since they should be able to change which coin they are mining very quickly), and the dedicated miners lose the same. But the attacks are stopped so quickly that there are no long delay problems (BCH is only getting 3 blocks per hour or worse when the attacks stop).

Clipboard07
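For comparison, a simplified floating-point sketch of the LWMA target calculation (the real algorithm works on integer targets and clamps extreme solvetimes; both are left out here):

```cpp
#include <cstdint>
#include <vector>

// Most recent solvetimes get the largest (linear) weights, which is how the
// slope of recent solvetimes gets taken into account.
double lwma_next_target(const std::vector<double>&  targets,     // last N targets, oldest first
                        const std::vector<int64_t>& solvetimes,  // last N solvetimes, oldest first
                        int64_t T = 600)
{
    const size_t N = targets.size();
    const double k = N * (N + 1) / 2.0 * T;   // if avg solvetime = T, the target is unchanged

    double weighted_st = 0, avg_target = 0;
    for (size_t i = 0; i < N; ++i) {
        weighted_st += double(solvetimes[i]) * (i + 1);   // weights 1..N, newest weighted most
        avg_target  += targets[i] / N;
    }
    return avg_target * weighted_st / k;
}
```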

markblundeberg commented 4 years ago

For that last graph, what did you use for LWMA window size? I think you need more than 144 to match the same 'averaging material' as SMA-144. A factor of 2x longer would be needed to keep the largest weight (the most recent block) the same. Or, you can try to just match variance in which case sqrt(2) longer would be enough.

zawy12 commented 4 years ago

@markblundeberg It was N=60. Your math is good because I had noticed SQRT(2) does actually give about the same std dev for constant hashrate. I'll make it a little higher at N=220. The spreadsheet I used for the above can't do the N=220 for LWMA, so I have to use a different method.

image

image

An expanded view is below. The blank at the beginning is the 2*N startup period.

image

image

zawy12 commented 4 years ago

Here's a newer image to show the BCH miner motivations better.

BCH_10-27-2019_b

markblundeberg commented 4 years ago

image

image

Neat! It seems quite clear that the steady miners aren't being punished nearly as much in LWMA. So:

zawy12 commented 4 years ago

An interesting aspect is that BTG with LWMA N=45 is a lot more stable in difficulty than smaller coins with LWMA N>60 because it is more liquid: big miners are more keen to mine it when difficulty drops a little low, preventing it from dropping lower, and to leave if it gets a little high, preventing it from getting higher. Std Dev in difficulty with N=60 under constant-hashrate modeling is about 0.13. In BTG it is about 0.09.

I was remembering wrong. Std Dev in difficulty is the same when N is the same.

image

Under attack conditions, you can see the Std Devs decrease

image

In the text above the images you can see my start and stop conditions: attacks begin when the difficulty is 30% above the difficulty that "dedicated" miners would have had, and stop when it's 35% above. That could use some tweaking to be more like BCH's current conditions.

zawy12 commented 4 years ago

The code that did these is "test_DAs.cpp" in my code area.

markblundeberg commented 4 years ago

I was remembering wrong. Std Dev in difficulty is the same when N is the same.

Interesting, I hadn't actually done the math and was making an educated guess. Still it seems odd to me, I will do the math now. :-)

markblundeberg commented 4 years ago

Oh OK, did the math now, it should be

LWMA-N stddev: sqrt((4/3) / N)
SMA-N stddev: sqrt(1 / N)

So if I have it right now, it's only a sqrt(4/3) factor, not sqrt(2). Note that I'm assuming big-N limit: this is based on calculating the variance in the average solvetime only, assuming the difficulty is reasonably constant (so solvetimes are IID); also I've approximated terms like N+1 as N.
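A sketch of where those factors come from, under the same assumptions (IID solvetimes, normalized weights w_i, relative std dev proportional to sqrt of the sum of squared weights):

```latex
% SMA-N:  w_i = 1/N                        \Rightarrow \sum_i w_i^2 = 1/N
% LWMA-N: w_i = i \,/\, \tfrac{N(N+1)}{2}  \Rightarrow
\sum_{i=1}^{N} w_i^2
  = \frac{\sum_{i=1}^{N} i^2}{\left(\frac{N(N+1)}{2}\right)^2}
  = \frac{\frac{N(N+1)(2N+1)}{6}}{\frac{N^2(N+1)^2}{4}}
  = \frac{2(2N+1)}{3N(N+1)}
  \approx \frac{4}{3N}
% so stddev(LWMA-N) \approx \sqrt{4/(3N)} versus stddev(SMA-N) = \sqrt{1/N}.
```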

I don't know why in your case the standard deviations are so much closer. :thinking: Perhaps the next-order term inside the square root, of order 1/N^2, is significant even at N ~ 144. Also, your sample size isn't huge, so it could just be noise.

Anyway this makes it even better for my last point, which is that LWMA is more recent-responsive (2/N) than SMA (1/N), even for equivalent variance (e.g., LWMA-192 vs SMA-144, if I have the math right).

zawy12 commented 4 years ago

Difficulty is a recursive equation, so I have not dared attempt the math. 100,000 runs on 192/144 gives 0.072 vs 0.086, and 0.073 vs 0.082 on a second run of 100k. Equal N still gives really close results at 100k.
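A sketch of the kind of constant-hashrate run behind numbers like these (my own simplified version, not test_DAs.cpp itself; swap in an LWMA update to compare, e.g. SMA-144 vs LWMA-192):

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <random>

// Std dev of normalized difficulty under constant hashrate for SMA-N.
double sma_run_stddev(int N, int blocks, uint64_t seed) {
    const double T = 600;
    std::mt19937_64 rng(seed);
    std::exponential_distribution<double> exp1(1.0);

    double D = T;                                   // hashrate = 1, equilibrium D = T
    std::deque<double> st(N, T), diff(N, D);
    double sum = 0, sum2 = 0; long n = 0;

    for (int h = 0; h < blocks; ++h) {
        double solvetime = D * exp1(rng);           // constant hashrate = 1
        st.push_back(solvetime); st.pop_front();
        diff.push_back(D);       diff.pop_front();

        double sum_st = 0, sum_D = 0;
        for (int i = 0; i < N; ++i) { sum_st += st[i]; sum_D += diff[i]; }
        D = (sum_D / N) * (N * T) / sum_st;         // SMA update

        if (h > 2 * N) {                            // skip the startup period
            double x = D / T;
            sum += x; sum2 += x * x; ++n;
        }
    }
    double mean = sum / n;
    return std::sqrt(sum2 / n - mean * mean);
}

int main() {
    std::printf("SMA-144: %.3f\n", sma_run_stddev(144, 100000, 1));
}
```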

zawy12 commented 4 years ago

Your difficulty algorithm might be to LWMA what LWMA is to BTC's crappy algo. I'll do an article on it sometime.

markblundeberg commented 4 years ago

Haha, I am honored you say that. I thought it was just a funny toy algorithm but yeah it has some interesting properties.

zawy12 commented 4 years ago

I expected LWMA to be equally stable at around N=200 because I had noticed that LWMA at N=60 has the same stability as SMA at N=45, which is what your results are predicting. Next time I do a comparison I need to look for any possible mistakes I may have made.