zawy12 / difficulty-algorithms

See the Issues for difficulty algorithms

EMA for BCH (part 2) #62

Open zawy12 opened 4 years ago

zawy12 commented 4 years ago

In a previous issue I discussed this in a different way, received feedback from Mark L, and discussed the possibility of RTT for BCH. This issue discusses everything that came up in a live stream and everything I know of surrounding changing BCH's difficulty algorithm (DA).

My process to find the "best algo" is this: find a window-size parameter N for each algo that gives them the same standard deviation for constant HR. This means they accidentally attract on-off mining at the same rate. Then simulate on-off mining with a step response that turns on and off based on a low and a high difficulty value. I have "% delayed" and "% blocks stolen" metrics. Delay: add up a delay quantity of (st - 4xT) for each block with st over 4xT, then divide that by total blocks to get a % delayed. For "% stolen" I divide the time-weighted target that the on-off miner sees by that of dedicated miners; it's how much easier his target was while he was mining. I just now re-tested 5 DAs. Except for SMA-144, the other 4 below are pretty much the same. WT-190 has slightly lower % stolen but slightly higher delays. (WT-144 is what I kept mistakenly calling wtema in the video.) Delays and % stolen are usually a tradeoff. I had been penalizing WT-144 in the past by giving it the same N as LWMA when it needs a larger value.
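A minimal sketch of those two metrics as described (my own illustrative Python, not the test code itself; dividing "% delayed" by T and the flag marking which blocks the on-off miner was mining are my assumptions):

def pct_delayed(solvetimes, T=600):
    # Sum (st - 4*T) over blocks slower than 4*T, divide by total blocks,
    # and express as a fraction of T (the last step is an assumption).
    delay = sum(st - 4 * T for st in solvetimes if st > 4 * T)
    return delay / (len(solvetimes) * T)

def pct_stolen(targets, solvetimes, onoff_mining):
    # Time-weighted target while the on-off miner was mining, divided by the
    # time-weighted target over all blocks (what a dedicated miner sees), minus 1.
    def tw_target(selector):
        num = sum(tg * st for tg, st, on in zip(targets, solvetimes, onoff_mining) if selector(on))
        den = sum(st for st, on in zip(solvetimes, onoff_mining) if selector(on))
        return num / den
    return tw_target(lambda on: on) / tw_target(lambda on: True) - 1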

SMA-144
WT-190 (as opposed to wt-144)
LWMA-144
ASERT / EMA-72 [ update: See my 2nd comment below. In Jonathan's code, this value would be called 72 * ln(2) ]

I'll advocate the following for BCH in order of importance.

There's also sequential timestamps, which might be the hardest to do in reality but important on a theoretical level. It might be possible to enforce sequential timestamps by making MTP=1 instead of 11. Maybe this could prevent bricking of equipment. Sequential stamps (+1 second) must then be able to override local time & FTL. Overriding local time to keep "messages" (blocks) ordered is a requirement for all distributed consensus mechanisms. MTP does this indirectly.

Antony Zegers did some EMA code, using bit-shift for division.

EMA is a close enough approximation to ASERT, but here's Mark's tweet on how to do accurate integer-based e^x. https://twitter.com/MarkLundeberg/status/1191831127306031104?s=20

wtema simplifies to:

target = prev_target * (1 + st/T/N - 1/N)

where N is called the "mean lifetime" in blocks. If it were expressed as ASERT:

target = prev_target * e^(st/T/N - 1/N)

The other form of ASERT should give the same results, but the other form of EMA could return a negative difficulty with a large st. [ wtema-144 and asert-144 refer to N = 144 / ln(2) = 208. ]
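For illustration only, here are both update rules in floating-point Python (the thread's point stands that a production rule needs integer math; the defaults are just the wtema-144/asert-144 value quoted above):

import math

def wtema_next_target(prev_target, st, T=600, N=208):
    # Linear (first-order) form: can go non-positive for a large negative st.
    return prev_target * (1 + st / (T * N) - 1 / N)

def asert_next_target(prev_target, st, T=600, N=208):
    # Exponential form: stays positive for any st.
    return prev_target * math.exp(st / (T * N) - 1 / N)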

If wtema is used: negative solvetimes (st) are a potential problem for wtema if N is small, delays are long, and an attacker sends an MTP timestamp. See the wt-144 code for how to block them. To be sure to close the exploit, you have to go back 11 blocks. Do not use anything like if (st < 0) { st = 0 } in the DA, which allows a small attacker to drive difficulty very low for everyone in a few blocks by simply sending timestamps at the MTP.

Std Dev of D per N blocks under constant HR for EMA/ASERT is ~1/SQRT(2*N). This is important for deciding a value for N. If HR increases 10x based on a 10% drop in D, you definitely want N > 50 or the DA is motivating on-off mining. On the other hand, being at the edge of motivating on-off mining can reduce the expected variation in D. BTG's Std Dev is about 0.10 when it would have been 0.15 if HR were constant (N=45 LWMA, N=22.5 in ASERT terms). So it's paying a little to on-off miners to keep D more constant. ASERT with N=72 is the same as SMA-144 in terms of D variation when HR is constant.
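A quick Monte-Carlo sanity check of the ~1/SQRT(2*N) rule (my own sketch, not the simulator used elsewhere in this thread; the only assumption is constant HR with exponentially distributed solvetimes):

import math, random, statistics

def asert_relative_stddev(N=100, T=600, blocks=200_000, seed=1):
    random.seed(seed)
    d = 1.0                                     # difficulty relative to equilibrium
    samples = []
    for _ in range(blocks):
        st = random.expovariate(1.0 / (d * T))  # constant HR: mean solvetime = d*T
        d *= math.exp((T - st) / (T * N))       # ASERT, difficulty form
        samples.append(d)
    return statistics.pstdev(samples) / statistics.fmean(samples)

# asert_relative_stddev(100) comes out near 1/math.sqrt(2*100) ~= 0.07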

On the other side is wanting a sufficiently fast response to price changes. As a starting point, for ASERT N=100 it takes 6 blocks to rise 5% in response to a 2x increase in HR. This seems sufficiently fast. Std Dev is 1/SQRT(2*100) = 7% per 100 blocks, which seems high. 5% accidental changes in D seem to motivate substantial HR changes, so maybe N=200 is better, which will have 5% variation per 200 blocks. But this is like LWMA N=400, which would be incredibly smooth and nice if it works, but it is 3x larger than my experience. BTG is very happy with LWMA N=45 with Std Dev 10%. Notice that there's a bigger bang for the buck with lower N due to the 1/SQRT(N) versus N. You get more speed than you lose stability as you go to lower N. So instead of 200, I would go with 100 or 150.

None of the 60 or so LWMA coins got permanently stuck. About 5% could not get rid of bothersome oscillations. When sending the same random numbers to generate solvetimes, LWMA / EMA / ASERT are almost indistinguishable, so I do not expect them to get stuck either. Most LWMA's are N=60 which has the stability of EMA with N=30. N=100 would have probably been a lot nicer for them, if not N=200 (ASERT N=100). They are mostly T = 120 coins, so they have very fast response, reducing the chances of getting stuck.

Mark was concerned about forwarded timestamps sending difficulty very low, which could allow a small attacker to do a spam attack (he would begin by setting a far-forwarded stamp for EMA/ASERT for submission later). To make this a lot harder to do, use st = min(st, 7*600). So long delays will not drop the difficulty as much as "it wants to". Most LWMA's do this as an attempt to reduce oscillations, but it may not have helped any.

Combining upper and lower limits on difficulty, timespan, or solvetimes like the 4x and 1/4 in BTC, and the 2 and 1/2 in BCH is what allows unlimited blocks in < 3x the difficulty window. I'm the only one I know of that has described it. A couple of coins following me were the first to see it, losing 5,000 blocks in a couple of hours because I had the hole in LWMA but did not realize it. Some LWMA's had symmetrical limits on st, not just the 7xT upper limit. Removing the 1/2 limit on timespan in BCH will prevent it.

If FTL is dropped, the 70 minute revert from peer to local time rule should be removed (best) or set it to FTL/2 or less. It should be coded as function of FTL. There are a couple of exploits due to Sybil or eclipse attacks on peer time.

A pre-requisite for BFT is that all nodes have a reliable clock that has higher security than the BFT. The clock in POW is each node operator knowing what time it is without needing to consult the network. Any node can unilaterally reject a chain if its time is too far in the future so it's impervious to 99.99999% attacks. (Although current mining on honest chain needs to be > 50% HR of the forwarded cheating chain for the node to continue to reject the cheating chain because time going forward can make the cheating chain become valid.)

jtoomim commented 4 years ago

@dgenr8 ema-1d has severe problems and wtema has mild problems in the dr100 and ft100 scenarios. The rest do well in all of the coded scenarios. Here's ft100 for 5k blocks:

| Algorithm | Block interval (sec) | Confirmation time (sec) |
| --- | --- | --- |
| asert-144 | 597.10 | 1053.71 |
| asertd-144 | 596.65 | 1000.91 |
| aserti1-144 | 595.00 | 1044.35 |
| aserti2-144 | 597.05 | 1052.13 |
| aserti3-144 | 597.10 | 1015.62 |
| ema-1d | 315.09 | 460.31 |
| ema2-1d | 590.86 | 1054.35 |
| emai-1d | 577.02 | 1081.78 |
| emai2-1d | 592.37 | 1107.30 |
| simpexp-1d | 597.07 | 1041.32 |
| simpexpi-1d | 597.15 | 1042.07 |
| wtema-144 | 656.22 | 1320.96 |

The current variable hashrate allocation algorithm doesn't handle 10x changes in price very well. I'll need to modify that first before I can test that. I don't expect any gotchas.

With ASERT, if mining becomes extremely unprofitable, a severe dry spell can be broken by only one block. In the worst-case scenario, if the dry spell lasts long enough, there can actually be a specific and strong incentive to rescue a stalled chain. For example, with tau = 144*600 (1 day) -- roughly N=208 with the numbers we've been using so far for a power-of-2-not-e ASERT -- the difficulty would halve after each day that passes. If the price drops by 75%, you would need to delay cumulatively by 2 days in order to return to equilibrium. After 2 days have elapsed, the difficulty will start to overshoot equilibrium. If miners allow 54 hours to elapse, that means that blocks would be 19% more profitable than equilibrium for a while. This means that a miner could mine the 1st block at a loss, and then selfishly mine another 4-36 blocks in rapid succession at a profit. If they amortize the cost of the first difficulty-resetting block over the next 4 blocks, they break even on the 4th block, and everything after that is profit.
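The arithmetic in that example can be checked directly (power-of-2 ASERT with a 1-day half-life, as assumed above):

price_factor = 0.25                        # price dropped by 75%
days_idle = 54 / 24                        # 2.25 days with essentially no blocks
difficulty_factor = 2 ** -days_idle        # difficulty decays to ~21% of its old value
profitability = price_factor / difficulty_factor
print(profitability)                       # ~1.19, i.e. ~19% above equilibrium after 54 hours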

As for RTTs: the kyuupichan difficulty simulation engine can't handle RTTs. I also don't think RTTs are necessary or helpful: a heavy RTT would increase orphan rates, and a mild RTT would solve a problem (hashrate abandonment) that I don't think BCH will have. So I'm not personally interested in investing my time in developing them. If there are going to be RTT proposals for the Nov 15 hard fork, they will have to be championed by someone else. Maybe you?

jacob-eliosoff commented 4 years ago

Wow, yeah, ema-1d is really crapping its pants in the ft100 scenario (I see it too). I wonder why that is... Probably the following line? Anyway we can rule out that one for now.

block_time = max(IDEAL_BLOCK_TIME / 100, min(100 * IDEAL_BLOCK_TIME, block_time)) # Crudely dodge problems from ~0/negative/huge block times

I agree with Tom that handling of extreme/attack scenarios is more important than 600- vs 590-second blocks etc.

I don't have a strong view on RTT. I like the idea but I'd guess it's a radical enough change that 2020 is unlikely. It seems unnecessary for increasing difficulty, since excessively-low difficulty results in lots of blocks, so adjusting difficulty between blocks becomes less important. RTT seems most important for the sort of scenario Jonathan described, where hashrate evaporates and it takes ages to find even one block. I think RTT could really help there (anything near 54 hours without blocks would do a lot of damage to a coin) but again it would be a change requiring a lot more testing than asert & co.

jtoomim commented 4 years ago

@dgenr8 Making a HAA (hashrate adjustment algorithm) that quickly but stably responds to dramatic and sudden price changes is at least as difficult as making a good DAA, it seems. But here's a current WIP, with a simulation that increases price by 10x twice in a row, followed by a 4x decrease.

[image: simulated response to two 10x price increases followed by a 4x decrease]

All of the DAAs perform reasonably. The HAA overshoots significantly, though in this particular case it's not too big of an issue because it quickly saturates the available 500 EH/s hashrate setting.

jtoomim commented 4 years ago

(anything near 54 hours without blocks would do a lot of damage to a coin)

Well, yes. But that's an unlikely scenario, and one that can only possibly happen when something else has already severely damaged that coin -- e.g. 75% reduction in price relative to other SHA256 coins. And this block drought can be broken by miners being lazy or altruistic and mining at irrationally high hashrates, or by users sending a large volume of medium-to-high-fee transactions to sit in mempool and entice miners to mine. Once transaction fees are the dominant form of mining reward, a 4x reduction in relative pricing will only result in a 4x reduction in block rate -- transaction fees will still accumulate in mempool, but blocks will only be mined when those transaction fees exceed the opportunity cost of mining.

dgenr8 commented 4 years ago

@jtoomim Thanks for looking at the larger shocks. To be clear I just meant using the price as a mechanism within the sim. The actual reason could be something else, such as a deliberate attack.

zawy12 commented 4 years ago

This is to finish up on modelling miners. Define "miners of the same mind" (MOSM) as those who have the same USD cost per hash C (per hash in terms of D), stickiness S (resistance to switching coins), and preference factor P for BCH. They switch their total HR on or off if the miner motivation factor M goes above 1+S/2 or below 1-S/2. S and P are expressed as fractions of M.
M = (BCH_USD_revenue/BCH_D - C) / (BTC_USD_revenue/BTC_D - C) - P

A "single" miner who varies his HR as some non-step function of M is in a way unable to make up his mind, so he's treated as belonging to several different MOSMs, approximated by the step functions. The total HR of a coin is determined by adding up the HR of all the MOSMs. If all miners are the same MOSM with zero S, they are the constant-HR case. If they are all the same with wide stickiness, they are the greedy (step function) case that causes the worst oscillations. If all the MOSMs have the same HR, C, and S, and different slightly increasing P's as shown in the image, they approximate a pair of S-curves.

This method will create a wide variety of paired S curves based on the MOSMs that simplify to the constant and greedy cases in the extremes.

The top left plot is how I have always modeled on-off mining. The top right plot is 3 different MOSMs with slightly different Ps, 0, 0.01, and 0.02 that add up to give the S-curve at the bottom. C2 should be C1.

[image: step-function MOSM hashrates summing to a pair of S-curves]

If the DA is "faster than the stickiness", the summation of the sha256 HR can be cut off at the top and bottom of the hysteresis curve by making M rise or drop before all their trigger points are reached. So a slow DA that is not reacting fast enough to keep up with price ramping up or down over 24 hrs should trigger more HR to come on or off. A large dedicated hashrate would make it more likely for the bottom flatness to be reached and M=1 would not be the avg M. M=1 only occurs in the infrequent case of the curve being symmetrical.

If you know S, C, and P for all the different levels of hashrate, you can add up all the HR's for all the MOSMs with

foreach MOSM i {
    if ( mine[i] == off && M[i] > 1+S/2 ) { HR += HR[i]; mine[i] = on; }
    else if ( mine[i] == on && M[i] < 1-S/2 ) { HR -= HR[i]; mine[i] = off; }
}

The lazy way would be to do the above if/else with the logistic curve or something else. M0 in this case is 1, but if the DA is faster than the S-curve's aggressiveness, that will not be the observed average M. HR = Ho/(1 + e^(-A*(M - M0 -/+ S/2))) where -/+ selects one of the 2 S-curves based on the above if/else.
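A runnable version of the MOSM bookkeeping above might look like this (illustrative only; the dict fields are hypothetical names, and each group carries its own S and P):

def total_hashrate(mosms, M_base):
    # mosms: list of dicts like {'HR': 50, 'S': 0.04, 'P': 0.01, 'on': False}
    # M_base: miner motivation factor before subtracting each group's preference P
    hr = 0.0
    for m in mosms:
        M_i = M_base - m['P']                  # group-specific motivation
        if not m['on'] and M_i > 1 + m['S'] / 2:
            m['on'] = True                     # hysteresis: switch this group's HR on
        elif m['on'] and M_i < 1 - m['S'] / 2:
            m['on'] = False                    # ...or off
        hr += m['HR'] if m['on'] else 0.0
    return hr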

zawy12 commented 4 years ago

@jtoomim Have you modeled a 24 HR 10% ramp up and/or down in BCH price? I would think ASERT-144 would be the clear winner in that case. Have you tried historical exchange rate as the input? This could test my old theory that the DA's Std Dev should match the exchange rate Std Dev. Maybe it should be 1/2 to 1/10 that of the exchange rate. The results will still depend on the aggressiveness and coin-stickiness factors. Maybe they could be estimated by modeling BCH's DA using the historical exchange & difficulty data and adjusting them until you find the combination that best matches the oscillations.

Std Dev might not be the right metric, because the main simulation above might have a fairly large Std Dev but it changes rev_ratio only ~10% to 15% once per 2 to 5 days, which may be biased towards ASERT-576 (although, if that's more like live data, then it should be). Live data might change less, maybe 5%, but every 6 hours (I don't watch the exchange rate), so it would have a smaller Std Dev but be more like the case that I think a faster DA might like.

RTT, BTW: KMD (jl777) and I did a very aggressive version of an RTT riding on top of a slow DA. They needed it because some of the coins that use their BTC notarization for double-spending security were having problems from up to 1 million times increases and decreases in HR. I think they deployed it and it solved their problem.

zawy12 commented 4 years ago

@jtoomim When I run your simulation, 50k blocks like you used seems to be some kind of sweet spot for ASERT-576. Significantly more or less has ASERT-72 winning instead in both confirmation times and profits/losses.

Also LWMA and WT are winning if I don't use 50k blocks. I would think it's an error in the code. I remember you said something about the online code having a problem. I'm using 500k blocks and 5x or 10x HR for variable and greedy mining like we see on BCH. Average solvetime for ASERT can be high, like 650 and even 800, so it makes me wonder if there is a big problem. I think even my 2% error in avg solvetime is my mistake of having a beginning D that was 30% lower than steady state (ASERT is supposed to get behind for each permanent HR increase).

jacob-eliosoff commented 4 years ago

This is just to flesh out the history of my own thinking on these EMA-type DAAs, since this came up a couple of times on the livestream. This leaves aside for a moment the question of which is best; it's just to understand the history.

I first took interest when I saw BCH was considering DAAs based on "simple" (fixed-window) moving averages (SMAs), because I have a strong bias from my own work (building an algorithmic trading system, where robustly estimating things like "recent BTC market price" is crucial) that SMAs are Bad and EMAs are Good.

My original approach

...was as follows, in Oct 2017: ema. Note that this is a simple algo with exponentially-decaying weights, like @markblundeberg's asert, but not quite the same as (and not quite as simple as) asert!

  1. We want to calculate the difficulty that implies a 10-minute expected block time for the current hashrate. This is a simple equation if we know the current hashrate.
  2. A good (the best?) way to estimate current hashrate is to assume it matches the EMA of recent hashrate.
  3. A crude-but-reasonable way to estimate recent hashrate is from block times. This gets more technical, but the main idea is:
    • If we specified a difficulty D when block n came out at 3 pm, it was because at that time our estimate of current (also: recent) hashrate was H, an amount whose expected time to find a block at difficulty D was 10 minutes.
    • And if block n+1 came out at 3:02 pm, in 2 minutes rather than the expected 10, we can estimate that in fact the hashrate during those 2 minutes was not H, but 10/2 * H = 5H.
    • So we can calculate our EMA of recent hashrate as of 3:02 pm, based on the simplified assumption that it was "5H during the 2-minute period 3-3:02 pm, and H from the beginning of time until 3 pm."

This is all summarized in the comments for the original next_bits_ema() from Nov 2017, though the comments were added later.
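A toy version of that 3-step reasoning (purely illustrative and not the repo's next_bits_ema(); the time-based weight w is my assumption):

import math

def ema_hashrate_update(H, st, T=600, tau=144*600):
    # Step 3: the hashrate during the last st seconds is estimated as (T/st)*H.
    # Step 2: blend it into the running EMA with a weight that grows with st.
    w = 1 - math.exp(-st / tau)
    return (1 - w) * H + w * (T / st) * H

def next_difficulty(prev_difficulty, H_new, H_old):
    # Step 1: scale difficulty with the hashrate estimate so expected blocktime stays T.
    return prev_difficulty * H_new / H_old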

Why not ema?

People like @zawy12, @dgenr8 and @kyuupichan helped me understand two important weaknesses of ema:

  1. Handling of pathological timestamps. A block (or two) whose timestamp is significantly before previous blocks, or significantly in the future, followed by other blocks with accurate (current) timestamps, shouldn't throw off the difficulty too much - much less cause the DAA to crap out. ema didn't handle these well.
  2. Floating-point math. If two nodes running the same code produce numbers that are even 1e-100 off, that can eventually lead to a consensus split. At first I was skeptical that this was a serious risk (does IEEE 754 not prevent this?), but further reading failed to reassure me - integer math does seem like the safest bet.

wtema and emai

Jan 3 2018: @dgenr8 added wtema. "Characterized by a single alpha parameter and no fixed window size. Results are similar to ema from @jacob-eliosoff with differences in the adversarial dr100 and ft100 scenarios."

Jan 4: I added emai. As I noted later, this linear approx was actually the first version I tweeted in Nov 2017, but @kyuupichan noted it broke on very long block times.

Jan 7: after some digging into the different algos on the table, my conclusion was that wtema was a better option than emai (or ema) - simpler, cleaner integer math, better performance (closer to ema).

simpexp

Jan 9: I added simpexp (and simpexpi), the algo that is actually the same as asert (though implemented recursively rather than absolutely). My reasoning behind it was:

That said - I still considered simpexp a pure but imperfect approximation of ema, the algo whose reasoning I actually understood. And wtema seemed almost as simple, almost as performant, and cleaner to implement in integer math. So I still leaned in favor of wtema, and to some extent still do! To me the key tradeoffs are:

  1. Whether there are problems, esp exploitable problems, in how wtema handles pathological timestamps. It does seem like asert (simpexp) handles them more-or-less optimally.
  2. How cleanly asert can be approximated in integer math (@jtoomim and I have taken more stabs at this in posts above): whether there are exploitable edge cases, and whether we can avoid multi-month bikeshedding of the many possible int approximations.

jtoomim commented 4 years ago

Have you modeled a 24 HR 10% ramp up and/or down in BCH price?

That's the fxramp scenario. That scenario only works over short block intervals, because it's a constant-rate exponential increase in price, and it will exhaust all available hashrate if it goes on for too long.

Trying to get meaningful benchmark data from that scenario is difficult, because everybody profits from the exchange rate going up, and the block interval is (and should be) less than 600 sec.

Have you tried historical exchange rate as the input?

No.

I remember you said something about the online code having a problem.

There is a concurrency issue. Running multiple simulations, or starting a new simulation before the last one has completed, can result in data getting mixed up during the simulation and the results being invalid. When this happens, the graphs will look strange -- it will look like the different algorithms are simulating completely different scenarios, because they are. Reloading the page in your browser to re-run the simulation is sometimes needed. This issue is more common on ml.toom.im:8051 due to the poor TCP performance of using a remote machine. Also, :8051 will limit the simulations to 20k blocks.

I'm using 500k blocks and 5x or 10x HR for variable and greedy like we see on BCH.

Did you change the code? Because if you didn't, you aren't getting 500k blocks.

https://github.com/jtoomim/difficulty/blob/comparator/dashdiffsim.py#L16

https://github.com/jtoomim/difficulty/blob/comparator/dashdiffsim.py#L242

On ml.toom.im:8051, I set MAX_BLOCKS to 20k instead of the normal 50k.

zawy12 commented 4 years ago

@jtoomim How do you measure confirmation time? Do you average the time to the next block, sampling every second? I'm interested in a DA that targets that instead of avg solvetime. It could be argued that confirmation time is the goal, not avg time. I think if a stable DA can be written for confirmation time instead of solvetime, it's the DA we should be using.

jacob-eliosoff commented 4 years ago

"Average the time to the next block, sampling every second" - by that definition wouldn't confirmation time just be half of block time?

jtoomim commented 4 years ago

How do you measure confirmation time?

Basically, sum(solvetime^2 for solvetime in solvetimes) / sum(solvetime for solvetime in solvetimes) / 2.

https://github.com/jtoomim/difficulty/blob/comparator/dashdiffsim.py#L423
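In Python terms (just restating the formula above, not the simulator's exact code):

def confirmation_time(solvetimes):
    # Time-weighted average wait for the next block, i.e. sampling every second:
    # during an interval of length s the average remaining wait is s/2.
    return sum(st * st for st in solvetimes) / (2 * sum(st for st in solvetimes))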

Do you average the time to the next block, sampling every second?

Yes, that's what the formula above is equivalent to.

It could be argued that confirmation time is the goal, not avg time

Confirmation time would be optimized by making each solvetime exactly 10 minutes. Unfortunately, this would cause all sorts of chaos with the orphan rate.

Both are goals. The avg solvetime matters because mining incentives, issuance, and inflation are important. The confirmation time matters because UX matters.

"Average the time to the next block, sampling every second" - by that definition wouldn't confirmation time just be half of block time?

No. If you have one solvetime of 9991 seconds, followed by 9 blocks solved in 9 seconds, your average confirmation time is about 4444 seconds, but your average block time is 1000 seconds.

zawy12 commented 4 years ago

I do not mean to target a precise solvetime. I mean measure a confirmation time and enter it into maybe a regular DA equation instead of st. It might be that we have to feed that input time to, say, ASERT once every 10 minutes instead of once every block.

jacob-eliosoff commented 4 years ago

No. If you have one solvetime of 9991 seconds, followed by 9 blocks solved in 9 seconds, your average confirmation time is about 4444 seconds, but your average block time is 1000 seconds.

Ahh right. So:

This would be somewhat off in the case where you can choose when to send, and are able to predict times (eg of day) when the blocks will probably come faster. But that's a detail.

jtoomim commented 4 years ago

I mean measure and enter a confirmation time into maybe a regular DA equation instead of st.

This is begging for timestamp manipulation attacks. A block that's 1 second after the previous block has basically no impact on average confirmation time. If you manipulate your blocks to alternate between 1 second solvetime and 1199 second solvetime, that makes the average confirmation time 1199 seconds, which would trigger the conftime-DAA to lower the difficulty by 49%. This allows miners to get more revenue than they deserve.

zawy12 commented 4 years ago

If you manipulate your blocks to alternate between 1 second solvetime and 1199 second solvetime, that makes the average confirmation time 1199 seconds, which would trigger the conftime-DAA to lower the difficulty by 49%.

Sampling every second means the time-weighted mean of confirmation times for two blocks is (N1*(N1+1)/2 + N2*(N2+1)/2) / (N1+N2) = 600 for your example.

I mentioned 10 minutes to show we're sampling by time instead of blocks. This is like random samples, so it's good. The memoryless property (as Jacob used to quickly solve a PW tweet-puzzler) works just as well backwards in time, which means we can use the time to previous block(s) instead of next blocks. A 10-minute adjustment means it's partially an RTT. To reduce/prevent problems, we delay it 10 minutes and tighten FTL to 3 minutes. In Jacob's email example, this means 0.7%/3.33 max ability to cheat with a "fast" 144 (and 4x less for a 576 algo, which I have not yet found a problem with, but I'm trying to because it's so slow compared to anything alts have done).

I should mention time-based crossed my mind in 2016, and Amaury asked about it in a 2017 issue, wondering what it would look like and thinking it would be better. Just now, thinking about ASERT, I realized he's right. Maybe we thought about it because adjusting every block that arrives at a random time is just wrong, especially since fast solves are more likely, so they get more adjustments. But since ASERT's average st is perfect, it must be doing a "wrong" kind of adjustment. It must be some kind of engineering hack. This was shown in the recent paper from Imperial College, where the final correction they applied may not be needed. In a comment in my previous issue, I showed ASERT purposefully is not targeting the D that is correct, but aims for one that is 58% low. Now I see this is because it is adjusting for the faster solves more often, because it is block-based instead of time-based. With a time-based DA, we can write a potentially perfect DA with the smoothing factor as the only thing left up for debate. It might be as simple as

target = previous_target * ( time-weighted mean st of blocks since previous sample / T )^(1/N)

where N is say 144 or 576. I would sample every T seconds and do the calculation at top for each block, pretending there was a tx each second. But offhand I don't know how to handle the time gap between the previous time sample and the first block that came after it, or how to handle it if there was no block.
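As a sketch of what that might look like (illustrative only; how to choose the sample boundaries is exactly the open question above):

def time_based_next_target(prev_target, conf_time, T=600, N=144):
    # Adjust once per sample period using the time-weighted mean solvetime
    # (confirmation-time style) observed since the previous sample.
    return prev_target * (conf_time / T) ** (1.0 / N)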

dgenr8 commented 4 years ago

The last thing you want after a very long block is another very long block. While wtema and simpexp/asert are effectively identical for normal blocktimes, asert's behavior is superior for both abnormally long blocktimes (nonlinear response) and abnormally short (negative) blocktimes (asymptotic positive response). Nice job @jacob-eliosoff @markblundeberg.

[image: wtema vs asert target response to blocktime]

jacob-eliosoff commented 4 years ago

Interesting Tom, thanks, I'll have a look. High responsiveness to very long block times (cutting difficulty sharply) does seem important.

(It looks like the blue line crosses the axis which does seem bad...)

jtoomim commented 4 years ago

Yes, the crossing of the x axis is the -TARGET_BLOCK_TIME/alpha singularity issue which we've been mentioning. That's the reason why wtema needs protection against large negative solvetimes, and it has been the main reason why I've been investigating asert as a potentially simpler overall DAA consensus rule package.

jacob-eliosoff commented 4 years ago

Based on factors like @dgenr8's extreme-timestamp cases, I lean more and more towards one of the integer-math approximations of asert, but implemented "absolutely" rather than recursively: ie, calculated from orig_target rather than from the previous block's target. This is because in the case of successive timestamps like (30000, 30600, -1000000, 31800), most approximations implemented recursively will be doing (huge approximation * tiny approximation) and end up way off, whereas the absolute calculation should just get right back on track for the (normal-timestamp) 4th block.

So whereas exact asert does:

next_target = int(orig_target * math.e**((blocks_time - IDEAL_BLOCK_TIME*(height_diff+1)) / tau))

Absolute-approximate asert would do something like:

next_target = (orig_target * exp_int_approx(((blocks_time - IDEAL_BLOCK_TIME*(height_diff+1)) << 32) // tau, 32)) >> 32

(For some definition of exp_int_approx() - when calculated in this absolute way, the specific approximation may not even matter much.)

Happy to push a next_bits_asert_abs_approx() that does this.

jtoomim commented 4 years ago

Happy to push a next_bits_asert_abs_approx() that does this.

I already did that a few days ago. That's what aserti{1,2,3} are.

https://github.com/jtoomim/difficulty/blob/comparator/mining.py#L373

jtoomim commented 4 years ago

If you manipulate your blocks to alternate between 1 second solvetime and 1199 second solvetime, that makes the average confirmation time 1199 seconds, which would trigger the conftime-DAA to lower the difficulty by 49%.

Sampling every second means the time-weighted mean of two blocks is (N1 (N1+1)/2 + N2 (N2+1)/2 ) / (N1 +N2) = 600 for your example

Okay, fair enough. A deterministic 1, 1199, 1, 1199 alternation is equivalent to the random Poisson process in terms of average confirmation time. You need to go more extreme in order to decouple confirmation time and block time, such as by doing three 1-sec solvetimes followed by a 2397-sec solvetime. That gives you a 1198.5 sec confirmation time with a 600 sec average block interval.

jacob-eliosoff commented 4 years ago

I already did that a few days ago. That's what aserti{1,2,3} are.

https://github.com/jtoomim/difficulty/blob/comparator/mining.py#L373

Ah, right you are. OK, I pushed aserta anyway, and in the tests its numbers look marginally closer to asert's than aserti3's do, but any of them is probably fine (probably including aserti1?).

jtoomim commented 4 years ago

@zawy12 Here's why I think my testing shows superiority for time constants around 250-500 blocks, whereas yours does not:

Mining is competitive. A competitive market exists for the service of difficulty regulation. Different coins will have different markets with different degrees of competitiveness. Also, different DAAs will have different efficiencies for regulating difficulty with a given amount of hashrate switching. Large markets like BCH/SHA256 will have greater hashrate liquidity than small markets like Masari/CryptoNight.

The effective cost of difficulty regulation can be approximated as the revenue delta for a constant hashrate miner versus the maximally profitable miner. This cost will be determined both by the efficiency of the DAA as well as the behavior of the miners.

Your test setup models a very illiquid market. It makes hashrate switch on when profitability reaches X%, and switch off when profitability reaches (X-dx)%. Basically, you're using a thermostat-style hashrate adjustment algorithm (HAA). This causes difficulty to follow a sawtooth or triangle wave pattern. The slopes of the sawtooth are determined by the time constant and the amounts of hashrate. The period of this oscillation is proportional to the time constant. Except for quantization error, the amplitude (in terms of profitability) is independent of the time constant. Because you're only modeling one such greedy miner, you are effectively modeling a non-competitive market with fixed prices.

My test setup models a much more liquid market. It assumes an infinite number of distinct mining entities, and simulates competition between miners. It does this by allowing proportional allocation of mining -- it assumes that each miner has a different price threshold, and assumes that the number of miners that will switch over increases as both the profit margin and the duration increase.

DAAs with higher time constants allow the market's spread to fall over a narrower range. With asert-144, a single very slow block might cause the profitability to increase by 3%, which might be enough to trigger a very large and very greedy miner to turn on. But with e.g. asert-576, that same slow block would only increase profitability by 0.75%. That might only cause 1/4th as much hashrate to switch on immediately, but that extra hashrate might still be enough to mine the catch-up blocks and get profitability back to equilibrium.

In my simulations, there seems to be a big advantage of -288 over -144, but only a small advantage of -407 or -484 over -288. My opinion is that there are other advantages to short intervals which my tests do not quantify (e.g. speed of recovery after a 90% price drop), so it's probably best to aim for something at the responsive end of the performance plateau, like -288 (diff doubles for every 2 days ahead of schedule we get) or -316 (tau = 2^17 seconds for 2^x versions of asert, which is nice for the integer math).

jtoomim commented 4 years ago

Ah, right you are. OK, I pushed aserta anyway, and in the tests its numbers look marginally closer to asert's than aserti3's do, but any of them is probably fine (probably including aserti1?).

aserti1's inaccuracy does not seem to be particularly problematic. Still, I think I prefer either aserti2 or aserti3, since they're only slightly more complicated and the performance jump from 1 to 2 can be a few percent.

zawy12 commented 4 years ago

@jtoomim

Here's why I think my testing shows superiority

In my comments above you'll see where I mentioned I believe my testing had an error that made the measured avg st about 2% slower than target. The 2% was my only basis for saying ASERT-576 was not as good as ASERT-72 (specifically, my starting D was 33% below my steady-state D and 20k blocks was not enough to let 576 dilute the error). I have not yet gone back to see if this is the reason LWMA and WT were consistently performing slightly better. WT now matches LWMA (with a stolen/delay tradeoff) because I fixed my stolen metric earlier this year. So the comments you and Tom made about it in the video were correct (Tom defended WT and you mentioned LWMA could not be much worse due to D being fairly constant). My confused spiel in the middle of the video in response to a fat curve Tom threw me was all wrong. I went back and checked the complicated issue I was trying to explain and found an error in my code.

I also mentioned above I'm still trying to find an argument or specific miner motivation against 576.

In my simulations, there seems to be a big advantage of -288 over -144, but only a small advantage of -407 or -484 over -288. My opinion is that there are other advantages to short intervals which my tests do not quantify (e.g. speed of recovery after a 90% price drop),

These are my thoughts exactly (the mere 1% benefit of 576 over 288 at a cost of 50% loss in speed). But for small coins I've seen big miners stay on far longer than makes sense (they lose revenue compared to doing a reasonable on-off tactic), repeatedly coming back to make it go really high, causing long delays. (You mentioned this type of outlier possibility at the beginning of your last post.) For this reason I would not recommend that they go above ASERT-144. If it goes wrong, it's stuck until they fork. Notice that ASERT-288 has about 4.1% Std Dev in D if HR is constant, which isn't a bad guess for the kind of BTC/BCH exchange rate days you want to be prepared for. In my first post I think I mentioned I have long thought that letting Std Dev of D equal the Std Dev of daily price changes felt intuitively correct. We get more speed than loss in stability per change in N due to 1/N speed versus SQRT(N) stability, so historically I've argued to err on the side of speed. But the higher accidental variation causes an "exponential" (or higher) increase in attracting on-off mining.

I never fully investigated DAs as slow as 1-day averaging because almost every small coin that did seemed to get stuck. But what I was looking at were CN SMAs that were even worse than regular SMAs because their cut and lag effectively removed about 10% of the most recent block times from the average when HR changed.

Yes, my step function is a worst case scenario compared to your softer variable miners. But I believe you're mistaken in saying profitability is independent of time constant. I got about the same 1% reduction in switch-mining gains as you seemed to get in going from 288 to 576. (our testing conditions were different, so it's complicated)

You need to go more extreme in order to decouple confirmation time and block time, such as by doing three 1-sec solvetimes followed by a 2397-sec solvetime. That gives you a 1198.5 sec confirmation time with a 600 sec average block interval.

If there is a 1, 1, 1, 2397 cycle, there is an on-off miner. My goal is to target confirmation time because that's what users experience. It might be argued confirmation time is the goal, not coin emission rate. If I target confirmation time, I will be paying everyone to dilute the problem caused by the on-off miner. Targeting confirmation time might reduce the on-off miner's profits.

I'm still trying to think of ways to do the DA. For example, apply ASERT with confirmation time only once per 2 blocks, calculating confirmation time on 2 blocks at a time. If 2 blocks come in 600 seconds, then regular ASERT's combined 2-block adjustment is always e^(1/N - st1/600/N) * e^(1/N - st2/600/N) = e^(2/N - (st1+st2)/600/N). In this case of st1+st2 = 600, this always comes to e^(1/N), the same as if each st were always 300 seconds. But with confirmation time on a pair of st's above, the effective st ranges from as if both were 150 each (the case of two equal 300-second solvetimes) to 300 each as in ASERT (the case of 1 and 599 seconds). So if they vary a lot, the difficulty is going to be pushed higher. This may not work, but it shows the confirmation-time calculation can "see" more.

jtoomim commented 4 years ago

For this reason I would not recommend that they go above ASERT-144

Most small altcoins should aim for shorter intervals than large coins like BCH. Smaller coins generally have lower miner liquidity, less price stability, and fewer dedicated miners than big coins like BCH.

But this thread is about BCH.

Notice that ASERT 288 has about 4.1% Std Dev in D if HR is constant which isn't a bad guess for the kind of BTC/BCH exchange rate days you want to be prepared for. In my first post I think I mentioned I have long thought that letting Std Dev of D equal the Std Dev of daily price changes felt intuitively correct.

More specifically, perhaps the stdev of difficulty from constant hashrate should be roughly equal to the average daily price ratio change. Or perhaps the RMS daily price ratio change. While BTC/BCH ratio does sometimes change by 4%, that's far higher than the average daily change, which (based on a 3 minute glance at coinmarketcap) seems to be around 1.5% from daily low to daily high, or about 0.5-1.0% from day open to day close. Most of the time, when the BCH/USD price changes, the BTC/USD price changes by almost exactly the same amount.

If there is a 1, 1, 1, 2397 cycle, there is an on-off miner

No, I'm saying those are the timestamps, not the actual block intervals. I'm alleging that trying to do a conftime-based DAA makes the DAA highly susceptible to timestamp manipulation attacks. In this scenario, perhaps all 4 blocks took exactly 600 seconds to mine, but the first 3 blocks intentionally used timestamps that were 9.9, 19.9, 29.9 minutes in the past.

The average solvetime is information that can be used with reasonable confidence in potentially adversarial conditions. But the distribution of solvetimes is too easily manipulated by even a minority miner. Using conftime in the DAA makes the DAA sensitive to the solvetime distribution in a way that presents an attack vector.

My goal is to target average solvetime because that's the only data we can trust to not be a massive attack vector for dishonest miners.

zawy12 commented 4 years ago

That's a good point, but the manipulation to lower difficulty would also lower chain work, so it seems the attacker needs 50% HR (or maybe 33% with selfish mining) to gain maybe 50% more blocks than the same attack would get if avg st were used instead of confirmation time.

There is another problem as a result of what you describe: if a 25% miner assigns a 1-second solvetime (negatives would need to be disallowed by code or consensus) every time he finds a block, then the next honest stamp helps him lower difficulty. This may be a big enough problem to kill the idea [my 2-block sample idea]. It may cause the emission rate to be significantly higher (maybe 50%) if 1/2 the miners do the 1-second stamp. They are faking an on-off attack to get the increased emission rate.

jacob-eliosoff commented 4 years ago

Fwiw: I tried comparing the different approximations to asert. The idea being, "For the 'tau=144*600' version of each algo, when the block time is n seconds, by what percent does target change?" So I suppose I'm making a mild case here for aserta - in particular, after a long block in the 10000-second-plus (3-hour-plus) range, aserta cuts difficulty (increases target) significantly more than aserti. But simplicity/legibility may trump these diffs.

[image: comparison of asert approximations' target changes vs block time]

Here's aserta (reusing the Padé approximant from exp_int_approx()):

def next_bits_asert_abs_approx(msg, tau, bits_precision=32):
    """Uses the integer-math "Pade approximant" for e**x, from https://math.stackexchange.com/a/56064."""
    # states, IDEAL_BLOCK_TIME, bits_to_target() and target_to_bits() come from
    # the surrounding simulator (mining.py).
    blocks_time = states[-1].timestamp - states[0].timestamp
    height_diff = states[-1].height - states[0].height
    orig_target = bits_to_target(states[0].bits)
    # Fixed-point exponent x = (time ahead of / behind schedule) / tau, scaled by 2^bits_precision.
    x = ((blocks_time - IDEAL_BLOCK_TIME*(height_diff+1)) << bits_precision) // tau
    # Halve x h times so the approximant stays accurate, then square back h times below.
    h = max(0, int.bit_length(x) - bits_precision + 3)
    term1, term2 = 3 << (bits_precision + h), 3 << (2 * (bits_precision + h))
    # Pade (2,2) approximant of e**(x/2^h): ((y+3)^2 + 3) / ((y-3)^2 + 3), in fixed point.
    hth_square_root_of_e_x = (((x + term1)**2 + term2) << (2 * bits_precision)) // ((x - term1)**2 + term2)
    e_x = hth_square_root_of_e_x
    for i in range(h):
        e_x = e_x**2 >> (2 * bits_precision)
    e_x >>= bits_precision
    next_target = (orig_target * e_x) >> bits_precision
    return target_to_bits(next_target)
jtoomim commented 4 years ago

@jacob-eliosoff It looks like you're forgetting that aserti is 2^x based, not e^x based, so tau needs to be multiplied by ln(2) in order to be equivalent.

Once you do that, you'll see that aserti3 is basically the same as asert (within 0.01%), aserti2 is close (within 0.3%), and aserti1 is similar (within 6%).

jacob-eliosoff commented 4 years ago

Ah yes, right you are.

[image: corrected comparison of asert approximations (tau scaled by ln(2))]

jtoomim commented 4 years ago

Also, you should be calculating error as (algo_diff - ref_diff) / ref_diff, not as (algo_diff_change - ref_diff_change). +231.952% vs +215.966% is a 5.049% error, not a 15.986% error.

aserti1 is not off by 504198% in the last row. It's only off by 4.78%.

aserti1 is not off by 0.000% in the first row. I'm not sure what it's off by, but it's more than that.

jacob-eliosoff commented 4 years ago

Well I originally did it that way but the first row looked pretty crazy. You can't win!

zawy12 commented 4 years ago

Mark mentioned he might like to see clipping on how much D can drop per block, to prevent it from dropping too much too quickly. If I understand this concern correctly, not doing clipping could allow a small private-mining attacker to do a spam attack on nodes: he sets a timestamp far into the future, then on the next block he has low difficulty and can mine a lot of blocks, keeping each solvetime = T. Then, when that forwarded time plus T times the number of blocks arrives, he submits the alternate chain that nodes need to check, which is a problem if they can't first determine that the total chain work is less (the height will be lower). If it is an RTT, the first block does not even need to be difficult. A limit on how much D can drop per block (like making it equal to the effect that a rare 10xT solvetime would have) makes it a lot more costly, especially for small HR. The drawback of clipping is that avg st is longer if you have an on-off mining problem.

In short, if the nodes can reject a chain based on chain work before validating, it seems like clipping is not needed.

zawy12 commented 4 years ago

Before completely closing the door on other algos, remember I did not precisely determine the Std Devs, so the DA's are not known to be compared on an equivalent basis. But if there is any benefit, it should be small, like maybe at most getting 576 benefits with 288. Also, WT and LWMA math can be done to just look at first and last block without going through a loop.

jacob-eliosoff commented 4 years ago

Clipping (constraining a block's difficulty to be within eg 0.9x-1.1x the previous block's) seems like a huge hairy topic. Clearly the simplest thing is no clipping, which should probably be the default unless people show good arguments that it's necessary. Life also gets much simpler if we can assume the rules effectively prevent block timestamps being too far off. Can someone remind me of why it's impractical to make nodes & miners reject a block whose timestamp is either:

  1. more than ~1 min before its parent block's timestamp, or
  2. more than ~1 min ahead of their own system clock?

I can't see any reason either of these should be acceptable... A clock more than 30 seconds off on a miner/node in 2020 is inexcusable.

If we can enforce roughly-correct timestamps this way, clipping seems unnecessary? If we can't, whether or not to clip depends on balancing multiple risks:

Anyway I'm sure this has all been discussed elsewhere... I don't have much to add except "Perhaps if we can prevent bogus timestamps, clipping is unnecessary."

jtoomim commented 4 years ago

Then when that forwarded time plus T x number of blocks arrives, he submits the alternate chain that nodes need to check, which is a problem if they can't first determine the total chain work is less (the height will be lower)

Full nodes check all headers and verify chain work before doing any other verification. When the new blocks would trigger a reorg, they won't even download the full blocks until all of the headers and total work have been verified. This supposed attack is a non-issue.

I do not think it's worthwhile to look into clipping.

jacob-eliosoff commented 4 years ago

Forgive me if I'm misinformed but isn't there clipping now? I thought BTC had it... Are you saying you think it could be safely removed? Or that it's fine and we can leave it as is.

zawy12 commented 4 years ago

@jacob-eliosoff Mark said changing timestamp limits is more work because they are consensus rules. Jon said there can be problems in making some mining equipment and pool software no good.

Given Jon's response, I do not see a reason for clipping, with or without sequential timestamps. Clipping how fast difficulty falls (clipping long solvetimes) only allows an attacker to increase difficulty for longer, potentially increasing risk of driving HR away so that a selfish mine is easier.

@markblundeberg Can you say more on why you said clipping the rate at which difficulty falls might be good?

Yes, BCH has 2x and 1/2 limits on timespan compared to BTC's 4x and 1/4. A lower limit on the timespan (or equivalent limit on how much D can rise per block) allows a 51% attacker to get unlimited blocks in finite time. If the limits are symmetrical as in BTC/LTC/BCH/DASH etc then infinite blocks are possible in less than 3x the difficulty averaging window (if negative times affect the DA and are allowed back to the MTP). Even with sequential timestamps in the DA, a selfish miner can get more than the normal 100% of blocks (50% more in Zcash, 75% more in ETH).

I have not heard of any reason (other than Mark's and Jon's at the top) why timestamps can't be sequential (I disagree that negative stamps are needed to "get back to the right time" if the FTL is reasonably tight) or why the FTL can't be really tight like 1 minute. Peer time should be removed as it just gives an attacker Sybil or eclipse control of time.

I have not yet found an attack on ASERT due to allowing out of sequence timestamps except for the selfish miner technique of sending the public chain difficulty higher by setting the timestamp to the MTP+1 to drive some HR away, and then starting his selfish mine. For ASERT-288 if it happens during a longish 12*T time back to the MTP, he can send difficulty up by 5% in the next block, potentially triggering HR to leave for 1 block, making his selfish mine a little easier.

The rest of this is about why sequential timestamps are a universal distributed consensus requirement.

Almost every timestamp attack I cover in my Timestamp Attacks issue 30 is from allowing out-of-sequence timestamps. It can also mess up scripts that use time. BIP 113 appears to be an example of a previous BIP not realizing the significance of the MTP in txs.

The fundamental problem is that all distributed consensus mechanisms require a sequential clock ordering on all "messages" (things that are part of consensus). In the case of POW this mainly refers to block headers. It's a requirement (in an intuitive sense) because all consensus mechanisms require some type of voting (including lottery mechanisms that "vote" by random drawing) that must start and end in a (positive) time frame in which the "votes are counted" to determine & convey consensus. Block height order is not sufficient. POW was a tremendous discovery because it turned classic consensus upside down: instead of using a fixed time period to give nodes a time period in which to "vote", it used time to count the voting population. The chain with the largest total number of "votes" (hashes) wins. (BTW the costliness of hashing that replaces the "costliness of acquiring & proving identity" also allows anyone to vote (hash) without registering their identity (true permissionlessness), which I think was unique to POW until VDFs.)

For a specific proof of the sequential requirement in all consensus mechanisms, see Lamport's 1978 "Time, Clocks, and the Ordering of Events in a Distributed System". See conditions IR2 and IR2', which require sequential stamps on messages. The paragraph before the conclusion says clocks can't be set backwards for "our situation". In blockchain, this means that if a node accepts any type of consensus-determining message (like a block with a timestamp) from another node that is ahead of his clock and that he is going to allow (due to FTL), then he must not send any consensus-determining message to other nodes with a time less than or equal to that time. In other words, he can't use his local time until it catches up with a future time that he has allowed. This will not get ahead of his local time plus a tight FTL unless there is a catastrophic problem such as >50% of miners having the wrong time (which he will unilaterally refuse until that time comes) or a tremendous hashrate increase beyond the DA's ability to deal with it. Long network delays can cause problems that POW is capable of correcting. The FTL needs to be > 2x the typical node error in time so that a synchrony requirement can be met. I've previously said it needs to be an additional > 2x propagation delay, but right now I can't reason out why that might be true.

Instead of doing clocks correctly, BTC ended up with the MTP of 11 to try to patch it. Patches have been tried to fix holes in the MTP patch, and some of them also have an exploit. This year Zcash did a security hotfix that appears to be because peer time was able to adjust clock time to before timestamps in blocks that the target node had already validated, which locks up the node. It appears the same code is in BTC, but P. Wuille and G. Maxwell indicated to me in an issue that they do not see a problem.

jacob-eliosoff commented 4 years ago

I realize it's an old hairy topic and I'm behind on it, but I would be interested in understanding a specific exploit that takes advantage of a rule like "Nodes/miners reject any block whose timestamp is either >1 min before its parent block's, or >1 min ahead of their system clock." There's probably a link about this online...

zawy12 commented 4 years ago

@jacob-eliosoff Now I understand your example better. There's no attack that I know of in that scheme.

I am assuming MTP of 11 is still in place (you have to have strict ordering at some point) and the 2x and 1/2 limits in BCH are removed. So what you've described would not be a clipping method. The DA has to be ASERT or otherwise handle the negative timestamps well.

jacob-eliosoff commented 4 years ago

Right, my hope was that if timestamp near-accuracy was enforced by a rule like I described (or similar), it would make clipping unnecessary. At least with an asert-like DAA.

OK, so BCH's clipping (max difficulty rise/drop) is currently 2x/0.5x (per block), as opposed to BTC's 4x/0.25x (per 2016 blocks)?

My ideal would be, aserti (or something similar) plus tighter timestamp rules, which I think would let us scrap the clipping. If the timestamp rules are hard to change, then it's unclear to me whether or not removing clipping is feasible.

jacob-eliosoff commented 4 years ago

My current guess is, even without changing timestamp rules, eliminating clipping would do no harm? Maybe?

I can imagine attack attempts like generating a chain of blocks timestamped a week in the future (which under asert-144 would reduce their difficulty 1000x), then releasing them in a week. But because their difficulty is so low, they wouldn't match the cumulative work of the real chain, so your low-difficulty chain just dies... I just don't see a way to make these manipulations pay off.

zawy12 commented 4 years ago

Right, my hope was that if timestamp near-accuracy was enforced by a rule like I described (or similar), it would make clipping unnecessary.

I do not think any clipping is needed. I only brought it up because Mark mentioned it.

OK, so BCH's clipping (max difficulty rise/drop) is currently 2x/0.5x (per block), as opposed to BTC's 4x/0.25x (per 2016 blocks)?

Yes. I don't think anyone has ever given a good reason for it. Dash (DGW algo) has 3x and 1/3. Zcash has 32% and 16% (it never hits those limits except to cause a 500-block delay at reaching correct D at genesis). I had a real reason to try a one-sided clip in LWMA which was to try to stop it from dropping so fast to put a damper on how fast on-off miners came back. But I could never see a benefit on live coins.

My current guess is, even without changing timestamp rules, eliminating clipping would do no harm?

No, there should not be any harm. Keeping the clipping enables the unlimited-blocks-in-<3x-the-difficulty-window attack. Your 1 min limit on the negative time would stop the attack. You, Greg Maxwell, Bram Cohen, and I discussed this as a backwards-compatible fix (no old blocks would have violated the new rule) for BTC in 2018. No one complained about, I think, Bram's idea to just limit the reverse limit to 1 day.

But because their difficulty is so low, they wouldn't match the cumulative work of the real chain, so your low-difficulty chain just dies... I just don't see a way to make these manipulations pay off.

If you're hashing with >51% and don't let the difficulty go all the way to 1, you're going to get enough work by having a lot more blocks to counter the lower difficulty. Four LWMA coins were hit with it in April 2018 before everyone got it fixed. About 5,000 blocks were released in the attacks in a few hours. I did not look into the Verge attack closely enough to confirm it was another instance.

If the timestamp rules are hard to change, then it's unclear to me whether or not removing clipping is feasible.

There should not be any problem to just removing the limits. I doubt they've ever been triggered.

jacob-eliosoff commented 4 years ago

Thanks @zawy12 for the clear answers!

It did occur to me that the 1-week-ahead attack would be helped by being able to generate blocks more frequently. I suppose a precise way to analyze it would be: under asert-144 with no clipping, what % of hashrate would you need now, to be able to build a chain of blocks timestamped a week from now, that beats (has more work than) the main chain a week from now? I'll poke at it.

jacob-eliosoff commented 4 years ago

Without getting into exact numbers, the first thing to note is that under asert, difficulty at a given time is simply determined by block height. So if you mine your own secret branch, it's impossible for your branch to be both at a greater height (ie, for you to have mined more blocks), and lower difficulty, than the main branch.

(This is assuming we're somehow enforcing the rule that "block timestamps can't be ahead of the time at which they're published," eg via timestamp rules like those I proposed. If miners/nodes accept blocks with timestamps arbitrarily in the future, you can indeed publish a slew of cheap (low-difficulty) future blocks, which is why they shouldn't.)

So, in the week-ahead attack, if you mine 7 * 144 = 1008 blocks timestamped a week ahead, they will indeed be much lower-difficulty on average than the main chain's parallel 1008 blocks. But, they'll add up to correspondingly much less work. And - crucially - your secret chain's 1009th block will be the same difficulty as the main chain's 1009th block! So (again, without having worked out the exact math), it seems to me that a successful 51% attack (matching the main chain's accumulated work) will get very expensive.

In short: releasing a secret chain at time t with more accumulated work than the main chain, requires mining at least some blocks at higher difficulty than the main chain. There are no "shortcuts" where you outwork the main chain by mining a zillion low-difficulty blocks.
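A toy calculation along those lines (my own sketch; it assumes an e-based tau of one day and that every secret-chain block is stamped at the very end of the week, the cheapest possible case):

import math

T, TAU = 600, 144 * 600     # ideal block time; tau = 1 day

def asert_relative_work(height_diff, time_diff):
    # Work per block is proportional to 1/target, and under absolute ASERT the
    # target depends only on how far ahead of schedule the chain is.
    return math.exp(-(time_diff - T * height_diff) / TAU)

blocks = 7 * 144
main = sum(asert_relative_work(h, T * h) for h in range(1, blocks + 1))
secret = sum(asert_relative_work(h, T * blocks) for h in range(1, blocks + 1))
print(main, secret)         # ~1008 vs ~144: the cheap week-ahead chain has far less work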

zawy12 commented 4 years ago

I don't think any of the DA's allow an advantage to a selfish miner via his ability to manipulate timestamps unless there is clipping on how fast the difficulty can rise. Chia's white paper tried to show an advantage in BTC's algo, but I argued against Bram in Twitter how their example showed the attacker still needed a greater HR than the benefit.

The only time I know of that chain work does not work correctly is something I mentioned in the video: if there is a small number of blocks. Chain work leaves off a necessary (N-1)/N which is why Emin et al's 2nd paper on selfish mining is able to win with < 50% of the HR even if the attacker can't relay his blocks faster (Jon mentioned spreading blocks faster in the video as part of the attack, but that only applies in the 1st paper).

But let's say there are several low-difficulty blocks (large N, so the correction above does not apply) but only 1 final block with high difficulty to bring the attacker back to a slightly greater sum of difficulties. Since fast solvetimes are more likely, he should win on average with less than 51% HR.

Here's an attack example. I'm using ASERT with N=1 to make it easier to do the math and manipulate difficulty more.

Main chain work with constant HR (M = number of blocks) is D0 * M = D0 * { k + k^2 + ... + k^M } where k^i = 1 = [ e^(1 - 1*T/T) ]^i (every solvetime is T). So main chain total time is just T*M.

Attacker does the following 8 st's as multiples of T: -0.1, 1, 7.1, 1, 1, 1, 1, 1. Chain work is

D0
+ D0 * e^(1 - (-0.1*T)/T)
+ D0 * e^(1 - (-0.1*T)/T) * e^(1 - 1*T/T)
+ D0 * e^(1 - (-0.1*T)/T) * e^(1 - 1*T/T) * e^(1 - 7.1*T/T)
+ D0 * e^(1 - (-0.1*T)/T) * e^(1 - 1*T/T) * e^(1 - 7.1*T/T) * 5
= 7.04 * D0

7.04 * D0 is the sum-of-D's chain work. The HR required to find that work on average is less, because avg(1/avg(M st's)) for a small M samples is greater than the long-term HR, which is avg(1/avg(large sample of st's)). Fast solvetimes are more likely, so a smaller HR appears to have more chain work for small samples. For the 1/2 quantity below, he's solving the initial difficulty, but he's only going to start an attack if he solves it in less than T/2. The HR he needs to get the above chain work is then (this is a back-of-envelope calculation) D0/T * ( 1/2 + 2*3*(2-1)/2 + 6*1/148 ) = 3.53 * D0/T

Attacker's last timestamp is 12*T (allowed by FTL and MTP). He submits it when the public chain has 7 more blocks and the time is 7*T with 7*D0 chain work. It actually takes the main chain only 7*T*(7-1)/7 on average. So he gets 9 blocks in 6.5*T with 3.53/(7*(7-1)/7 + 3.53) = 37% HR, a benefit of 1/0.37 * 9/7 * (7-1)/7 = 2.9x revenue. I think this is on average; sometimes the attack fails. The exploit is not a quirk of ASERT (other than making it so fast it's over- & under-estimating HR), but a combination of not following the 1978 requirements for distributed consensus and not calculating chain work correctly.
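The chain-work sum above can be reproduced in a few lines (N = 1 ASERT in difficulty form, with T = 1 and work in units of D0):

import math

sts = [-0.1, 1, 7.1, 1, 1, 1, 1, 1]   # attacker's solvetimes as multiples of T
d, work = 1.0, 1.0                    # first block at difficulty D0 = 1
for st in sts:
    d *= math.exp(1 - st)             # ASERT with N = 1 and T = 1
    work += d
print(work)                           # ~7.05 D0, matching the ~7.04 D0 above to rounding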

The negative solvetime is not necessary, but the profit is less.

The attack requires a lot less HR (maybe only 15%) if only 1 block is at the really high difficulty (instead of a (N-1)/N reduction in HR for the 2 blocks it's about 1/7 for 1 block).

To clarify, the number of hashes needed to find N blocks if HR is constant and D is changing is:

Hashes = total_time * 2^256 / average(target * solvetime) * (N-1)/N

which is

Hashes = total_time * harmonic_mean(difficulty / solvetime) * (N-1)/N

I think hardly any advantage can be gained if the ASERT window (like 288) is large compared to the FTL, and in that case the attack also needs negative timestamps to be allowed.

jtoomim commented 4 years ago

https://gitlab.com/jtoomim/bitcoin-cash-node/-/commit/fd92035c2e8d16360fb3e314b626bf52f2a2be67

jtoomim commented 4 years ago

what % of hashrate would you need now, to be able to build a chain of blocks timestamped a week from now, that beats (has more work) than the main chain a week from now?

You don't need to do any fancy math for this. The answer is 101%. It does not depend on the DAA in any way. The specifics of the DAA can change the number of blocks you mine, and the difficulty of those blocks, but the work performed depends only on your hashrate and luck.

The answer is 101%, not 51%, in a scenario where multiple coins share the same algorithm (e.g. BTC + BCH) because if you remove your 51%, the difficulty will drop and more hashrate will come to BCH from BTC, and you will be racing your 51% against their 100%. So you need 101%.