Handling Bad Timestamps #13

Upper Timestamp Limit

Cryptocoins have no accurate real-time clock except the median time of node peers (or of just single peers). This puts an upper limit on the time a miner can assign to a block. This upper limit is crucial: if it were not there, miners could assign a large future time that would drive difficulty down as far as they want in only 1 block.

Allowing miners to assign the timestamp instead of simply using the median of node peers is crucial to a cryptocoin. At the core of why POW is used, miners perform a timestamp service as a for-profit competitive group, which prevents the need for a 3rd-party centralized timestamp service. It can't be accomplished by nodes alone in an asynchronous network, especially since many nodes can be created to perform a Sybil attack.

Best timestamp handling method

The best way to handle bad timestamps is to make sure the future time limit (FTL) allowed by nodes is restricted to 6xT or less. The default on many coins is 7200 seconds.

[ Update: to minimize profit from bad timestamps, I am recommending FTL be set to 300 or TxN/20, whichever is larger. ]

A bad timestamp can lower difficulty by a factor of N/(N+7200/T) when the future time limit is the usual 7200 seconds (Bitcoin and Monero clones). The next honest timestamp will reverse the effect.

To maximize the number of blocks a big miner can get without difficulty rising, he assigns timestamp = previous_timestamp + (P+1)/P x T to all his blocks, where P is the new total hashrate (his plus the prior network hashrate HR) as a multiple of HR. For example, if he has 2x the network hashrate (P=3), he can add 1.333xT to the previous timestamp on all of his blocks. This prevents difficulty from rising, keeping it at the same value and maximizing the number of blocks he can get. With CRYPTONOTE_BLOCK_FUTURE_TIME_LIMIT=7200 and T=120 seconds, this would allow him to get 7200/120 = 60 blocks without the difficulty rising. He can't continue past that because he will have reached the future time limit that the nodes enforce. He then leaves and difficulty starts rising (if negative solvetimes are allowed in the difficulty calculation). If negative solvetimes are not allowed, he gets the 60 blocks all the same, over the course of the 90 blocks that he and the rest of the network obtained. The average real solvetime for the 90 would be 1/3 of T (since he has 2x the network hashrate) without the difficulty rising, and when he leaves, difficulty will stay the same. So the algorithm will have issued blocks too quickly without penalizing other miners (except for coin inflation, which also penalizes hodlers).
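To make the arithmetic concrete, here is a minimal sketch (my own illustration, not code from any coin) of the numbers above, under the same simplifying assumptions (difficulty stays constant and honest miners stamp real time):

# Attacker timestamp strategy: keep difficulty flat until the FTL is reached.
T = 120      # target solvetime (seconds)
FTL = 7200   # future time limit enforced by nodes
P = 3        # total hashrate as a multiple of the prior network hashrate
             # (attacker has 2x the network, so the total is 3x)

increment = (P + 1) / P * T        # what he adds to the previous timestamp
# Timestamps must advance T per block while real time advances only T/P,
# so his offset from real time grows by T*(P-1)/P per block until the FTL:
total_blocks = FTL / (T * (P - 1) / P)
attacker_blocks = total_blocks * (P - 1) / P   # he wins (P-1)/P of the blocks

print(increment)        # 160.0 seconds = 1.333xT
print(total_blocks)     # 90.0
print(attacker_blocks)  # 60.0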

A miner might be able to own the local node peers and have control of the time limit, but I believe this risks splitting the chain or causing other nodes to reject the blocks.

[update: Graft had a big problem from a big miner "owning the MTP". This was made easy by Graft not lowering the FTL from 7200 to 500. Here's a full discussion. ]

Lower Timestamp Limit

The lower limit is approximately 6xT into the past (in Bitcoin and maybe the vast majority of all others). It's not based on real time, but on the median of the past 11 block timestamps, which is the timestamp of the 6th block into the past if the 11 blocks had sequential timestamps. This is called the median time past (MTP).
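For reference, a minimal sketch of the Bitcoin-style median-of-11 rule (the helper name is mine):

# Median time past (MTP): median of the last 11 block timestamps.
def median_time_past(timestamps, window=11):
    recent = sorted(timestamps[-window:])
    return recent[len(recent) // 2]          # the 6th of 11 when sorted

# A new block's timestamp must be greater than the MTP:
timestamps = [100, 220, 310, 450, 500, 610, 700, 820, 900, 1010, 1100]
print(median_time_past(timestamps))          # 610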

The lower limit can get "owned" by a >50% miner who would generally get 6 of the 11 past blocks. He could keep assigning the same time initial_timestamp - 6xT to all his timestamps. By his 7th block it could be -12xT compared to real time because he owns the MTP. Six blocks later he can assign -18xT, and so on. This drives difficulty up (if negative solvetimes are allowed in the difficulty calculation), so it's a malicious attack that does not help him.

If an algo has a limit on the rate of difficulty falling that is less strict than its limit on rising (asymmetrical limits), and if it looks at individual timestamps and allows negative solvetimes, then he can ironically drive difficulty down by constantly assigning reverse timestamps (ironic because a reverse timestamp should send difficulty up). The extreme asymmetry is not allowing negative solvetimes at all by using if (solvetime<1) { solvetime=1; }. A bad actor with >50% hash rate constantly assigning reverse timestamps in this case would cause difficulty to drop to zero.

Other methods of timestamp handling:

Method 1: Do nothing

[this was written before I realized lowering FTL would solve all timestamp problems]

The +12xT and -6xT limits on timestamps in Bitcoin are pretty good if T=600. In a simple moving average (SMA), an incorrect time that is 12xT into the future would make the difficulty drop for only 1 block, by about N/(12+N), which is 17% for a typical N=60. The next block after it would immediately correct the error with a timestamp that is about -11xT relative to his timestamp, which should still be a full 6xT ahead of the MTP. Conversely, bad -6xT timestamps would typically raise difficulty by the smaller N/(N-6) and be immediately corrected by the next honest timestamp.
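The arithmetic for those one-block effects, as a quick sketch:

# One-block effect of a single bad timestamp on an SMA of N solvetimes:
N = 60
print(N / (N + 12))   # 0.83 -> a +12xT stamp drops difficulty ~17% for 1 block
print(N / (N - 6))    # 1.11 -> a -6xT stamp raises difficulty ~11% for 1 block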

Doing nothing could cause a serious problem for the WHM-like algorithms that give a higher weight to more recent timestamps. If the two most recent blocks are assigned -6xT, there are many scenarios where the difficulty would come out negative, a catastrophic problem. This becomes a lot harder to pull off as N gets higher.

The EMA difficulty would shoot up exponentially in a single block with a single negative solvetime that results from an out-of-sequence timestamp.

Even the SMA could have a problem if N is really low, like N=10. For example, if 6 of the past 11 solvetimes were 1xT to allow a -6xT timestamp, and then 4 of the 10 solvetimes in the window were 0.1xT, the denominator would be 5xT + 4x0.1xT - 6xT = -0.6xT, which gives a negative difficulty.

But the worst that can happen in an SMA is if a dishonest +12xT forward stamp is at the newest end of the window while a -6xT is at the oldest end: the difficulty is incorrectly lowered by N/(N+18), and the vice-versa case is a rise in difficulty of N/(N-18).

An attacker could also cause a divide-by-zero failure, or produce a really small denominator like 0.001, sending the difficulty up 1000x and effectively forcing a fork.
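One common style of guard (a sketch, not from any particular coin's code) is to floor the denominator before dividing:

# Floor the summed solvetimes to avoid divide-by-zero or a tiny denominator.
def sma_difficulty(sum_D, sum_solvetimes, N, T):
    sum_solvetimes = max(sum_solvetimes, N * T // 100)  # >= 1% of the expected N*T
    return sum_D * T // sum_solvetimes                  # = avg_D * N*T / sum_solvetimes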

Method 2: Set negative solvetimes to 0

In DAs that loop over each block in the N window to get all solvetimes (rather than subtracting the oldest from the most recent timestamp), simply blocking a negative solvetime as in the following code has an ironic catastrophic failure:

if ST < 0 then ST=0
# or, about the same:
# if ST < 1 then ST=1

The failure occurs if a largish ~20% miner keeps assigning the -6xT limit. The irony is that you expect -6xT to drive D up, but since it is converted to 0 seconds, the next honest timestamp, solved in a normal 1xT, produces a very large "calculated" solvetime of 7xT for the block after the bad one. Consider the following timestamps and calculated solvetimes from the loop in an SMA that uses ST = timestamp[i] - timestamp[i-1] when a 20% miner (20% of blocks) assigns -6xT:

apparent solvetimes =  1, 1, 1, -6, 7, 1, 1, 1, -6, 7, 1, 1, 1, -6, 7  (average is ~1, so it's OK)
after blocking negatives = 1, 1, 1, 0, 7, 1, 1, 1, 0, 7, .....

When blocking negatives the average ST is 2, so a 20% miner can cut difficulty in half in just 1 window length. When the difficulty drops, more mining will come online and the 20% miner's share will be reduced. But if he were able to maintain 20%, difficulty would drop to zero in a few N windows (replace the 1's in the last 2 sequences above with 0's to represent everyone solving in 0 seconds, and the difficulty will still continue to drop).
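A quick sketch (my own illustration) confirming the effect, using the repeating cycle from the sequences above:

# Repeating cycle of apparent solvetimes when a 20% miner stamps -6xT (T = 1):
cycle = [1, 1, 1, -6, 7]
raw = cycle * 6                          # 30 blocks
clipped = [max(0, st) for st in raw]     # method 2: block negatives
print(sum(raw) / len(raw))               # 0.8  -> roughly the target of 1xT
print(sum(clipped) / len(clipped))       # 2.0  -> average doubles, so D halves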

The EMAs and WHM algorithm would be affected about the same.

For an SMA algo or Digishield v3 that simply subtracts the oldest from the newest timestamp to get sum(STs), and is not using MTP as the newest block, it is not too bad, and a little better than doing nothing. The difficulty could be incorrectly high or low from bad timestamps by N/(N +/- 13) in the case where -6xT is the oldest block in the window and -6xT is also the 2nd-most-recent block, causing the most recent block to appear as +7xT.

Method 3: Set solvetime to difference between highest 2 timestamps

This method is for when you are not going to allow a signed integer for solvetime (otherwise use method 6). It's good for all algorithms, but method 6 is more accurate, retains symmetry, and is preferred.

Superficially it seems like method 2 because the bad timestamp will be reported with a solvetime of zero, but it does not have the catastrophic exploit.

In code it can be done like this:

# Idea from Neil (kyuupichan)
# In case several negatives were in a row before our N window:
# numbering is "blocks into past", -1 = most recently solved block.
# The next line could have been just timestamp[-N-1], but checking
# more blocks reduces a potential small error.
prev_max = max(timestamp[-N-1], timestamp[-N-2], timestamp[-N-3], timestamp[-N-4])
for i = -N to -1
   max_timestamp = max(timestamp[i], prev_max)
   solvetime[i] = max_timestamp - prev_max    # 0 if timestamp[i] went backwards
   prev_max = max_timestamp
.....
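Here is a runnable sketch of the same idea (Python, my own translation of the pseudocode), applied to the method-2 attack sequence. The bad stamp is reported as 0 without the huge compensating solvetime that made method 2 exploitable:

def monotone_solvetimes(timestamps):
    # solvetime = difference between the two highest timestamps seen so far
    prev_max, out = timestamps[0], []
    for ts in timestamps[1:]:
        new_max = max(ts, prev_max)
        out.append(new_max - prev_max)    # 0 if the timestamp went backwards
        prev_max = new_max
    return out

# A 20% miner stamps 6xT before the previous timestamp (T = 1):
timestamps = []
for i in range(25):
    if i % 5 == 4:
        timestamps.append(timestamps[-1] - 6)   # bad miner's block
    else:
        timestamps.append(i)                    # honest miners stamp real time

sts = monotone_solvetimes(timestamps)
print(sts[:10])              # [1, 1, 1, 0, 2, 1, 1, 1, 0, 2]
print(sum(sts) / len(sts))   # ~0.96: average stays near 1xT, no exploit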

For N in a low range like N=60, combined with a coin that allows timestamps up to 24xT ahead of the median of peer node time, the above allows an exploit for getting 15 blocks at ~35% below average difficulty (see further below). The following is a fix for WHM (and WT-144) that retains the above idea but greatly blocks the exploit, without slowing the ability of the difficulty to drop at normal speed if hashrate goes to, say, 1/3.

for i = -N to -1
   prev_max_timestamp = max(timestamp[i-1], timestamp[i-2], timestamp[i-3], timestamp[i-4],
                            timestamp[i-5], timestamp[i-6], timestamp[i-7])
   solvetime[i] = timestamp[i] - prev_max_timestamp
   if solvetime[i] > 7*T then solvetime[i] = 7*T    # T = target solvetime
...

In EMAs we do it similarly:

L = 7   # limits forwarded stamps
maxT = timestamp[height-L-1]
for ( i = height-L to i < height )  { maxT = max(maxT, timestamp[i]) }
ST = timestamp[height] - maxT
ST = max(T/200, min(T*L, ST))

The 7 is chosen because a 7xT solvetime is a fairly rare event, while still being a fairly tight choice, so that N for the EMA-JE can be N=40 without causing a big problem.

Describing the problem if the limit=7 is not used: imagine solvetimes are all magically 1xT, to simplify the math. Now say a 20% miner comes on and always assigns the max allowed timestamp:

Timestamps: 1,2,3,4,(5+12),6,7,8,9,(10+12)...

Solvetimes without using the "7": 1,1,1,13,0,0,0,0,5,0,0,0,0,5, ...repeat

Ignoring the 4 startup blocks, the avg ST is 1, as it should be. The SMA (with a loop), WHM, and EMA-JE with N=60 (which is like EMA-Z with N=30) all end up about 22% below the correct difficulty for as long as the miner does this. The 13 triggers a 30% drop in EMA and WHM (18% in SMA), and the 5's slow the recovery. The 5's are actually 4's in the steady state because the difficulty dropped, and they are even lower because low difficulty attracts more hash power, so the steady state is not as bad as I've said. My point is that there is a recovery in 30 blocks, but there are 15 blocks obtained at an average of 25% too low in WHM and EMA, and 15% too low in SMA.
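A sketch (my own illustration) deriving both solvetime sequences from the raw timestamps, with and without the 7xT cap:

# A ~20% miner adds the +12xT max forward time to every 5th timestamp (T = 1):
timestamps = [t + 12 if t % 5 == 0 else t for t in range(1, 26)]

def method3_solvetimes(ts_list, cap=None):
    prev_max, out = ts_list[0], []
    for ts in ts_list[1:]:
        st = max(0, ts - prev_max)              # 0 if the stamp went backwards
        out.append(st if cap is None else min(cap, st))
        prev_max = max(prev_max, ts)
    return out

print(method3_solvetimes(timestamps))         # 1,1,1,13,0,0,0,0,5,... as above
print(method3_solvetimes(timestamps, cap=7))  # 1,1,1,7,0,0,0,0,5,... with the "7"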

Monero clones may allow +24xT timestamps, and with WHM N=60 this is a disaster. My testing indicates about 25 blocks at 35% below the correct difficulty in WHM and EMA-JE with N=60 (EMA-Z N=30). Some clones of other coins may keep a rigid 2-hour max while reducing T, so that a T=120-second coin with a 2-hour max ahead of the median of node peer time would allow +60xT.

Here's the scenario with the limit=7 protection, assuming the worse situation where +24xT is allowed. Timestamps: 1,2,3,4,(5+24),6,7,8,9,(10+24),11,12,13,14,(15+24)... Solvetimes with the "7" protection: 1,1,1,7,0,0,0,0,5,0,0,0,0,5... For WHM and EMA, there are 15 blocks obtained at only 15% below the correct difficulty. So it's better under worse circumstances.

The way the 7 is used above "retains symmetry". It limits the jump, which makes a smaller N more palatable in the presence of timestamp manipulation. "Retaining symmetry" means the difficulty will come back down to the correct value. Making the 7 smaller in the loop than in the max limitation enforced at the end would have caused the difficulty to be below the correct value for a few blocks.

In the algos I've refined the above idea to make the "7" range from 5 to 9. Really, using 5 is problematic, but it's needed to prevent a ~20% drop in an N=40 algorithm every time there's a bad timestamp. So Method 6 is really preferred.

An SMA that simply subtracts the first and last timestamps in the window, and does nothing else, would again have much less error in D from bad timestamps, limited to even less than method 2. In this case N/(N-1) is the max rise in difficulty, and N/(N+13) = an 18% drop for N=60 for one block (it then immediately corrects when an accurate timestamp comes in). Then when that block exits the window, D rises by 18% for 1 block. So it seems like doing nothing works really well, but method 4 is a way you can prevent the single-bad-timestamp cases from having an effect, if you don't mind delaying your response by 1 block: see method 4.

There is another way to stop the single-bad-timestamp case in SMAs (that subtract the first from the last timestamp). They could use something like the limit=7 idea and apply it to both the beginning and end of the window, to protect as the bad time comes into and goes out of the window. EMA and WHM hardly give any weight to a block when it leaves their "window", so it's not needed there. Here it is for just 4 blocks:

# numbering is "blocks into past"
begin_timestamp = max(timestamp[-N-1], timestamp[-N-2], timestamp[-N-3], timestamp[-N-4])
end_timestamp = max(timestamp[-1], timestamp[-2], timestamp[-3], timestamp[-4])
total_solvetime = end_timestamp - begin_timestamp

Method 4: Median of 3 to stop most bad Timestamps

I learned this from BCH's new DAA. For SMAs that subtract the first and last timestamps, do this:

begin_timestamp = median(timestamp[-N-1], timestamp[-N-2], timestamp[-N-3])
end_timestamp = median(timestamp[-1], timestamp[-2], timestamp[-3])

I can't logically justify this because it only protects against single bad timestamps, when they would not cause much of a problem anyway, and it delays the response by 1 block. But I love it for some reason, and it inspired the last part of method 3.
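For completeness, a runnable sketch of the median-of-3 endpoints (Python; median3 is my own helper, and the window arithmetic follows the pseudocode above):

# Median-of-3 endpoints for an SMA that subtracts first from last timestamps.
def median3(a, b, c):
    return sorted((a, b, c))[1]

def total_solvetime(timestamp, N):
    # numbering is "blocks into past": timestamp[-1] = most recent block
    begin = median3(timestamp[-N-1], timestamp[-N-2], timestamp[-N-3])
    end = median3(timestamp[-1], timestamp[-2], timestamp[-3])
    return end - begin

ts = list(range(100))     # perfectly sequential timestamps, T = 1
ts[-1] += 12              # a lone +12xT forward stamp at the tip
print(total_solvetime(ts, N=60))   # 60: the median ignores the single bad stamp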

Method 5: Limit change in Difficulty to indirectly limit timestamps

You can put symmetrical limits on how much the difficulty can change from the previous block. There's no strong reason to do this, but I wanted to mention it. I think it's hardly different from method 6. This is what MANY coins should have used instead of copying Digishield's POW limits, which are useless and cause harm if you try to make Digishield respond faster. This allows the algorithm to change as fast as it can in the presence of a specified-size hash attack starting or stopping, but not faster.

D = calculated D for the next block, to be considered for limiting
X = 10 # max expected size of "hash attacks" as a multiple of the base hash rate
limit = X^(3/N) # 3 for WHM and EMA.  Use 2 for SMA.
if ( D > prev_D*limit ) { D = prev_D*limit }
if ( D < prev_D/limit ) { D = prev_D/limit }

Or it can be made to act as if method 6 timestamp limits are in place.

limit = 1 + max_allowed_solvetime/N
if ( D > prev_D*limit ) { D = prev_D*limit }
if ( D < prev_D/limit ) { D = prev_D/limit }
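As a quick check on the numbers (my own arithmetic, using the parameters above):

# Per-block limit for X=10 hash attacks with a WHM/EMA using N=60:
N, X = 60, 10
limit = X ** (3 / N)
print(limit)             # ~1.122: difficulty may change at most ~12% per block
print(limit ** (N / 3))  # 10.0: over N/3 blocks the limit compounds to X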

Method 6: Limit timestamps from previous timestamp

This is the best method if the future time limit determined by the median of node peers is not reduced as described at the very top.

When a bad timestamp comes through, it can be allowed to "do its damage" with almost no problem, as long as negative solvetimes are allowed. If a forward time is used to try to lower difficulty, the next accurate timestamp that comes through will erase its effect completely in all algorithms, and vice versa if a bad reverse timestamp comes in. But Monero clones may allow a +24xT forward time, and if the algo uses N=24 (one of the low N's I recommend for T=600-second coins), difficulty can be cut to 1/4 in the next block. It can even throw the EMA-Z difficulty negative. A limit of 8xT as the max solvetime after the previous timestamp did not hurt performance much (see chart below). If the limits are not symmetrical, an exploit is possible. If you're using an SMA or Digi v3 for some reason, keep in mind the effects are reversed when the bad timestamp leaves out the back of the window.

    if ST > 6*T then ST=6*T
    if ST < -5*T then ST= -5*T
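A minimal sketch (my own illustration, with T normalized to 1) of the symmetry: a forward lie within the limits is exactly cancelled by the negative solvetime of the next honest timestamp, leaving the window's total solvetime unchanged:

T = 1
honest = [0, 1, 2, 3, 4, 5]
lied   = [0, 1, 2 + 5*T, 3, 4, 5]   # 3rd block stamped 5xT into the future

def method6_solvetimes(ts_list):
    return [max(-5*T, min(6*T, b - a)) for a, b in zip(ts_list, ts_list[1:])]

print(method6_solvetimes(honest))   # [1, 1, 1, 1, 1]
print(method6_solvetimes(lied))     # [1, 6, -4, 1, 1] -> same total of 5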

Limiting the solvetime makes D slower to drop if there is a big decrease in hashrate. The plots below show what happens without and then with the 8xT limit when hashrate drops to 1/50th with the EMA-Z. This shows performance is not hurt much.

[plots: EMA-Z response to a hashrate drop to 1/50th, without and then with the 8xT solvetime limit]

This shows the benefit in a live coin of allowing the negative solvetimes. Graft was using method 3, which is the second best method.

[chart: difficulty in a live coin (Graft), showing the benefit of allowing negative solvetimes]

Perfect clock = mining not needed

If there were a good real-time clock, such as all nodes keeping a good universal time, we could theoretically make the network synchronous like a computer relying on a clock pulse and do away with consensus. Synchronous networks do not have FLP or Byzantine problems because there is no need for consensus.

In short, every node would have a good universal clock accurate to within a few seconds, and there would be an agreed-upon clock cycle with an "on time" when new transactions are accepted by nodes, and an "off time" to wait for all nodes to get and validate all transactions. It takes about a minute for 0.5 MB blocks to propagate to 90% of nodes, so the off time might need to be 5 minutes. All nodes would accept all valid transactions within the window, but they each have to independently determine what the acceptable window is by their independent but accurate clocks (independent so they do not all use the same source for time, which would be an attackable 3rd party). Then transactions would be sorted into a block and hashed like a regular block chain. Nodes do not theoretically need to ask other nodes anything; they just share transactions.

There are hairy details (such as straggling transactions on the edges of the "on window", and re-syncing to get the right block chain when rejoining the network) that would require some asking of other nodes, which is the beginning of trusted peers or the consensus that only POW has solved. But CPUs work without consensus and with "large" (nanosecond) delays. To what extent are nodes not like a disparate group of transistors on a CPU waiting for electrons (transactions) to arrive? Nodes going online and dropping out is like some of the transistors turning off and on, so the analogy even works to show the problems.