n-y-z-o / nyzoVerifier

Verifier for the Nyzo cryptocurrency
The Unlicense

Stuck verifier due to bad block stopped 3 different sentinels and dropped out just now #45

Open reb0rn21 opened 2 years ago

reb0rn21 commented 2 years ago

holo6 just dropped out: *** left at block 16025645, cycle length: 2574, verifier version 607. The VPS had no issues and CPU load was <30%. The block file i_016023877 was corrupt, the verifier got stuck, and all 3 sentinels that tracked it (each sentinel had 10-20 verifiers in total) got stuck at the same block. holo6 then dropped out some 3-4 hours later, as none of the 3 sentinels was protecting it.

holo6's score 6 hours after the drop (the score was perfect before; it degraded only slightly because the verifier was stuck for ~4h before the drop): holo6 -137532 360 -137574 180 -137520 360 -137544 360 -137874 360 -148116 0 -148106 0 -148116 0

I will post the sentinel logs, the holo6 log, and the block that does not match the hash of the others. I do not want to presume anything, but how can a perfectly fine verifier that had no issues over the past year suddenly get a bad block? And on top of that, how can that bad block kill three sentinels that tracked holo6? Sentinel versions were 610, 612, and a third that I had updated, so I am not sure exactly which version it was, but it was 608+ and maybe even 612+.

All 3 sentinels had to be cleared of all blocks and resynced to get working again.

For holo6 there is the log plus the block that was malformed. Only sentinel2 has the log from when it got stuck; for the rest I was too late to save the logs.

nyzo.zip

n-y-z-o commented 2 years ago

We will look into this issue. If the block signature was good, this is likely an issue with the "wrong" block being frozen by the verifier, which would indicate a failure of the consensus mechanism. We haven't seen any consensus issues since the transaction limit was added to block scoring in v560, but it's certainly possible that this represents a consensus issue that hasn't been seen before.
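For background on the scoring mention above: block scoring decides which candidate block the cycle prefers at each height, with lower scores winning. Below is a minimal illustrative sketch of how a transaction limit could factor into such a score. The class, field, and constant names are hypothetical, and the real v560 logic in the nyzoVerifier codebase may differ substantially.

```java
// Hypothetical sketch of transaction-limit-aware block scoring.
// Class, field, and constant names are illustrative only; they are
// not taken from the nyzoVerifier codebase.
public class BlockScoringSketch {

    // Assumed per-block transaction limit. In the real system the
    // limit is derived from blockchain state, not hard-coded.
    private static final int transactionLimit = 10;

    // Assumed score penalty per transaction over the limit.
    private static final long excessTransactionPenalty = 5L;

    // Lower scores are preferred by the cycle.
    public static long scoreBlock(long baseChainScore, int transactionCount) {
        long score = baseChainScore;
        if (transactionCount > transactionLimit) {
            // Penalizing oversized blocks removes an incentive to flood
            // blocks with transactions as a way to attack consensus.
            score += (long) (transactionCount - transactionLimit) * excessTransactionPenalty;
        }
        return score;
    }
}
```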

Thank you for the block and the logs. They will be helpful.

reb0rn21 commented 2 years ago

But why did 3 different sentinels, each tracking 10 to 20 verifiers, get stuck because of that one verifier? That opened the way for most of my verifiers to drop if I had not checked, plus holo6 was dropped after ~4-5h (the issue or attack happened while I was mostly asleep).

n-y-z-o commented 2 years ago

The two ways that a verifier can be legitimately dropped from the cycle are:

1. Failing to produce a block when it is the verifier's turn, in which case the cycle must either remove the verifier or stall.
2. Accumulating a poor performance score over time.

While most of the removals lately have been due to the second case, the first case was also a common cause for removals, especially when we had consensus issues. These removals will happen almost immediately, because the only other choice is for the blockchain to stall. This case would happen if both the verifier and all sentinels protecting the verifier got stuck on the one incorrect block.

There are a few interesting potential solutions to this problem, but the most sensible would be to add a concept of a "semi-frozen edge" or "pseudo-frozen edge" to the sentinel. In this case, the verifier would still get stuck. For reasons that are a little complicated, we actually want the verifier to get stuck in this case, because allowing a verifier to get itself unstuck automatically would introduce some difficult problems. But if the sentinel is able to cover for the verifier, that would buy enough time to manually fix the verifier.
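A minimal sketch of how such a pseudo-frozen edge might work in the sentinel, assuming hypothetical names (PseudoFrozenEdgeSketch, registerBlockWithVotes, shouldProtect) and a simple majority-vote threshold; this is an illustration of the concept, not the planned implementation:

```java
// Hypothetical sketch of a sentinel "pseudo-frozen edge". This is not
// the planned implementation; all names and thresholds are assumptions.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PseudoFrozenEdgeSketch {

    // Highest block for which the sentinel has seen a sufficient vote
    // count, even if a managed verifier is stuck on a conflicting block.
    private long pseudoFrozenEdgeHeight = -1L;

    // Latest frozen edge reported by each managed verifier.
    private final Map<String, Long> managedVerifierEdges = new ConcurrentHashMap<>();

    public void registerBlockWithVotes(long height, double cycleVoteFraction) {
        // If a clear majority of the cycle voted for this block, advance the
        // pseudo-frozen edge regardless of what any one verifier reports.
        if (cycleVoteFraction > 0.5 && height > pseudoFrozenEdgeHeight) {
            pseudoFrozenEdgeHeight = height;
        }
    }

    public void reportVerifierEdge(String identifier, long height) {
        managedVerifierEdges.merge(identifier, height, Math::max);
    }

    public boolean shouldProtect(String identifier) {
        // Cover a verifier whenever it falls behind the pseudo-frozen edge,
        // buying time for a manual fix while the verifier stays stuck.
        return managedVerifierEdges.getOrDefault(identifier, -1L) < pseudoFrozenEdgeHeight;
    }
}
```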

This also highlights further that an alternate entrance process, where the manual override is actually used, would strengthen the system. In this case, if we see that holo6 was removed due to an issue with the code, the appropriate action would be to promote holo6 to the top of the entrance queue and allow it to rejoin. This would also reduce concerns of malicious removals, as the cycle allowing a verifier back in would suggest that the cycle did not have any malicious intent against the verifier.

awshrb commented 2 years ago

I also encountered this problem. The in-cycle verifier got stuck; the latest block can be obtained by the sentinel, but it can't update the frozen edge for the ManagedVerifier. Verifier's log:

```
^^^^^^^^^^^^^^^^^^^^^ casting vote for height 16044717
broadcasting message: BlockVote19 to 2602
trying to fetch BlockRequest11 from bullish18133
trying to fetch MissingBlockVoteRequest23 from Lelek
trying to fetch MissingBlockVoteRequest23 from D_e_L_i22
trying to fetch MissingBlockVoteRequest23 from KatBusch24
trying to fetch MissingBlockVoteRequest23 from bevan2
trying to fetch MissingBlockVoteRequest23 from ajmo2-1.
trying to fetch MissingBlockVoteRequest23 from Krooa15384
trying to fetch MissingBlockVoteRequest23 from Z_Pro29
trying to fetch MissingBlockVoteRequest23 from Npass5814
trying to fetch MissingBlockVoteRequest23 from imdb007
trying to fetch MissingBlockVoteRequest23 from Supercoin4
requesting block with votes for height 16044717, interval=4417
trying to fetch BlockWithVotesRequest37 from PRADA7
waiting for message queue to clear from thread [Verifier-mainLoop], size is 2
trying to fetch BlockRequest11 from elasticsearch069
added new out-of-cycle node to queue: b49b...2be9
trying to fetch BlockRequest11 from Sofico
trying to fetch BlockRequest11 from EasyD3
^^^^^^^^^^^^^^^^^^^^^ casting vote for height 16044717
broadcasting message: BlockVote19 to 2602
trying to fetch BlockRequest11 from Artboy11
top verifier 5046...8473 has 1852 votes with a cycle length of 2570 (72.1%)
trying to fetch MissingBlockVoteRequest23 from NV1070
trying to fetch MissingBlockVoteRequest23 from imdb026
trying to fetch MissingBlockVoteRequest23 from Getting_12
trying to fetch MissingBlockVoteRequest23 from Pactine6
trying to fetch MissingBlockVoteRequest23 from BeatyMold6
trying to fetch MissingBlockVoteRequest23 from MiniStand4
trying to fetch MissingBlockVoteRequest23 from Tabulation
trying to fetch MissingBlockVoteRequest23 from tomtom07-1136
trying to fetch MissingBlockVoteRequest23 from Glenn Ford
trying to fetch MissingBlockVoteRequest23 from Jack-LA14
requesting block with votes for height 16044717, interval=5302
trying to fetch BlockWithVotesRequest37 from CPRCP7
waiting for message queue to clear from thread [Verifier-mainLoop], size is 1
trying to fetch BlockRequest11 from Cheyanne
trying to fetch BlockRequest11 from Jabba176
trying to fetch BlockRequest11 from Alita Angel
^^^^^^^^^^^^^^^^^^^^^ casting vote for height 16044717
broadcasting message: BlockVote19 to 2602
trying to fetch BlockRequest11 from 4440..cd6f
trying to fetch MissingBlockVoteRequest23 from EXPRESS28
trying to fetch MissingBlockVoteRequest23 from High 11 Build
trying to fetch MissingBlockVoteRequest23 from tomtom04-1067
trying to fetch MissingBlockVoteRequest23 from Miss You7
trying to fetch MissingBlockVoteRequest23 from Gift71
trying to fetch MissingBlockVoteRequest23 from Indue2
trying to fetch MissingBlockVoteRequest23 from pvip229
trying to fetch MissingBlockVoteRequest23 from jing
trying to fetch MissingBlockVoteRequest23 from Correction
trying to fetch MissingBlockVoteRequest23 from USA.10.An.network
requesting block with votes for height 16044717, interval=5458
```

Sentinel's log:

```
updated frozen edge for ManagedVerifier myverifier: 16044716
updating frozen edge for ManagedVerifier myverifier
updated frozen edge for ManagedVerifier myverifier: 16044716
maximum cycle transaction amount=∩100000.000000, balance=∩68641863.722158
after registering block [Block: v=2, height=16046090, hash=, id=3f48...7773] in BlockchainMetricsManager(), cycleTransactionSum=∩0.000072, cycleLength=2571
froze block [Block: v=2, height=16046090, hash=d274...9c82, id=3f48...7773], efficiency: 95.9%
updating frozen edge for ManagedVerifier myverifier
updated frozen edge for ManagedVerifier myverifier: 16044716
updating frozen edge for ManagedVerifier myverifier
updated frozen edge for ManagedVerifier myverifier: 16044716
maximum cycle transaction amount=∩100000.000000, balance=∩68641863.722158
updating frozen edge for ManagedVerifier myverifier
after registering block [Block: v=2, height=16046091, hash=08a2...7f61, id=bd10...4892] in BlockchainMetricsManager(), cycleTransactionSum=∩0.000072, cycleLength=2571
froze block [Block: v=2, height=16046091, hash=08a2...7f61, id=bd10...4892], efficiency: 95.9%
updated frozen edge for ManagedVerifier myverifier: 16044716
updating frozen edge for ManagedVerifier myverifier
updated frozen edge for ManagedVerifier myverifier: 16044716
updating frozen edge for ManagedVerifier myverifier
updated frozen edge for ManagedVerifier myverifier: 16044716
maximum cycle transaction amount=∩100000.000000, balance=∩68641863.722158
after registering block [Block: v=2, height=16046092, hash=8871...5a3c, id=9448...d6a6] in BlockchainMetricsManager(), cycleTransactionSum=∩0.000072, cycleLength=2571
froze block [Block: v=2, height=16046092, hash=8871...5a3c, id=9448...d6a6], efficiency: 95.9%
updating frozen edge for ManagedVerifier myverifier
updated frozen edge for ManagedVerifier myverifier: 16044716
```

n-y-z-o commented 2 years ago

@awshrb This appears to be a different issue. In your case, the sentinel is continuing to track the blockchain and protect verifiers. In the sentinel log, the latest frozen edge is 16,046,092, which is 1,376 blocks past the frozen edge of the stuck verifier.

The log statements you are seeing are from monitoring that was added in version 612.

```
updating frozen edge for ManagedVerifier myverifier
updated frozen edge for ManagedVerifier myverifier: 16044716
```

This is a special check to see whether a verifier is truly behind or whether the sentinel has just gotten unlucky in its recent block requests to that verifier. It happens in the Sentinel.updateFrozenEdge() method.
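For illustration, a rough sketch of what such a forward-only edge update might look like; the actual Sentinel.updateFrozenEdge() internals may differ, and the ManagedVerifier stand-in here is simplified:

```java
// Illustrative sketch in the spirit of Sentinel.updateFrozenEdge().
// Method internals and names are assumed, not copied from the
// nyzoVerifier source.
import java.util.HashMap;
import java.util.Map;

public class UpdateFrozenEdgeSketch {

    // Minimal stand-in for the real ManagedVerifier class.
    interface ManagedVerifier {
        String getNickname();
    }

    private final Map<ManagedVerifier, Long> frozenEdges = new HashMap<>();

    void updateFrozenEdge(ManagedVerifier verifier, long reportedHeight) {
        System.out.println("updating frozen edge for ManagedVerifier " + verifier.getNickname());

        // Only move the stored edge forward: a single low response may mean
        // the request was unlucky, not that the verifier regressed. A verifier
        // is considered truly behind only when repeated requests fail to
        // advance this edge while the sentinel's own frozen edge keeps growing.
        long currentHeight = frozenEdges.getOrDefault(verifier, -1L);
        if (reportedHeight > currentHeight) {
            frozenEdges.put(verifier, reportedHeight);
        }

        System.out.println("updated frozen edge for ManagedVerifier " + verifier.getNickname()
                + ": " + frozenEdges.get(verifier));
    }
}
```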

n-y-z-o commented 2 years ago

@reb0rn21, this is a really interesting case. Your verifier did actually freeze a different block than the rest of the cycle, which would be a failure of the consensus process. We saw a lot of this back when the blockchain stalled all the time, but we hadn't seen it happen in the past few years.

The block that your verifier froze had hash 3db0...fc01, while the block the rest of the blockchain froze had hash 20e1...5526. Interestingly, both of the blocks had exactly the same timestamp, 1648963947575.

The blocks differed in that one of them included a meta-nickname transaction with the value "charming".
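To make the divergence concrete, here is a toy demonstration of how a single extra transaction changes a block's hash completely. This simplified model hashes a serialized string with SHA-256; Nyzo's real block serialization and hashing are more involved, so this is only an analogy:

```java
// Toy demonstration: two blocks with identical heights and timestamps
// still hash differently when their transaction lists differ. The
// serialization format below is invented for illustration.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BlockHashDivergenceDemo {

    static String hashHex(String serializedBlock) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest(serializedBlock.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Height and timestamp taken from this issue; one block carries an
        // extra meta-nickname transaction, the other does not.
        String blockWithoutNickname =
                "height=16023877|timestamp=1648963947575|transactions=[seed]";
        String blockWithNickname =
                "height=16023877|timestamp=1648963947575|transactions=[seed,meta:nickname=charming]";

        // The single-transaction difference changes every bit of the digest,
        // so the cycle ends up voting on two entirely different hashes.
        System.out.println("hash A: " + hashHex(blockWithoutNickname));
        System.out.println("hash B: " + hashHex(blockWithNickname));
    }
}
```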

If we look now at the identifier that produced the block (https://nyzo.co/status?id=bf9e.91c1), we can see two verifiers running on that identifier. The system was not actually designed for this, and we were planning to add some code to handle multiple verifiers running at the same time. Running multiple verifiers on the same identifier always presents a challenge to consensus, and it looks and behaves exactly like a brute-force attack on consensus would.

The first of the two verifiers is charming at 51.15.83.159, running 610001. The second is Charming at 51.15.212.126, running 611002. So, the block that holo6 froze came from charming, and the block that most of the cycle froze came from Charming.

We'll add monitoring to see if other cases of this happen. We did not have the consensus tracking introduced in v543 (https://tech.nyzo.co/releaseNotes/v543) enabled on any of our verifiers for this block. We'd need that level of monitoring to have information that would be useful for tuning the consensus algorithm directly.

However, we can still begin to address this by adding the pseudo-frozen edge to the sentinel. We'll get that into our planning for development. If/when we collect more information about a similar issue in the future, we will look at how that information might be applied to a tuning of the consensus algorithm.

As I mentioned, this further highlights the need for an alternative entrance process. Issues like this, where a verifier is dropped due to a core issue in the code, should allow for an easy manual override to let the verifier back in. There is a manual override now, but it isn't easy enough to be used frequently.

A proper alternative entrance process would also address the join/drop situations. It appears that many users are waiting months to get in, finally joining the cycle, and then dropping after a cycle or two. 517d...85b3 and Rare9 appear to be good recent examples of this.

n-y-z-o commented 2 years ago

@reb0rn21 To ensure this issue doesn't happen again without being noticed, we've added a hash check to the nyzo.co web server. We also made a small modification to Argo 746 so that it deliberately reports an incorrect hash for the first 10 blocks of every 100, to exercise the check. This is what it looks like when a verifier reports an incorrect hash.

(screenshot: incorrect_hash_argo_746)
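A minimal sketch of the comparison such a server-side check might perform; the CanonicalChain lookup and all method names here are assumptions, not the nyzo.co implementation:

```java
// Hypothetical sketch of the hash check described above: compare each
// verifier's reported frozen-edge hash against the hash the cycle
// actually froze for that height. Names are illustrative only.
import java.util.Arrays;

public class HashCheckSketch {

    // Stand-in for a lookup of the cycle's frozen block hash at a height.
    interface CanonicalChain {
        byte[] hashAtHeight(long height);
    }

    public static boolean reportsIncorrectHash(CanonicalChain chain, long reportedHeight,
                                               byte[] reportedHash) {
        byte[] canonicalHash = chain.hashAtHeight(reportedHeight);
        // A mismatch means the verifier froze a different block at this
        // height -- exactly the condition that went unnoticed for holo6.
        return canonicalHash != null && !Arrays.equals(canonicalHash, reportedHash);
    }
}
```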

Currently, this is only displayed for verifiers in the cycle. However, it has already detected a few verifiers that are stuck on block 3835422, the site of a blockchain stall we had back in July of 2019.

These out-of-cycle incorrect blocks are not of consequence; they are just interesting.