hyperledger-labs / Scorex

Scorex 2.0 Core
Apache License 2.0

Node crash due to failed rollback in Hybrid #191

Open dkaidalov opened 6 years ago

dkaidalov commented 6 years ago

A problem happens to me constantly whenever I run two or more nodes. A node may crash with the following exception:

java.util.NoSuchElementException: versionID not found, can not rollback
    at io.iohk.iodb.LSMStore.notFound$1(LSMStore.scala:868)
    at io.iohk.iodb.LSMStore.$anonfun$rollback$2(LSMStore.scala:875)
    at scala.Option.getOrElse(Option.scala:121)
    at io.iohk.iodb.LSMStore.rollback(LSMStore.scala:875)
    at examples.hybrid.state.HBoxStoredState.$anonfun$rollbackTo$1(HBoxStoredState.scala:88)
    at scala.util.Try$.apply(Try.scala:209)
    at examples.hybrid.state.HBoxStoredState.rollbackTo(HBoxStoredState.scala:84)
    at scorex.core.NodeViewHolder.$anonfun$updateState$1(NodeViewHolder.scala:230)
    at scala.util.Try$.apply(Try.scala:209)
    at scorex.core.NodeViewHolder.updateState(NodeViewHolder.scala:226)
    at scorex.core.NodeViewHolder.updateState$(NodeViewHolder.scala:220)
    at examples.hybrid.HybridNodeViewHolder.updateState(HybridNodeViewHolder.scala:26)
    at examples.hybrid.HybridNodeViewHolder.pmodModify(HybridNodeViewHolder.scala:123)
    at examples.hybrid.HybridNodeViewHolder.pmodModify(HybridNodeViewHolder.scala:26)
    at scorex.core.NodeViewHolder$$anonfun$processLocallyGeneratedModifiers$1.applyOrElse(NodeViewHolder.scala:373)
    ......

So at some point a node is unable to roll back. A brief look at the issue led me to this function (source code link), which, as far as I can see, isn't fully implemented. It is probably the cause of the crashes, though I'm not fully sure.
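Here is a minimal Scala sketch of why such a rollback can fail. It assumes only what the stack trace suggests: a store can roll back to version ids it has actually recorded and throws `NoSuchElementException` for anything else. `ToyVersionedStore` is a hypothetical toy for illustration, not the IODB `LSMStore` API.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

// Hypothetical toy model of a versioned key-value store; NOT the IODB LSMStore API.
// Assumption: the store keeps one snapshot per version id and can only roll back
// to a version id it has recorded.
final class ToyVersionedStore {
  // (versionId, full key-value snapshot at that version), oldest first.
  private val history = ArrayBuffer.empty[(String, Map[String, String])]
  private var current: Map[String, String] = Map.empty

  // Applies a batch of writes and records the resulting snapshot under versionId.
  def update(versionId: String, toPut: Map[String, String]): Unit = {
    current = current ++ toPut
    history += (versionId -> current)
  }

  // Restores the snapshot recorded under versionId and discards all newer versions.
  // An unknown versionId raises NoSuchElementException, like the crash above.
  def rollback(versionId: String): Unit = {
    val idx = history.indexWhere(_._1 == versionId)
    if (idx < 0) throw new NoSuchElementException("versionID not found, can not rollback")
    history.remove(idx + 1, history.length - idx - 1)
    current = history(idx)._2
  }

  def get(key: String): Option[String] = current.get(key)
}

val store = new ToyVersionedStore
store.update("v1", Map("a" -> "1"))
store.update("v2", Map("a" -> "2"))

store.rollback("v1")                    // known version: succeeds
println(store.get("a"))                 // Some(1)

val failed = Try(store.rollback("v3"))  // never recorded: fails
println(failed.isFailure)               // true
```

Note that after rolling back to `v1`, `v2` is gone as well, so any version the store was never given (or has already discarded) is an invalid rollback target.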

Are you also having this problem? Are you going to fix it anytime soon?

ceilican commented 6 years ago

@dkaidalov, I recently ran a few nodes (see #152) and didn't experience this behaviour. Could you tell us in more detail what you did to see this error?

terjokhin commented 6 years ago

This error is related to HBoxState, not history. @dkaidalov, could you provide more details? I'm trying to reproduce it.

dkaidalov commented 6 years ago

Indeed, the error is raised in HBoxStoredState, but its cause lies in the improperly implemented HybridHistory::bestForkChanges.

The main problem is that HybridHistory::bestForkChanges returns a ProgressInfo structure whose toApply field contains only the head block instead of the whole applyBlocks array. As a result, not all of the necessary blocks are applied to HBoxStoredState (which uses the ProgressInfo structure to update its state), and at some point it can't roll back because the branch point is unknown to it.

This is my understanding of the problem, but I could be missing something.
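A minimal Scala sketch of the difference described above, under simplifying assumptions: blocks are reduced to an id and a parent id, and a fork switch is computed from two genesis-to-tip chains. `ToyBlock`, `ToyProgressInfo` and `forkChanges` are hypothetical names for illustration, not the actual `HybridHistory` or `ProgressInfo` code.

```scala
// Hypothetical, simplified stand-ins for Scorex types; NOT the real API.
final case class ToyBlock(id: String, parentId: String)

final case class ToyProgressInfo(
  branchPoint: Option[String], // last block shared by the old and new best chains
  toRemove: Seq[ToyBlock],     // old-chain blocks above the branch point, in chain order
  toApply: Seq[ToyBlock]       // new-chain blocks the state must apply, in chain order
)

// Computes the changes for switching from oldChain to newChain (both genesis-to-tip).
// headOnly = true mimics the behaviour reported above: only the newest block is kept.
def forkChanges(oldChain: Seq[ToyBlock], newChain: Seq[ToyBlock], headOnly: Boolean): ToyProgressInfo = {
  val common = oldChain.zip(newChain).takeWhile { case (a, b) => a == b }.length
  val branchPoint = if (common > 0) Some(newChain(common - 1).id) else None
  val applyBlocks = newChain.drop(common)
  val toApply = if (headOnly) applyBlocks.takeRight(1) else applyBlocks
  ToyProgressInfo(branchPoint, oldChain.drop(common), toApply)
}

val g  = ToyBlock("g", "none")
val a1 = ToyBlock("a1", "g")   // old best chain: g <- a1
val b1 = ToyBlock("b1", "g")   // new best chain: g <- b1 <- b2
val b2 = ToyBlock("b2", "b1")

val truncated = forkChanges(Seq(g, a1), Seq(g, b1, b2), headOnly = true)
val full      = forkChanges(Seq(g, a1), Seq(g, b1, b2), headOnly = false)
println(truncated.toApply.map(_.id)) // List(b2)
println(full.toApply.map(_.id))      // List(b1, b2)
```

In the truncated case `b1` is never handed to the state, so the state never records it as a version; a later fork whose branch point is `b1` then has nothing to roll back to.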

I hit this error very often. Two or more nodes are needed, and it occurs all the time when block generation is fast. I use a 10s block interval. Here is my config:

  miner {
    offlineGeneration = true
    targetBlockDelay = 10s
    blockGenerationDelay = 100ms
    rParamX10 = 8
    initialDifficulty = 10
    posAttachmentSize = 100
  }

..............................

PosForger:
val InitialDifficuly = 1500000000L

Note that I decreased the PoS initial difficulty to speed up block generation.

dkaidalov commented 6 years ago

@daron666 Actually, I was able to reproduce this crash with no config changes on a clean master branch (running 3 nodes). I also noticed that the exception above usually appears right after the following error:

java.lang.IllegalArgumentException: requirement failed: Incorrect state version: 5wySxM4eYTLGFbboEeVYoKyPvu3u5C3fScuHoNAxm6Km found, (B78JEtziqmwttysEiab1KRNZ3oaSUYpUQvubfY5KTy1w || 87Phg3xSAwA58cRTZ1n4zTKmZwsjqkKwcijnmmspjsmA || List()) expected
    at scala.Predef$.require(Predef.scala:277)
    at examples.hybrid.state.HBoxStoredState.$anonfun$validate$1(HBoxStoredState.scala:56)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    at scala.util.Try$.apply(Try.scala:209)
    at examples.hybrid.state.HBoxStoredState.validate(HBoxStoredState.scala:51)
    at examples.hybrid.state.HBoxStoredState.validate(HBoxStoredState.scala:22)
    at scorex.mid.state.BoxMinimalState.applyModifier(BoxMinimalState.scala:29)
    at scorex.mid.state.BoxMinimalState.applyModifier$(BoxMinimalState.scala:28)
    at examples.hybrid.state.HBoxStoredState.applyModifier(HBoxStoredState.scala:22)
    at scorex.core.NodeViewHolder.updateState(NodeViewHolder.scala:248)
    at scorex.core.NodeViewHolder.pmodModify(NodeViewHolder.scala:284)
    at scorex.core.NodeViewHolder.pmodModify$(NodeViewHolder.scala:271)
    at examples.hybrid.HybridNodeViewHolder.pmodModify(HybridNodeViewHolder.scala:22)
    at scorex.core.NodeViewHolder$$anonfun$processLocallyGeneratedModifiers$1.applyOrElse(NodeViewHolder.scala:380)
....

To facilitate reproduction you can:

  1. Set up all nodes with non-zero balances (by setting seed = "genesisoX") so they can issue PoSBlocks
  2. Decrease block delay
  3. Decrease the initial PoS difficulty

kushti commented 6 years ago

@dkaidalov I think I've fixed it; please test.

dkaidalov commented 6 years ago

@kushti Subjectively, it has become a bit more stable, but the same errors are still reproducible. I noticed that HybridHistory::bestForkChanges still returns a ProgressInfo structure whose toApply field contains only the head block instead of the whole applyBlocks array. Is that intentional? Based on my testing, I can confirm that the IncorrectStateVersion exceptions start to appear right after a case with applyBlocks.size > 1 occurs, and this eventually leads to the "versionID not found" exception.
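The sequence reported here (an "Incorrect state version" failure first, then "versionID not found") can be reproduced in a toy model. The sketch is hypothetical throughout: `ToyBlock`, `ToyProgressInfo`, `ToyState` and `run` are illustrative stand-ins, not Scorex's `ProgressInfo`, `HBoxStoredState` or `NodeViewHolder`. The state is assumed to (a) require each applied block to extend its current version and (b) roll back only to versions it has recorded. With a fork of two blocks (`applyBlocks.size == 2`), applying only the head fails the parent check, and the skipped block later cannot serve as a rollback target.

```scala
import scala.util.Try

// Hypothetical, simplified stand-ins; NOT the Scorex ProgressInfo / HBoxStoredState API.
final case class ToyBlock(id: String, parentId: String)
final case class ToyProgressInfo(branchPoint: Option[String], toApply: Seq[ToyBlock])

// Toy state whose version history is the list of applied block ids, newest first.
final class ToyState(genesisId: String) {
  private var versions: List[String] = List(genesisId)

  def version: String = versions.head

  // Assumption: like a versioned store, the state can only return to recorded versions.
  def rollbackTo(id: String): Unit = {
    if (!versions.contains(id))
      throw new NoSuchElementException("versionID not found, can not rollback")
    versions = versions.dropWhile(_ != id)
  }

  // Assumption: a block must extend the current state version (cf. "Incorrect state version").
  def applyBlock(b: ToyBlock): Unit = {
    require(b.parentId == version, s"Incorrect state version: $version found, ${b.parentId} expected")
    versions = b.id :: versions
  }

  // Rolls back to the branch point (if any), then applies blocks in order.
  // No atomicity: a failure midway leaves the state where it stopped.
  def update(pi: ToyProgressInfo): Try[Unit] = Try {
    pi.branchPoint.foreach(rollbackTo)
    pi.toApply.foreach(applyBlock)
  }
}

// Old best chain g <- a1; the fork g <- b1 <- b2 becomes best; later c2 forks off b1.
def run(headOnly: Boolean): (Try[Unit], Try[Unit], String) = {
  val state = new ToyState("g")
  state.applyBlock(ToyBlock("a1", "g"))
  val applyBlocks = Seq(ToyBlock("b1", "g"), ToyBlock("b2", "b1"))
  val toApply = if (headOnly) applyBlocks.takeRight(1) else applyBlocks
  val first = state.update(ToyProgressInfo(Some("g"), toApply))
  val second = state.update(ToyProgressInfo(Some("b1"), Seq(ToyBlock("c2", "b1"))))
  (first, second, state.version)
}

val (bugFirst, bugSecond, bugVersion) = run(headOnly = true)
println(bugFirst.failed.get.getMessage)  // requirement failed: Incorrect state version: g found, b1 expected
println(bugSecond.failed.get.getMessage) // versionID not found, can not rollback

val (okFirst, okSecond, okVersion) = run(headOnly = false)
println(okFirst.isSuccess && okSecond.isSuccess) // true
println(okVersion)                               // c2
```

With `headOnly = false`, both updates succeed and the state ends at `c2`. This only shows that the two errors are consistent with a truncated `toApply`; it does not prove that this is the sole cause of the crash in Hybrid.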