input-output-hk / jormungandr

privacy voting blockchain node
https://input-output-hk.github.io/jormungandr/
Apache License 2.0
364 stars 132 forks source link

Block created but not accepted by the network without any information as to why #1472

Open mark-stopka opened 4 years ago

mark-stopka commented 4 years ago

Describe the bug I was scheduled for a block at

- created_at_time: "2019-12-30T12:29:37.841257060+00:00"
enclave_leader_id: 1
finished_at_time: "2019-12-30T12:36:17.006271651+00:00"
scheduled_at_date: "16.31280"
scheduled_at_time: "2019-12-30T12:36:17+00:00"
status: 
  Block:
    block: 2a4e3cfcf0e542ef3961b189562d21d1f74305d65f050a67914f044c7442d97a
chain_length: 55079
wake_at_time: "2019-12-30T12:36:17.003739280+00:00"

Block was produced as per log entries below; however is not included in the chain according to shelley eplorer; the parent block is.

[mark_stopka@nlgc-bltn-docker-host-01 ~]$ docker logs db4075d86859 2>&1 | grep 2a4e3cfcf0e542ef3961b189562d21d1f74305d65f050a67914f044c7442d97a
Dec 30 13:36:17.071 INFO receiving block from leadership service, date: 16.31280, parent: 915dd3b85c5a31406bbdaa65889827c90a5ede05b5d6c648e4bb5ae30c7464d9, hash: 2a4e3cfcf0e542ef3961b189562d21d1f74305d65f050a67914f044c7442d97a, task: block
Dec 30 13:36:17.128 INFO block from leader event successfully stored, date: 16.31280, parent: 915dd3b85c5a31406bbdaa65889827c90a5ede05b5d6c648e4bb5ae30c7464d9, hash: 2a4e3cfcf0e542ef3961b189562d21d1f74305d65f050a67914f044c7442d97a, task: block
Dec 30 13:36:17.128 INFO update current branch tip, date: 16.31280, parent: 915dd3b85c5a31406bbdaa65889827c90a5ede05b5d6c648e4bb5ae30c7464d9, hash: 2a4e3cfcf0e542ef3961b189562d21d1f74305d65f050a67914f044c7442d97a, task: block

Mandatory Information

  1. jcli --full-version output;

[mark_stopka@nlgc-bltn-docker-host-01 ~]$ docker exec -ti db4075d86859 jcli --full-version jcli 0.8.5 (HEAD-3db06807, release, linux [x86_64]) - [rustc 1.40.0 (73528e339 2019-12-16)]

2. `jormungandr --full-version` output;

[mark_stopka@nlgc-bltn-docker-host-01 ~]$ docker exec -ti db4075d86859 jormungandr --full-version jormungandr 0.8.5 (HEAD-3db06807, release, linux [x86_64]) - [rustc 1.40.0 (73528e339 2019-12-16)]



**Expected behavior**
Block will either be included in the blockchain or an information referencing the same block hash will be provided as to why the block was not accepted by the network.

**Additional context**
Slot `16.31280` was no filled by any other blocks. It seems the parent block [should've been different  with block from slot 16.31267](https://shelleyexplorer.cardano.org/en/block/e0c20e184a889798279c4beabd41f31b7c5fa60dbcbb7a6d69861f5006dbeded/), however once Jormungandr realizes this situation an information with a logging level **WARN** or higher should have been issued. Part of that information should also be the hash of the rejected block for easy event correlation.
stanfieldr commented 4 years ago

The solution to this issue may resolve #1446

JSCSJSCS commented 4 years ago

I have seen the exact same thing. I noticed that the block that was shown as created was also the one that was skipped during a cardano-wallet "Rolling back to" output. I can't find any documentation on this "Rolling back" output. I suspect it means I was on a bad chain at the time the block was produced, and it rolled back to the good chain time and my block was never produced on the good chain, so it does not show up in explorer?

If 16.35922 was my block that showed in leaders logs output as inblock, it would not show up in explorer:

Applying blocks [16.35904 ... 16.35904] Writing stake-distribution for epoch 16 Applying blocks [16.35922 ... 16.35922] Writing stake-distribution for epoch 16 Rolling back to 16.35904

lunarpool commented 4 years ago

Same happened several times to my node, one example below. Block created at 17.26809 and reported in jormungandr, but can not be find in Shelley explorer.

- created_at_time: "2019-12-31T08:55:09.953734800+00:00"
  enclave_leader_id: 1
  finished_at_time: "2019-12-31T10:07:15.014321900+00:00"
  scheduled_at_date: "17.26809"
  scheduled_at_time: "2019-12-31T10:07:15+00:00"
  status:
    Block:
      block: d8546fc3c1bbc8d9c6342cf8a7dc707ca1e2caeeb58b78d97d0cc8308d5d08b7
      chain_length: 57942
  wake_at_time: "2019-12-31T10:07:14.983082300+00:00"
TheRealMrBojangles commented 4 years ago

version 0.8.5 same same... node was on a hot streak uptime was 2days no blocks missed after this it fell out of sync had to restart...

JSCSJSCS commented 4 years ago

I have been monitoring the cardano-wallet server output for a couple of days and I am seeing this "Rolling back to" output that skips back a block about 80 times per epoch. That is about 2% of blocks getting rolled over and lost. Seems like a lot. Which there was more documentation on this event.

mark-stopka commented 4 years ago

@JSCSJSCS good job with the wallet, would you mind to redirect the output to a file (e.q. using tee) and attach a log for 24 hour period here?

lunarpool commented 4 years ago

There must be more causes, I have another example and do not believe to have so much luck to be in those 2% every day. Today's block was 18.31244

- created_at_time: "2020-01-01T12:27:11.204752500+00:00"
enclave_leader_id: "1"
finished_at_time: "2020-01-01T12:35:05.013711+00:00"
scheduled_at_date: "18.31244"
scheduled_at_time: "2020-01-01T12:35:05+00:00"
  status:
    Block:
    block : 810cc6151583be0ed25baaec92b859a1b1745f1ab483a48b59c9fccbf318833d
    chain_length : 61294
wake_at_time : "2020-01-01T12:35:04.998012100+00:00"

image

JSCSJSCS commented 4 years ago

@JSCSJSCS good job with the wallet, would you mind to redirect the output to a file (e.q. using tee) and attach a log for 24 hour period here?

@mark-stopka Will do.

JSCSJSCS commented 4 years ago

@mark-stopka I'm sorry, I forgot to add the -a to the tee and lost all the data every time I restarted cardano-wallet. I started a new process with tee -a this time, Will post it tomorrow. I counted 69 "Rolling back" from what I had though.

JSCSJSCS commented 4 years ago

Darn it. It keeps getting me. LOL

[cardano-wallet.stake-pools:Info:27] [2020-01-03 20:32:47.87 UTC] Applying blocks [21.2375 ... 21.2375]
[cardano-wallet.stake-pools:Info:27] [2020-01-03 20:32:47.87 UTC] Writing stake-distribution for epoch 21
[cardano-wallet.stake-pools:Info:27] [2020-01-03 20:33:01.43 UTC] Applying blocks [21.2382 ... 21.2382]  **<- My block**
[cardano-wallet.stake-pools:Info:27] [2020-01-03 20:33:01.43 UTC] Writing stake-distribution for epoch 21
[cardano-wallet.stake-pools:Info:27] [2020-01-03 20:33:03.45 UTC] Rolling back to 21.2375 **<- My block no more!**
[cardano-wallet.stake-pools:Info:27] [2020-01-03 20:33:03.45 UTC] Applying blocks [21.2381 ... 21.2383]
prometheus-pool commented 4 years ago

I just saw this issue as well, in my case I notice the slot was still added to Shelley Explorer but the block was produced by another pool. I was in sync at the time and according to my leader logs I created the block just after the schedule time


  enclave_leader_id: 1
  finished_at_time: "2020-01-04T04:03:53.005494923+00:00"
  scheduled_at_date: "21.15908"
  scheduled_at_time: "2020-01-04T04:03:53+00:00"
  status:
    Block:
      block: 1c02576b871e3c5f672d4e0fbe85a45ef303e68bb0af779f8f28433231c34f6d
      chain_length: 69245
  wake_at_time: "2020-01-04T04:03:53.000383428+00:00"```

Based on Shelley Explorer it looks like another pool produced the block at the same time or after my stake pool based on my leader logs. https://shelleyexplorer.cardano.org/en/block/115db8f77d387aab0ddb8a24ebcaecb42b02580257d06f93f0a445bf6b10d61e/
JSCSJSCS commented 4 years ago

@mark-stopka Finally! Here is a bit more than 24 hours of cardano-wallet serve logs. There are 105 "Rolling back to" entries over this period.

cardano-wallet.log

NicolasDP commented 4 years ago

This is going to be a bit difficult to implement and to have a good reporting of this. However I see where you come from @mark-stopka and indeed not having all the information available when managing your pool is frustrating.

The reason why a block is not included in the blockchain is varied and it is not necessarily because the block is not valid (there is little chance to create invalid blocks currently):

It is not easy to track this information but I will keep a note for the next devs to pick that up;


there is an object in the leadership service that keeps logs for the blocks to create or that were created (or if the slot time were missed for some reasons): the leadership log. This value could be updated if the created block is detected to be in a fork that is replaced from the Tip object.

  1. have a bool/value/something to mark thatthis block was created by this node;
  2. find the common ancestor of the 2 branch;
  3. on the Ref between common ancestor and tip of top of old branch.
    • update the associated LeaderLog (if still present) to say the Block has been dismissed because on a dismissed branch;
  4. on the Ref between common ancestor and tip of new branch:
    • update the associated LeaderLog (if still present) to say the Block has been included successfully in the blockchain.
JSCSJSCS commented 4 years ago

What about the 80-120 "Rolling back to" presumable fork corrections a day seen in the cardano-wallet logs? This can't be a normal or acceptable amount, could it? What issue can I tag onto this?

cffls commented 4 years ago

I am seeing more of this behavior in v0.8.6.

My node was up all the time. It generated 4 blocks recently, during which the active connections were about 3000, but all 4 blocks got invalidated. Here is the leadership log:

    {
      "created_at_time": "2020-01-17T19:13:52.653593988+00:00", 
      "enclave_leader_id": "1", 
      "finished_at_time": "2020-01-18T05:33:37.008394867+00:00", 
      "scheduled_at_date": "35.18600", 
      "scheduled_at_time": "2020-01-18T05:33:37+00:00", 
      "scheduled_date": "2020-01-18T05:33:37+00:00", 
      "status": {
        "Block": {
          "block": "954fe2828ebe9775ed2e1d2dd5e874bf30f104a223a2446ac291a6f2722a5856", 
          "chain_length": "108286"
        }
      }, 
      "wake_at_time": "2020-01-18T05:33:37.001425801+00:00"
    }, 
    {
      "created_at_time": "2020-01-17T19:13:52.653595907+00:00", 
      "enclave_leader_id": "1", 
      "finished_at_time": "2020-01-18T05:42:27.001824366+00:00", 
      "scheduled_at_date": "35.18865", 
      "scheduled_at_time": "2020-01-18T05:42:27+00:00", 
      "scheduled_date": "2020-01-18T05:42:27+00:00", 
      "status": {
        "Block": {
          "block": "d954dc91bf606d462e1cbced2f6c40a342f5d7a1dd2517db3478f2d488c971ff", 
          "chain_length": "108300"
        }
      }, 
      "wake_at_time": "2020-01-18T05:42:27.000564614+00:00"
    }, 
    {
      "created_at_time": "2020-01-17T19:13:52.653597497+00:00", 
      "enclave_leader_id": "1", 
      "finished_at_time": "2020-01-18T08:22:53.006211092+00:00", 
      "scheduled_at_date": "35.23678", 
      "scheduled_at_time": "2020-01-18T08:22:53+00:00", 
      "scheduled_date": "2020-01-18T08:22:53+00:00", 
      "status": {
        "Block": {
          "block": "5f76e1c0b1c7b19a0524147880c6f3c6bac99d065c6da160c7acf4c35c50395f", 
          "chain_length": "108596"
        }
      }, 
      "wake_at_time": "2020-01-18T08:22:53.000926830+00:00"
    }, 
    {
      "created_at_time": "2020-01-17T19:13:52.653599108+00:00", 
      "enclave_leader_id": "1", 
      "finished_at_time": "2020-01-18T08:41:37.004464095+00:00", 
      "scheduled_at_date": "35.24240", 
      "scheduled_at_time": "2020-01-18T08:41:37+00:00", 
      "scheduled_date": "2020-01-18T08:41:37+00:00", 
      "status": {
        "Block": {
          "block": "121fb837c95d121302398889fbf77fb83f1f34e2888ab108e32cd3b33a97b8be", 
          "chain_length": "108631"
        }
      }, 
      "wake_at_time": "2020-01-18T08:41:37.000771020+00:00"
    }, 
stakehodlers commented 4 years ago

Seeing the same thing in v0.8.6. Please see issue #1608 further details and data that may help diagnose this problem...