mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0
5.04k stars 991 forks source link

Stuck in tight loop processing orphan #1065

Closed bladedoyle closed 6 years ago

bladedoyle commented 6 years ago
# git branch
* master
# cat .git/refs/heads/master
4121ea124092081148273ec5846397cc001d7fa2
May 13 17:38:37.892 DEBG handle_payload: CompactBlock: c9776b51
May 13 17:38:37.892 DEBG Received compact_block c9776b51 at 65607 from 10.32.0.1:13414, going to process.
May 13 17:38:37.892 DEBG pipe: process_block c9776b51 at 65607 with 0 inputs, 1 outputs, 1 kernels
May 13 17:38:37.893 DEBG process_block: orphan: c9776b51, # orphans 1
...
...
May 13 17:47:57.228 DEBG process_block: orphan: c9776b51, # orphans 2

Here is a link to more log: https://ipfs.io/ipfs/Qmeiz8D1Mq5kZEA4v6oweNGEquKRb3RdDLFg1cuM78BrLh

Warning, its very large - 209M. I dont see any new info at the bottom so downloading the first part is probably enough.

ignopeverell commented 6 years ago

Any scenario to help reproducing this would be helpful.

bladedoyle commented 6 years ago

It may be worth mentioning that this was running inside a docker container inside Kubernetes. The network overlay uses NAT and seems to have the unfortunate side effect of making it appear as though every inbound connection is from the gateway address. 10.32.0.1 in this case.

This is an archive node that accepts p2p connections. It had been up about 4 days. I was not interacting with it at the time. Seems like it had a healthy number of peers:

May 13 17:41:56.041 DEBG monitor_peers: 9 connected (7 most_work). all 230 = 15 healthy + 0 banned + 215 defunct

garyyu commented 6 years ago

@bladedoyle Could you please tell me how to produce this table which you mentioned in Gitter? Thanks.

bladedoyle commented 6 years ago

@garyyu That is the output of a mysql slect command where I have recorded the details (height, hash, nonce) of each block my miner finds. Then, at a later time for each block I found I check it against the blockchain.

1) When the grin stratum server submits a "valid" (what it thinks is valid based on the chain it has at that time) block it logs it to the grin.log file.

$ grep "Found POW" grin.log | head -n 1
May 22 11:04:06.881 INFO (Server ID: StratumServer) Found POW for block with hash 1f1f1361 at height 78685 using nonce 11808040649676667891 submitted by worker http://192.168.1.100:13415

2) You can query a grin node for a block by height using this command:

 curl -s http://localhost:13413/v1/blocks/78685 | jq . | grep '"hash"'
    "hash": "456ebbe61ee0c86c068efcc06d67ff64d60fbbbf74e5044bc5a9c6e33d5b7ccd",

If those dont match, then it was an orphan:

mysql> select * from pool_blocks where height=78685;
+--------+----------+----------------------+--------+------+-----------+----------------------------+
| height | hash     | nonce                | state  | pow  | timestamp | found_by                   |
+--------+----------+----------------------+--------+------+-----------+----------------------------+
|  78685 | 1f1f1361 | 11808040649676667891 | orphan | NULL | NULL      | http://192.168.1.100:13415 |
+--------+----------+----------------------+--------+------+-----------+----------------------------+
1 row in set (0.00 sec)

Some orphans are normal. But when a mining node is constantly lagging far behind on the blockchain it mines on top of an old block and mines mostly orphans. It is the lagging far behind that this bug is about. It seems like maybe something goes wrong in the "gossip" protocol?

garyyu commented 6 years ago

@bladedoyle thanks for the info. Before looking what's wrong, I'm trying to reproduce the problem.

bladedoyle commented 6 years ago

@garyyu Ugh, I confused two different issues. My comments above about a node lagging behind the head block does not apply here, that is a separate problem. This issue is about a grin node getting stuck in a tight loop while trying to process an orphan.

yeastplume commented 6 years ago

Has anyone been able to reproduce this?

alice-bob-dlfs commented 6 years ago

I'm probably seeing the same issue.

The text UI shows

Current Status: Syncing - Latest header: 125238
Connected Peers: 17
Chain Height: 0
Total Difficulty: 1000

I've checked the 125238 number with grinscan.net and confirmed that it's up-to-date. But the zero chain height looks suspicious.

And from the grin.log

...
Jun 28 18:12:23.970 DEBG process_block: orphan: 88d2e078, # orphans 55
Jun 28 18:12:23.975 DEBG process_block: orphan: 64ecbec3, # orphans 55
Jun 28 18:12:24.189 DEBG process_block: orphan: 64ecbec3, # orphans 55
Jun 28 18:12:24.385 DEBG process_block: orphan: 64ecbec3, # orphans 55
Jun 28 18:12:41.685 DEBG process_block: orphan: f50e6c63, # orphans 56
Jun 28 18:12:42.621 DEBG process_block: orphan: f50e6c63, # orphans 56
Jun 28 18:13:04.329 DEBG process_block: orphan: 0ba1f5de, # orphans 57
Jun 28 18:13:04.359 DEBG process_block: orphan: 0ba1f5de, # orphans 57
Jun 28 18:13:04.739 DEBG process_block: orphan: 0ba1f5de, # orphans 57
Jun 28 18:13:04.744 DEBG process_block: orphan: 0ba1f5de, # orphans 57
Jun 28 18:13:04.881 DEBG process_block: orphan: f50e6c63, # orphans 57
Jun 28 18:13:04.917 DEBG process_block: orphan: 0ba1f5de, # orphans 57
Jun 28 18:13:05.148 DEBG process_block: orphan: 0ba1f5de, # orphans 57

and

Jun 28 18:19:53.633 DEBG locate_headers: [55b162da, 3301bdfb, 609e6218, 8bce5d54, 39af4bb9, 6231e39d, 6f08c67c, b41aa440, 9096becc, 2e7ab62a, 4b8a31f4, 403db755, 6695a5d1, af344809, cb7d8f6c, a1ddd4ff, 3346f63c]
Jun 28 18:19:53.634 DEBG locate_headers: common header: 3346f63c
Jun 28 18:19:53.634 DEBG locate_headers: returning headers: 0
Jun 28 18:19:58.319 DEBG handle_payload: CompactBlock: 0b8bdce7
Jun 28 18:19:58.320 DEBG Received compact_block 0b8bdce7 at 125253 from [omit]:13414, going to process.
Jun 28 18:19:58.320 DEBG pipe: process_block 0b8bdce7 at 125253 with 0 inputs, 1 outputs, 1 kernels
Jun 28 18:19:58.320 DEBG process_block: orphan: 0b8bdce7, # orphans 70
Jun 28 18:19:58.625 DEBG Received block header 0b8bdce7 at 125253 from [omit]:13414, going to process.
Jun 28 18:19:58.626 DEBG pipe: process_block_header at 125253 [0b8bdce7]
Jun 28 18:19:58.626 DEBG Requesting compact block 0b8bdce7 from [omit]:13414
Jun 28 18:19:58.721 DEBG Received block header 0b8bdce7 at 125253 from [omit]:13414, going to process.
Jun 28 18:19:58.721 DEBG pipe: process_block_header at 125253 [0b8bdce7]
Jun 28 18:19:58.721 DEBG Requesting compact block 0b8bdce7 from [omit]:13414
Jun 28 18:19:58.897 DEBG handle_payload: CompactBlock: 0b8bdce7
Jun 28 18:19:58.898 DEBG Received compact_block 0b8bdce7 at 125253 from [omit]:13414, going to process.
Jun 28 18:19:58.898 DEBG pipe: process_block 0b8bdce7 at 125253 with 0 inputs, 1 outputs, 1 kernels
Jun 28 18:19:58.898 DEBG process_block: orphan: 0b8bdce7, # orphans 70
Jun 28 18:19:58.899 DEBG handle_payload: CompactBlock: 0b8bdce7
Jun 28 18:19:58.899 DEBG Received compact_block 0b8bdce7 at 125253 from [omit]:13414, going to process.
Jun 28 18:19:58.899 DEBG pipe: process_block 0b8bdce7 at 125253 with 0 inputs, 1 outputs, 1 kernels
Jun 28 18:19:58.899 DEBG process_block: orphan: 0b8bdce7, # orphans 70
Jun 28 18:19:59.035 DEBG handle_payload: CompactBlock: 0b8bdce7
Jun 28 18:19:59.035 DEBG Received compact_block 0b8bdce7 at 125253 from [omit]:13414, going to process.
Jun 28 18:19:59.036 DEBG pipe: process_block 0b8bdce7 at 125253 with 0 inputs, 1 outputs, 1 kernels
Jun 28 18:19:59.036 DEBG process_block: orphan: 0b8bdce7, # orphans 70
Jun 28 18:19:59.166 DEBG Received block header 0b8bdce7 at 125253 from [omit]:13414, going to process.
Jun 28 18:19:59.166 DEBG pipe: process_block_header at 125253 [0b8bdce7]
Jun 28 18:19:59.166 DEBG Requesting compact block 0b8bdce7 from [omit]:13414
Jun 28 18:19:59.255 DEBG handle_payload: CompactBlock: 0b8bdce7
Jun 28 18:19:59.256 DEBG Received compact_block 0b8bdce7 at 125253 from [omit]:13414, going to process.
Jun 28 18:19:59.256 DEBG pipe: process_block 0b8bdce7 at 125253 with 0 inputs, 1 outputs, 1 kernels
Jun 28 18:19:59.256 DEBG process_block: orphan: 0b8bdce7, # orphans 70
bladedoyle commented 6 years ago

I have not seen this issue recently. Its possible that the commits around July 20 for block sync issues fixed this. Possibly: https://github.com/mimblewimble/grin/pull/1281 by @garyyu

hashmap commented 6 years ago

Closing, feel free to reopen