mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0

Grin nodes often get stuck #3735

Closed mayong82 closed 1 year ago

mayong82 commented 1 year ago

The Grin node often gets stuck, which makes the API inaccessible. When I call the foreign API, the request hangs and no data is ever returned.
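One way to tell a hung node apart from a stopped one is to call the foreign API with a client-side timeout. The sketch below is illustrative only. It assumes the node serves its v2 foreign JSON-RPC endpoint at `http://127.0.0.1:3413/v2/foreign` and that `get_tip` is an available method; adjust both to your setup. `build_payload` and `probe` are hypothetical helper names, not part of Grin.

```python
import http.client
import json
import socket
import urllib.error
import urllib.request

# Assumption: a local node with its foreign API on this address.
FOREIGN_API_URL = "http://127.0.0.1:3413/v2/foreign"


def build_payload(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body as bytes."""
    body = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params or [],
        "id": request_id,
    }
    return json.dumps(body).encode("utf-8")


def probe(url=FOREIGN_API_URL, method="get_tip", timeout=10.0):
    """Return "ok", "stuck" (no reply within the timeout) or "down"."""
    request = urllib.request.Request(
        url,
        data=build_payload(method),
        headers={"Content-Type": "application/json"},
    )
    # Ignore any proxy settings from the environment: the node is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout) as response:
            json.loads(response.read())  # the reply must at least be JSON
            return "ok"
    except socket.timeout:
        # Connected and sent the request, but the reply never arrived.
        return "stuck"
    except urllib.error.URLError as err:
        # A timeout while connecting is wrapped in URLError.
        if isinstance(err.reason, socket.timeout):
            return "stuck"
        return "down"
    except (OSError, ValueError, http.client.HTTPException):
        # Connection reset, malformed HTTP, or a non-JSON body.
        return "down"
```

Calling `probe()` from a cron job or a small loop and recording when the result first becomes `"stuck"` would also give a rough measure of how long the node runs before it hangs.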

phyro commented 1 year ago

Hi @mayong82, do you have any estimate of how long it takes for the node to become inaccessible? Is the time to reach this state the same every time?

mayong82 commented 1 year ago

I don't have specific statistics, and the time is not fixed. When the node gets stuck, I restart it. It may get stuck again after a few hours or days, but sometimes it gets stuck again within a few minutes of restarting.

mayong82 commented 1 year ago

When this problem first occurred, I was running v5.1.2 and thought it was a version problem. I then upgraded the node to v5.2.0-alpha.1, but the problem still exists. Once a node is stuck, it does not recover even after waiting for a long time.

phyro commented 1 year ago

Have you tried running it in DEBUG mode to see if there are any useful logs around the time it stops responding?

mayong82 commented 1 year ago

Now we encounter this problem again. The log output is as follows:

20220912 21:46:37.701 DEBUG grin_p2p::peers - Saving newly connected peer 89.58.0.69:0.
20220912 21:46:37.701 DEBUG grin_p2p::store - save_peer: PeerAddr(89.58.0.69:0) marked Healthy
20220912 21:46:48.247 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 18 connected (4 most_work). all 45300 = 30100 healthy + 1 banned + 15199 defunct
20220912 21:47:08.256 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 18 connected (4 most_work). all 45300 = 30100 healthy + 1 banned + 15199 defunct
20220912 21:47:10.145 DEBUG grin_p2p::peer - accept: handshaking from Ok(47.75.163.155:28497)
20220912 21:47:10.145 DEBUG grin_p2p::peers - Adding newly connected peer 47.75.163.155:3414.
20220912 21:47:10.145 DEBUG grin_p2p::peers - Saving newly connected peer 47.75.163.155:3414.
20220912 21:47:10.145 DEBUG grin_p2p::store - save_peer: PeerAddr(47.75.163.155:3414) marked Healthy
20220912 21:47:28.267 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 19 connected (4 most_work). all 45300 = 30100 healthy + 1 banned + 15199 defunct
20220912 21:47:40.722 DEBUG grin_servers::common::adapters - locator: [00038cb1d658, 0005105b2ed2, 00056141229e, 000361314845, 000675588ba8, 0003709bc4b6, 0000e2a94f15, 00067869e21a, 00015daf16ee, 0004204c1862, 0002607895f0, 0005ca5204f7, 00065d0779e7, 000a3712d20b, 0003a7b8d3d9, 0002df0e8eae, 0003c5ddaa58, 00002dce7581, 0000010c934f, 40adad0aec27]
20220912 21:47:48.276 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 19 connected (4 most_work). all 45300 = 30100 healthy + 1 banned + 15199 defunct
20220912 21:48:08.287 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 19 connected (4 most_work). all 45300 = 30100 healthy + 1 banned + 15199 defunct
20220912 21:48:10.944 DEBUG grin_p2p::peer - accept: handshaking from Ok(5.161.50.47:54612)
20220912 21:48:10.944 DEBUG grin_p2p::peers - Adding newly connected peer 5.161.50.47:3414.
20220912 21:48:10.944 DEBUG grin_p2p::peers - Saving newly connected peer 5.161.50.47:3414.
20220912 21:48:10.944 DEBUG grin_p2p::store - save_peer: PeerAddr(5.161.50.47:3414) marked Healthy
20220912 21:48:13.590 DEBUG grin_p2p::peer - accept: handshaking from Ok(32.210.25.95:51648)
20220912 21:48:13.601 DEBUG grin_p2p::peers - Adding newly connected peer 32.210.25.95:3414.
20220912 21:48:13.601 DEBUG grin_p2p::peers - Saving newly connected peer 32.210.25.95:3414.
20220912 21:48:13.601 DEBUG grin_p2p::store - save_peer: PeerAddr(32.210.25.95:3414) marked Healthy
20220912 21:48:18.288 DEBUG grin_p2p::conn - try_break: exit the loop: Connection(Os { code: 32, kind: BrokenPipe, message: "Broken pipe" })
20220912 21:48:18.288 DEBUG grin_p2p::conn - Shutting down writer connection with ?
20220912 21:48:24.085 DEBUG grin_servers::common::hooks - Received block header 00047508ac27 at 1916643 from 5.161.50.47:3414, going to process.
20220912 21:48:28.297 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 21 connected (1 most_work). all 45300 = 30101 healthy + 1 banned + 15198 defunct
20220912 21:48:28.297 DEBUG grin_p2p::peers - Error pinging peer PeerAddr(47.75.163.155:3414): Send("try_send disconnected")
20220912 21:48:28.297 DEBUG grin_p2p::peer - Stopping peer PeerAddr(47.75.163.155:3414)
20220912 21:48:48.307 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 20 connected (1 most_work). all 45300 = 30101 healthy + 1 banned + 15198 defunct
20220912 21:49:08.317 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 20 connected (1 most_work). all 45300 = 30101 healthy + 1 banned + 15198 defunct
20220912 21:49:18.318 DEBUG grin_p2p::conn - try_break: exit the loop: Connection(Os { code: 32, kind: BrokenPipe, message: "Broken pipe" })
20220912 21:49:18.318 DEBUG grin_p2p::conn - Shutting down writer connection with ?
20220912 21:49:26.290 DEBUG grin_p2p::peer - accept: handshaking from Ok(96.242.227.150:59079)
20220912 21:49:26.290 DEBUG grin_p2p::peers - Adding newly connected peer 96.242.227.150:3414.
20220912 21:49:26.290 DEBUG grin_p2p::peers - Saving newly connected peer 96.242.227.150:3414.
20220912 21:49:26.290 DEBUG grin_p2p::store - save_peer: PeerAddr(96.242.227.150:3414) marked Healthy
20220912 21:49:28.327 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 21 connected (1 most_work). all 45300 = 30101 healthy + 1 banned + 15198 defunct
20220912 21:49:28.327 DEBUG grin_p2p::peers - Error pinging peer PeerAddr(5.161.50.47:3414): Send("try_send disconnected")
20220912 21:49:28.327 DEBUG grin_p2p::peer - Stopping peer PeerAddr(5.161.50.47:3414)
20220912 21:49:48.337 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 20 connected (4 most_work). all 45300 = 30101 healthy + 1 banned + 15198 defunct
20220912 21:50:08.375 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 20 connected (4 most_work). all 45300 = 30101 healthy + 1 banned + 15198 defunct
20220912 21:50:28.385 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 20 connected (4 most_work). all 45300 = 30101 healthy + 1 banned + 15198 defunct
20220912 21:50:44.235 DEBUG grin_p2p::peer - accept: handshaking from Ok(195.154.113.17:59656)
20220912 21:50:44.235 DEBUG grin_p2p::peers - Adding newly connected peer 195.154.113.17:3414.
20220912 21:50:44.235 DEBUG grin_p2p::peers - Saving newly connected peer 195.154.113.17:3414.
20220912 21:50:44.235 DEBUG grin_p2p::store - save_peer: PeerAddr(195.154.113.17:3414) marked Healthy
20220912 21:50:48.395 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 21 connected (4 most_work). all 45300 = 30101 healthy + 1 banned + 15198 defunct
20220912 21:50:56.891 DEBUG grin_servers::common::adapters - locator: [00038cb1d658, 0005105b2ed2, 00056141229e, 000361314845, 000675588ba8, 0003709bc4b6, 0000e2a94f15, 00067869e21a, 00015daf16ee, 0004204c1862, 0002607895f0, 0005ca5204f7, 00065d0779e7, 000a3712d20b, 0003a7b8d3d9, 0002df0e8eae, 0003c5ddaa58, 00002dce7581, 0000010c934f, 40adad0aec27]
20220912 21:51:08.405 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 21 connected (4 most_work). all 45300 = 30101 healthy + 1 banned + 15198 defunct
20220912 21:51:18.406 DEBUG grin_p2p::conn - try_break: exit the loop: Connection(Os { code: 32, kind: BrokenPipe, message: "Broken pipe" })
20220912 21:51:18.406 DEBUG grin_p2p::conn - Shutting down writer connection with ?
20220912 21:51:28.415 DEBUG grin_servers::grin::seed - monitor_peers: on 0.0.0.0:3414, 21 connected (4 most_work). all 45300 = 30101 healthy + 1 banned + 15198 defunct
20220912 21:51:28.415 DEBUG grin_p2p::peers - Error pinging peer PeerAddr(195.154.113.17:3414): Send("try_send disconnected")
20220912 21:51:28.415 DEBUG grin_p2p::peer - Stopping peer PeerAddr(195.154.113.17:3414)
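A log dump like the one above is easier to reason about once it is split into entries. The following is a rough sketch, not a Grin tool: it assumes every entry starts with a `YYYYMMDD HH:MM:SS.mmm LEVEL target - ` prefix, as in the pasted output, and the helper names `parse_entries`, `count_by_target` and `longest_gap` are made up for illustration. It splits the text into entries, counts entries per log target, and finds the longest silence between consecutive entries, which may help locate the moment a node stops making progress.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the prefix of entries such as:
# 20220912 21:46:37.701 DEBUG grin_p2p::peers - Saving newly connected peer ...
ENTRY_PREFIX = re.compile(
    r"(?P<ts>\d{8} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<level>[A-Z]+) (?P<target>\S+) - "
)


def parse_entries(text):
    """Split a log dump into (time, level, target, message) tuples.

    Works whether entries are one per line or run together on one line,
    because the message is taken as everything up to the next prefix.
    """
    matches = list(ENTRY_PREFIX.finditer(text))
    entries = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = " ".join(text[match.end():end].split())
        when = datetime.strptime(match.group("ts"), "%Y%m%d %H:%M:%S.%f")
        entries.append((when, match.group("level"), match.group("target"), message))
    return entries


def count_by_target(entries):
    """Count how many entries each log target produced."""
    return Counter(target for _, _, target, _ in entries)


def longest_gap(entries):
    """Return (seconds, entry_before, entry_after) for the longest silence,
    or None if there are fewer than two entries."""
    best = None
    for before, after in zip(entries, entries[1:]):
        seconds = (after[0] - before[0]).total_seconds()
        if best is None or seconds > best[0]:
            best = (seconds, before, after)
    return best
```

In the excerpt above, `monitor_peers` is logged about every 20 seconds throughout, so a much longer gap in a full log file would be the place to start looking.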

phyro commented 1 year ago

@mayong82 could you build from the latest pibd_impl branch and see if you hit this issue on this branch? We're trying to figure out if a certain patch resolves the issue you're seeing.

mayong82 commented 1 year ago

OK, I'll try it.

mayong82 commented 1 year ago

It has been running for a while and, so far, the node has not gotten stuck; it seems that

phyro commented 1 year ago

Thanks for testing, @mayong82! I'll close this now, so let me know if it should be reopened :)