Add --time 360 to your stop/restart scripts (for Docker). If that doesn't help, you have most probably hit the same issue as me: https://github.com/gohornet/hornet/issues/554
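For context, docker stop sends the container's stop signal (SIGTERM, unless the image sets STOPSIGNAL or the container was started with --stop-signal) to PID 1, waits up to --time seconds (10 by default), and then sends SIGKILL. The Python sketch below approximates that sequence so you can reason about whether the node gets enough time. stop_like_docker is a hypothetical helper, not part of Docker or Hornet, and it assumes a POSIX system.

```python
import signal
import subprocess


def stop_like_docker(proc: subprocess.Popen, time_s: float = 10.0,
                     stop_signal: int = signal.SIGTERM) -> int:
    """Send stop_signal, wait up to time_s seconds, then SIGKILL (POSIX only).

    Approximates what `docker stop --time <time_s>` does to PID 1.
    Returns the exit code: 0 for a clean exit, -N if killed by signal N.
    """
    proc.send_signal(stop_signal)
    try:
        return proc.wait(timeout=time_s)
    except subprocess.TimeoutExpired:
        # Grace period exceeded: SIGKILL cannot be caught, so no cleanup runs.
        proc.kill()
        return proc.wait()
```

If the node needs longer than the grace period to flush and close its database, it ends up in the SIGKILL branch. That is why a longer --time can help, provided the node itself does not give up first.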
@g574 do you still have that problem with latest releases?
Yes, I just tried again:
2020-08-24T09:54:28Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 3 seconds) to finish processing (send queue <ip>, send queue <ip>, send queue <ip>, TangleProcessor[ProcessMilestone], Tipselection[Cleanup], Peering Server, PendingRequestsEnqueuer, BroadcastQueue, MessageProcessor, Close database, STINGRequester, send queue <ip>, TangleProcessor[MilestoneSolidifier], Cleanup at shutdown, Tangle[HeartbeatEvents], Tangle[SolidifierGossipEvents], send queue <ip>, WarpSync[Events], send queue <ip>, Peering Reconnect, TangleProcessor[ReceiveTx], Tipselection[Events]) ...
2020-08-24T09:54:29Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 2 seconds) to finish processing (TangleProcessor[MilestoneSolidifier], Cleanup at shutdown, Tangle[HeartbeatEvents], Tangle[SolidifierGossipEvents], send queue <ip>, WarpSync[Events], send queue <ip>, Peering Reconnect, TangleProcessor[ReceiveTx], Tipselection[Events], send queue <ip>, send queue <ip>, TangleProcessor[ProcessMilestone], send queue <ip>, Tipselection[Cleanup], Peering Server, PendingRequestsEnqueuer, BroadcastQueue, MessageProcessor, Close database, STINGRequester, send queue <ip>) ...
2020-08-24T09:54:30Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 1 seconds) to finish processing (Peering Server, PendingRequestsEnqueuer, BroadcastQueue, MessageProcessor, Close database, STINGRequester, send queue <ip>, TangleProcessor[MilestoneSolidifier], Cleanup at shutdown, Tangle[HeartbeatEvents], Tangle[SolidifierGossipEvents], send queue <ip>, WarpSync[Events], send queue <ip>, Peering Reconnect, TangleProcessor[ReceiveTx], Tipselection[Events], send queue <ip>, send queue <ip>, TangleProcessor[ProcessMilestone], send queue <ip>, Tipselection[Cleanup]) ...
2020-08-24T09:54:31Z FATAL Graceful Shutdown gracefulshutdown/plugin.go:50 Background processes did not terminate in time! Forcing shutdown ...
github.com/gohornet/hornet/plugins/gracefulshutdown.configure.func1.1
/__w/hornet/hornet/plugins/gracefulshutdown/plugin.go:50
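The WARN lines above count down a fixed budget (3 seconds in this 0.5.0 log) while listing the background workers that are still running. When the budget runs out, the FATAL line forces the exit. The Python sketch below models that loop as it appears in the log. It is an assumption about the behavior, not Hornet's Go code in gracefulshutdown/plugin.go, and wait_for_workers is a hypothetical name.

```python
import threading
import time
from typing import Dict, List


def wait_for_workers(workers: Dict[str, threading.Thread], max_wait_s: float,
                     tick_s: float = 1.0) -> List[str]:
    """Wait for named worker threads, printing the ones still running each tick.

    Returns the names still alive when the deadline passes (empty list = graceful).
    """
    deadline = time.monotonic() + max_wait_s
    while True:
        pending = [name for name, t in workers.items() if t.is_alive()]
        remaining = deadline - time.monotonic()
        if not pending or remaining <= 0:
            return pending
        print(f"waiting (max {int(remaining)} seconds) to finish processing ({', '.join(pending)}) ...")
        time.sleep(min(tick_s, remaining))
```

A worker that never returns, for example one blocked on a slow disk write, keeps the list non-empty until the budget expires. In the real node, that situation ends in the forced shutdown shown above.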
Now I'm on version 0.5.0
And the shutdown log was already from 0.5.0?
Yes, the log from today is from 0.5.0
@g574 could you please try to create a "full goroutine stack dump" at the moment you see these messages:
2020-08-24T09:54:30Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 1 seconds) to finish processing (Peering Server, PendingRequestsEnqueuer, BroadcastQueue, MessageProcessor, Close database, STINGRequester, send queue <ip>, TangleProcessor[MilestoneSolidifier], Cleanup at shutdown, Tangle[HeartbeatEvents], Tangle[SolidifierGossipEvents], send queue <ip>, WarpSync[Events], send queue <ip>, Peering Reconnect, TangleProcessor[ReceiveTx], Tipselection[Events], send queue <ip>, send queue <ip>, TangleProcessor[ProcessMilestone], send queue <ip>, Tipselection[Cleanup]) ..
To do that, open http://localhost:6060/debug/pprof/ (set profiling.bindAddress to 0.0.0.0:6060 in the config and publish port 6060 from your container).
"full goroutine stack dump" => CTRL+A, CTRL+C => https://ybin.me/ => CTRL+V => CTRL+S => copy link into this issue
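The "full goroutine stack dump" link on Go's net/http/pprof index page points to /debug/pprof/goroutine?debug=2, which returns the dump as plain text. As an alternative to copying it out of the browser, the minimal sketch below saves the dump to a file. The base URL assumes port 6060 is reachable from the host as described above, and save_goroutine_dump is a hypothetical helper.

```python
import urllib.request
from pathlib import Path


def save_goroutine_dump(base_url: str = "http://localhost:6060",
                        out_path: str = "goroutines.txt",
                        timeout_s: float = 10.0) -> int:
    """Download the pprof full goroutine dump (debug=2) and write it to out_path.

    Returns the number of bytes written.
    """
    url = base_url.rstrip("/") + "/debug/pprof/goroutine?debug=2"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        body = resp.read()
    Path(out_path).write_bytes(body)
    return len(body)
```

Run it while the "waiting (max N seconds)" countdown is still printing, so the dump captures whichever goroutines are blocking the shutdown.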
@muXxer I have tried to get the stack dump but now I cannot reproduce the issue: it shuts down properly:
2020-08-25T07:40:04Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 126 seconds) to finish processing (Close database) ...
2020-08-25T07:40:05Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 125 seconds) to finish processing (Close database) ...
2020-08-25T07:40:06Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 124 seconds) to finish processing (Close database) ...
2020-08-25T07:40:07Z INFO Database database/plugin.go:51 Syncing databases to disk... done
2020-08-25T07:40:07Z INFO Node node/node.go:124 Shutdown complete!
However, it does not synchronise up to the latest milestone:
2020-08-25T07:41:13Z INFO Tangle tangle/milestones.go:31 Valid milestone detected! Index: 1587698, Hash: T9TLNAWZOSNXZZAETAS9IZHQJQXWAMA9PQJO9XFDGDCRBBC9N9WJHURMNDKOFVIOYUPRZWUYCEIK99999
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, LSMI/LMI: 1580399/1587698, TPS (in/new/out): 00150/00062/00063, Tips (non-/semi-lazy): 0/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, LSMI/LMI: 1580399/1587698, TPS (in/new/out): 00146/00071/00076, Tips (non-/semi-lazy): 0/0
Is it possible that it only fails to shut down gracefully when it is up to date, i.e. LSMI = LMI? Can you also help me figure out why it knows the latest milestone but does not catch up? I am sure it would synchronise if I deleted the data, but for me it is not an option to delete it every time I restart it.
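In the status lines, LSMI is the latest solid milestone index and LMI is the latest milestone index the node knows about. LMI - LSMI is therefore the number of milestones the node still has to solidify. The sketch below (milestone_gap is a hypothetical helper) reads that gap from a status line. For the lines above it is 1587698 - 1580399 = 7299.

```python
import re
from typing import Optional

STATUS_RE = re.compile(r"LSMI/LMI:\s*(\d+)/(\d+)")


def milestone_gap(status_line: str) -> Optional[int]:
    """Return LMI - LSMI from a Hornet status line, or None if the field is absent."""
    m = STATUS_RE.search(status_line)
    if m is None:
        return None
    lsmi, lmi = int(m.group(1)), int(m.group(2))
    return lmi - lsmi
```

A gap of 0 means LSMI = LMI, i.e. the node is synced. A gap that stays constant while new milestones arrive means solidification is not advancing.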
@g574 please disable the "WarpSync" plugin by adding it to "disablePlugins" and try to sync again. There was a race condition, which we fixed in the latest RC.
@g574 btw, can you share some parts of your config? For example the enabled and disabled plugins? I'm wondering if the node gets stuck because of one of the plugins.
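Plugins are disabled through the node.disablePlugins array that appears in the configs posted in this thread. The minimal sketch below adds a plugin to that array in an existing JSON config without touching other keys. The function names are hypothetical, and the node presumably has to be restarted to pick up the change.

```python
import json
from pathlib import Path
from typing import Any, Dict


def disable_plugin(config: Dict[str, Any], plugin: str) -> Dict[str, Any]:
    """Add plugin to config["node"]["disablePlugins"] if it is not already there."""
    node = config.setdefault("node", {})
    disabled = node.setdefault("disablePlugins", [])
    if plugin not in disabled:
        disabled.append(plugin)
    return config


def disable_plugin_in_file(path: str, plugin: str) -> None:
    """Load a JSON config file, disable the plugin, and write it back."""
    cfg_path = Path(path)
    cfg = json.loads(cfg_path.read_text())
    cfg_path.write_text(json.dumps(disable_plugin(cfg, plugin), indent=2) + "\n")
```

Usage: disable_plugin_in_file("config.json", "WarpSync"). Running it twice leaves a single "WarpSync" entry.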
Hi @muXxer, I have added WarpSync to disablePlugins; there seems to be no change in sync performance. Here's the config I have:
{
"useProfile": "auto",
"httpAPI": {
"basicAuth": {
"enabled": false,
"username": "",
"passwordHash": "",
"passwordSalt": ""
},
"excludeHealthCheckFromAuth": false,
"permitRemoteAccess": [
"attachToTangle",
"getNodeInfo",
"getBalances",
"checkConsistency",
"getTipInfo",
"getTransactionsToApprove",
"getInclusionStates",
"getNodeAPIConfiguration",
"wereAddressesSpentFrom",
"broadcastTransactions",
"findTransactions",
"storeTransactions",
"getTrytes"
],
"whitelistedAddresses": [],
"bindAddress": "0.0.0.0:9000",
"limits": {
"bodyLengthBytes": 1000000,
"findTransactions": 1000,
"getTrytes": 1000,
"requestsList": 1000
}
},
"dashboard": {
"bindAddress": "localhost:8081",
"theme": "default",
"basicAuth": {
"enabled": false,
"username": "",
"passwordHash": "",
"passwordSalt": ""
}
},
"tipsel": {
"belowMaxDepth": 15
},
"db": {
"path": "/data/hornet/mainnetdb",
"debug": false
},
"snapshots": {
"loadType": "local",
"local": {
"intervalSynced": 50,
"intervalUnsynced": 1000,
"path": "snapshots/mainnet/export.bin",
"downloadURLs": [
"https://ls.manapotion.io/export.bin",
"https://x-vps.com/export.bin",
"https://dbfiles.iota.org/mainnet/hornet/latest-export.bin"
]
},
"global": {
"path": "snapshotMainnet.txt",
"spentAddressesPaths": [
"previousEpochsSpentAddresses1.txt",
"previousEpochsSpentAddresses2.txt",
"previousEpochsSpentAddresses3.txt"
],
"index": 1050000
},
"pruning": {
"enabled": true,
"delay": 40000
}
},
"spentAddresses": {
"enabled": true
},
"network": {
"preferIPv6": false,
"gossip": {
"bindAddress": "0.0.0.0:15600",
"reconnectAttemptIntervalSeconds": 60
},
"autopeering": {
"bindAddress": "0.0.0.0:14626",
"runAsEntryNode": false,
"entryNodes": [
"FvfwJuCMoWJvcJLSYww7whPxouZ9WFJ55uyxTxKxJ1ez@enter.hornet.zone:14626",
"EkSLZ4uvSTED1x6KaGzqxoGxjbytt2rPVfbJk1LRLCGL@enter.manapotion.io:18626",
"iotaMk9Rg8wWo1DDeG7fwV9iJ41hvkwFX8w6MyTQgDu@enter.thetangle.org:14627",
"12w9FrzMdDQ42aBgFrv1siHuJMhuZ4SMVHRFSS7Zb72W@entrynode.iotatoken.nl:14626",
"DboTc1v61Xdyvggj8VRszy92ScUTLgfwZaHvXsU8zr7e@entrynode.tanglebay.org:14626",
"31Tz9meznQMm7qSDUgyMmYVeHUCGA7za5Suvbom5hpE9@bender.iota.autopeering.com:14626"
],
"seed": ""
}
},
"node": {
"alias": "",
"showAliasInGetNodeInfo": false,
"disablePlugins": [ "WarpSync" ],
"enablePlugins": []
},
"spammer": {
"address": "HORNET99INTEGRATED99SPAMMER999999999999999999999999999999999999999999999999999999",
"message": "Spamming with HORNET tipselect",
"tag": "HORNET99INTEGRATED99SPAMMER",
"tagSemiLazy": "",
"cpuMaxUsage": 0.8,
"tpsRateLimit": 0,
"bundleSize": 1,
"valueSpam": false,
"workers": 0,
"semiLazyTipsLimit": 30
},
"zmq": {
"bindAddress": "localhost:5556"
},
"profiling": {
"bindAddress": "localhost:6060"
},
"prometheus": {
"bindAddress": "localhost:9311",
"goMetrics": false,
"processMetrics": false,
"promhttpMetrics": false
}
}
So you still can't sync? Or what is your problem now? Are we still talking about the shutdown problem? Sorry, I'm confused. Do you have neighbors?
I cannot sync right now:
2020-08-25T15:58:28Z INFO Autopeering autopeering/plugin.go:67 discovered: 159.69.106.11:14626 / Gby4qmhXSW1
req(qu/pe/proc/lat): 00000/00591/01822/0049ms, reqQMs: 0, processor: 02408, LSMI/LMI: 1582635/1590680, TPS (in/new/out): 00212/00000/00271, Tips (non-/semi-lazy): 0/0
req(qu/pe/proc/lat): 00000/00591/01822/0049ms, reqQMs: 0, processor: 02462, LSMI/LMI: 1582635/1590680, TPS (in/new/out): 00287/00000/00270, Tips (non-/semi-lazy): 0/0
req(qu/pe/proc/lat): 00000/00591/01822/0049ms, reqQMs: 0, processor: 02514, LSMI/LMI: 1582635/1590680, TPS (in/new/out): 00238/00000/00239, Tips (non-/semi-lazy): 0/0
req(qu/pe/proc/lat): 00000/00591/01822/0049ms, reqQMs: 0, processor: 02571, LSMI/LMI: 1582635/1590680, TPS (in/new/out): 00245/00000/00256, Tips (non-/semi-lazy): 0/0
req(qu/pe/proc/lat): 00000/00591/01822/0049ms, reqQMs: 0, processor: 02643, LSMI/LMI: 1582635/1590680, TPS (in/new/out): 00216/00000/00253, Tips (non-/semi-lazy): 0/0
req(qu/pe/proc/lat): 00000/00591/01822/0049ms, reqQMs: 0, processor: 02700, LSMI/LMI: 1582635/1590680, TPS (in/new/out): 00294/00000/00353, Tips (non-/semi-lazy): 0/0
req(qu/pe/proc/lat): 00000/00591/01822/0049ms, reqQMs: 0, processor: 02762, LSMI/LMI: 1582635/1590680, TPS (in/new/out): 00239/00000/00266, Tips (non-/semi-lazy): 0/0
req(qu/pe/proc/lat): 00000/00591/01822/0049ms, reqQMs: 0, processor: 02834, LSMI/LMI: 1582635/1590680, TPS (in/new/out): 00243/00000/00329, Tips (non-/semi-lazy): 0/0
I have plenty of "2020-08-25T15:58:08Z INFO Autopeering autopeering/plugin.go:67 discovered: ..." entries in the log.
I can shut it down now:
2020-08-25T16:00:47Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 228 seconds) to finish processing (Cleanup at shutdown, Close database) ...
2020-08-25T16:00:48Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 227 seconds) to finish processing (Close database, Cleanup at shutdown) ...
2020-08-25T16:00:49Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 226 seconds) to finish processing (Close database, Cleanup at shutdown) ...
2020-08-25T16:00:50Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 225 seconds) to finish processing (Cleanup at shutdown, Close database) ...
2020-08-25T16:00:51Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 224 seconds) to finish processing (Cleanup at shutdown, Close database) ...
2020-08-25T16:00:52Z INFO Tangle tangle/plugin.go:115 Flushing caches to database... done
2020-08-25T16:00:52Z INFO Database database/plugin.go:49 Syncing databases to disk...
2020-08-25T16:00:52Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 223 seconds) to finish processing (Close database) ...
2020-08-25T16:00:53Z WARN Graceful Shutdown gracefulshutdown/plugin.go:48 Received shutdown request - waiting (max 222 seconds) to finish processing (Close database) ...
2020-08-25T16:00:54Z INFO Database database/plugin.go:51 Syncing databases to disk... done
2020-08-25T16:00:54Z INFO Node node/node.go:124 Shutdown complete!
So I'm afraid that once it is up to date, the ungraceful shutdown will be back.
The "discovered" entries are just info messages telling you that other autopeering nodes were found.
It looks like your "transaction processor" is processing a lot of messages. This can happen due to slow disk I/O, or because the node is taking a local snapshot right now. Did you wait a bit? What kind of storage do you have? SSD? Is this a VPS? We have seen a lot of problems with Contabo, for example, because their disk I/O is too slow.
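The backlog described here appears in the "processor" field of the status lines. In the log above, that field climbs from 02408 to 02834 across eight consecutive lines while the "new" TPS value stays at 00000. The sketch below flags that pattern. Treating a strictly growing queue with zero new transactions as a stall is a heuristic assumed for illustration, not a Hornet metric, and backlog_growing is a hypothetical name.

```python
import re
from typing import List, Tuple

PROCESSOR_RE = re.compile(r"processor:\s*(\d+)")
TPS_RE = re.compile(r"TPS \(in/new/out\):\s*(\d+)/(\d+)/(\d+)")


def backlog_growing(status_lines: List[str]) -> bool:
    """True if the processor queue grows on every sample while no new txs are stored."""
    samples: List[Tuple[int, int]] = []
    for line in status_lines:
        queue, tps = PROCESSOR_RE.search(line), TPS_RE.search(line)
        if queue and tps:  # skip non-status lines such as "discovered: ..."
            samples.append((int(queue.group(1)), int(tps.group(2))))
    return (len(samples) >= 2
            and all(b[0] > a[0] for a, b in zip(samples, samples[1:]))
            and max(new for _, new in samples) == 0)
```

A growing queue with nothing new being stored points to the processing side, such as disk writes, rather than to a lack of neighbors.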
Hi @muXxer, thank you for the help. After disabling WarpSync and waiting a few hours, it started to synchronize again and is up to date now. I can also shut it down gracefully. I will monitor the situation a little longer to be on the safe side.
You are welcome! If you ever have this shutdown problem again, take a full stacktrace, so we can identify the problem.
Stacktrace around 21:00 (too late for shutdown): https://ybin.me/p/85849b634c3f6c79#yv8w/gQh/nfcpYmh6sxZx9SSOV/a+/2YtJl7D2rNF7Q=
Logs: https://ybin.me/p/9ea2916afb714784#aNiyBR7bwrBbFOUcMOpF2u/j/UfnfbKWguysEGyj7tY=
Should be fixed in #650
Hi @muXxer, while I was troubleshooting the node for the other case, I saw it was not synchronising. It went like this:
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, LSMI/LMI: 1735062/1735152, TPS (in/new/out): 00096/00014/00029, Tips (non-/semi-lazy): 85/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, LSMI/LMI: 1735062/1735152, TPS (in/new/out): 00065/00008/00003, Tips (non-/semi-lazy): 85/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, LSMI/LMI: 1735062/1735152, TPS (in/new/out): 00065/00008/00003, Tips (non-/semi-lazy): 85/0
2020-09-11T11:58:11Z INFO Tangle Valid milestone detected! Index: 1735153, Hash: UPGRTJFMBVO9KDVLLUOSPIDREVKVTRYCNPGZVTBHFHYFIQ9PESOUSBZGNMOPONQXDPIFGSSP9SAY99999
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, LSMI/LMI: 1735062/1735153, TPS (in/new/out): 00127/00021/00002, Tips (non-/semi-lazy): 85/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, LSMI/LMI: 1735062/1735153, TPS (in/new/out): 00081/00015/00005, Tips (non-/semi-lazy): 85/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, LSMI/LMI: 1735062/1735153, TPS (in/new/out): 00085/00014/00003, Tips (non-/semi-lazy): 85/0
So I wanted to restart it, and I'm back to the same issue. After I issued the stop, the log says:
2020-09-11T11:58:14Z WARN Graceful Shutdown Received shutdown request - waiting (max 300 seconds) to finish processing ...
2020-09-11T11:58:14Z INFO Autopeering Stopping Autopeering ...
2020-09-11T11:58:14Z INFO Autopeering Stopping Autopeering ... done
2020-09-11T11:58:14Z INFO WebAPI Stopping WebAPI server ...
2020-09-11T11:58:14Z INFO WebAPI Stopping WebAPI server ... done
2020-09-11T11:58:14Z INFO PoW Stopping PoW Handler ...
2020-09-11T11:58:14Z INFO PoW Stopping PoW Handler ... done
2020-09-11T11:58:14Z INFO Snapshot Stopping LocalSnapshots...
2020-09-11T11:58:14Z INFO Snapshot Stopping LocalSnapshots... done
2020-09-11T11:58:14Z INFO Peering Stopping Reconnecter
2020-09-11T11:58:14Z INFO Peering Stopping Reconnecter ... done
2020-09-11T11:58:14Z INFO Peering Stopping Peering Server ...
2020-09-11T11:58:14Z INFO Peering disconnected <address>
2020-09-11T11:58:14Z INFO Peering disconnected <address>
2020-09-11T11:58:14Z INFO Peering disconnected <address>
2020-09-11T11:58:14Z INFO Peering disconnected <address>
2020-09-11T11:58:14Z INFO Peering disconnected <address>
2020-09-11T11:58:14Z INFO Peering disconnected <address>
2020-09-11T11:58:14Z INFO Peering Stopping Peering Server ... done
2020-09-11T11:58:14Z INFO Gossip Stopped MessageProcessor
2020-09-11T11:58:14Z INFO Gossip Stopped BroadcastQueue
2020-09-11T11:58:14Z INFO Tangle Stopping TangleProcessor[ReceiveTx] ...
2020-09-11T11:58:14Z INFO Tangle Stopping TangleProcessor[ReceiveTx] ... done
2020-09-11T11:58:14Z INFO Tangle Stopping TangleProcessor[ProcessMilestone] ...
2020-09-11T11:58:14Z INFO Tangle Stopping TangleProcessor[ProcessMilestone] ... done
2020-09-11T11:58:14Z INFO Tangle Stopping TangleProcessor[MilestoneSolidifier] ...
2020-09-11T11:58:15Z WARN Graceful Shutdown Received shutdown request - waiting (max 299 seconds) to finish processing (STINGRequester, TangleProcessor[MilestoneSolidifier], Tipselection[Cleanup], Tipselection[Events], Cleanup at shutdown, PendingRequestsEnqueuer, Close database) ...
2020-09-11T11:58:16Z WARN Graceful Shutdown Received shutdown request - waiting (max 298 seconds) to finish processing (PendingRequestsEnqueuer, Close database, STINGRequester, TangleProcessor[MilestoneSolidifier], Tipselection[Cleanup], Tipselection[Events], Cleanup at shutdown) ...
2020-09-11T11:58:17Z WARN Graceful Shutdown Received shutdown request - waiting (max 297 seconds) to finish processing (PendingRequestsEnqueuer, Close database, STINGRequester, TangleProcessor[MilestoneSolidifier], Tipselection[Events], Tipselection[Cleanup], Cleanup at shutdown) ...
2020-09-11T11:58:18Z WARN Graceful Shutdown Received shutdown request - waiting (max 296 seconds) to finish processing (Close database, STINGRequester, TangleProcessor[MilestoneSolidifier], Tipselection[Events], Tipselection[Cleanup], Cleanup at shutdown, PendingRequestsEnqueuer) ...
.....
...
..
2020-09-11T12:03:12Z WARN Graceful Shutdown Received shutdown request - waiting (max 2 seconds) to finish processing (PendingRequestsEnqueuer, Close database, STINGRequester, TangleProcessor[MilestoneSolidifier], Tipselection[Events], Tipselection[Cleanup], Cleanup at shutdown) ...
2020-09-11T12:03:13Z WARN Graceful Shutdown Received shutdown request - waiting (max 1 seconds) to finish processing (STINGRequester, TangleProcessor[MilestoneSolidifier], Tipselection[Events], Tipselection[Cleanup], Cleanup at shutdown, PendingRequestsEnqueuer, Close database) ...
2020-09-11T12:03:14Z FATAL Graceful Shutdown Background processes did not terminate in time! Forcing shutdown ...
github.com/gohornet/hornet/plugins/gracefulshutdown.configure.func1.1
/__w/hornet/hornet/plugins/gracefulshutdown/plugin.go:50
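Every countdown line in this hang names the same seven workers, only in a different order. That points to workers that are stuck rather than slow. The sketch below extracts the pending list from each "waiting (max N seconds)" line and intersects the lists to show which workers never finished. It assumes the log lines have been unwrapped to one entry per line, and always_pending is a hypothetical helper.

```python
import re
from typing import Iterable, Optional, Set

PENDING_RE = re.compile(r"waiting \(max (\d+) seconds\) to finish processing \((.*)\) \.\.\.")


def always_pending(log_lines: Iterable[str]) -> Set[str]:
    """Return the workers named in every 'waiting (max N seconds)' line."""
    result: Optional[Set[str]] = None
    for line in log_lines:
        m = PENDING_RE.search(line)
        if not m:
            continue  # e.g. the first line, which has no worker list yet
        names = {name.strip() for name in m.group(2).split(",")}
        result = names if result is None else result & names
    return result or set()
```

For the 0.5.3-rc3 log above, the result is STINGRequester, TangleProcessor[MilestoneSolidifier], Tipselection[Cleanup], Tipselection[Events], Cleanup at shutdown, PendingRequestsEnqueuer and Close database.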
My current config is:
{
"useProfile": "auto",
"httpAPI": {
"basicAuth": {
"enabled": false,
"username": "",
"passwordHash": "",
"passwordSalt": ""
},
"excludeHealthCheckFromAuth": false,
"permitRemoteAccess": [
"attachToTangle",
"getNodeInfo",
"getBalances",
"checkConsistency",
"getTipInfo",
"getTransactionsToApprove",
"getInclusionStates",
"getNodeAPIConfiguration",
"wereAddressesSpentFrom",
"broadcastTransactions",
"findTransactions",
"storeTransactions",
"getTrytes"
],
"whitelistedAddresses": [],
"bindAddress": "0.0.0.0:11111",
"limits": {
"bodyLengthBytes": 1000000,
"findTransactions": 1000,
"getTrytes": 1000,
"requestsList": 1000
}
},
"tipsel": {
"belowMaxDepth": 15
},
"db": {
"path": "/data/hornet/mainnetdb",
"debug": false
},
"logger": {
"level": "info",
"disableCaller": true,
"encoding": "console",
"outputPaths": [
"stdout"
]
},
"snapshots": {
"loadType": "local",
"local": {
"intervalSynced": 50,
"intervalUnsynced": 1000,
"path": "snapshots/mainnet/export.bin",
"downloadURLs": [
"https://ls.manapotion.io/export.bin",
"https://x-vps.com/export.bin",
"https://dbfiles.iota.org/mainnet/hornet/latest-export.bin"
]
},
"global": {
"path": "snapshotMainnet.txt",
"spentAddressesPaths": [
"previousEpochsSpentAddresses1.txt",
"previousEpochsSpentAddresses2.txt",
"previousEpochsSpentAddresses3.txt"
],
"index": 1050000
},
"pruning": {
"enabled": true,
"delay": 258000
}
},
"spentAddresses": {
"enabled": true
},
"network": {
"preferIPv6": false,
"gossip": {
"bindAddress": "0.0.0.0:15600",
"reconnectAttemptIntervalSeconds": 60
},
"autopeering": {
"bindAddress": "0.0.0.0:14626",
"runAsEntryNode": false,
"entryNodes": [
"FvfwJuCMoWJvcJLSYww7whPxouZ9WFJ55uyxTxKxJ1ez@enter.hornet.zone:14626",
"EkSLZ4uvSTED1x6KaGzqxoGxjbytt2rPVfbJk1LRLCGL@enter.manapotion.io:18626",
"iotaMk9Rg8wWo1DDeG7fwV9iJ41hvkwFX8w6MyTQgDu@enter.thetangle.org:14627",
"12w9FrzMdDQ42aBgFrv1siHuJMhuZ4SMVHRFSS7Zb72W@entrynode.iotatoken.nl:14626",
"DboTc1v61Xdyvggj8VRszy92ScUTLgfwZaHvXsU8zr7e@entrynode.tanglebay.org:14626",
"31Tz9meznQMm7qSDUgyMmYVeHUCGA7za5Suvbom5hpE9@bender.iota.autopeering.com:14626"
],
"seed": ""
}
},
"node": {
"alias": "",
"showAliasInGetNodeInfo": false,
"disablePlugins": [
"WarpSync",
"Dashboard",
"Graph",
"MQTT",
"Prometheus",
"Spammer",
"ZMQ"
],
"enablePlugins": []
},
"profiling": {
"bindAddress": "localhost:6060"
}
}
I'm currently on v0.5.3-rc3
Can you reproduce this? If so, please create a full stack trace during the shutdown deadlock (e.g. after 30 s if it has not shut down yet).
See here https://github.com/gohornet/hornet/issues/562#issuecomment-679326385
I tried again just now, but the node was still revalidating the database after the ungraceful shutdown, so this time it shut down fine:
2020-09-11T12:18:51Z INFO Tangle analyzed 166956 transactions
2020-09-11T12:18:53Z INFO Tangle analyzed 167165 transactions
2020-09-11T12:18:55Z INFO Tangle analyzed 167384 transactions
2020-09-11T12:18:57Z INFO Tangle analyzed 167562 transactions
2020-09-11T12:18:59Z INFO Tangle analyzed 167803 transactions
2020-09-11T12:19:00Z WARN Graceful Shutdown Received shutdown request - waiting (max 300 seconds) to finish processing ...
2020-09-11T12:19:01Z WARN Graceful Shutdown Received shutdown request - waiting (max 299 seconds) to finish processing ...
2020-09-11T12:19:01Z INFO Tangle analyzed 168023 transactions
2020-09-11T12:19:01Z INFO Tangle database revalidation aborted
I guess the revalidation is not the tricky part. I will try again later when it is up and running.
At revalidation, most parts of the node are not started yet :)
btw, it seems you have a reaaaally slow system. What are the specs? Are you using an SSD?
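One way to put a number on "slow" is the revalidation log above. It reports 166956 analyzed transactions at 12:18:51 and 168023 at 12:19:01, i.e. 1067 transactions in 10 seconds, about 107 per second. The sketch below computes that rate from the "analyzed N transactions" lines. The timestamp format is taken from the log, and revalidation_rate is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

ANALYZED_RE = re.compile(r"^(\S+Z) INFO Tangle analyzed (\d+) transactions")


def revalidation_rate(log_lines: Iterable[str]) -> Optional[float]:
    """Transactions analyzed per second between the first and last matching line."""
    points: List[Tuple[datetime, int]] = []
    for line in log_lines:
        m = ANALYZED_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%SZ")
            points.append((ts, int(m.group(2))))
    if len(points) < 2:
        return None
    (t0, n0), (t1, n1) = points[0], points[-1]
    seconds = (t1 - t0).total_seconds()
    return (n1 - n0) / seconds if seconds > 0 else None
```

The timestamps only have one-second resolution, so use a window of several seconds or more to get a meaningful rate.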
It's a 16-core, 8 GB memory virtual machine on a bigger server system, with local SSDs in a ZFS RAID 5. The same ZFS pool also hosts a few other blockchain nodes; there might be short-term usage spikes from those, but nothing continuous. We monitor that.
I'm really sorry... but I only received your stack trace from the 9th of October today. Why didn't you post it here directly? I'm not part of the IF, so I can't help you if you write in internal channels :(
The stack trace shows that the database library is busy writing something to disk. So my assumption seems to be right: something is wrong with your disk I/O. Can you run an I/O benchmark with some Linux tools on that VM? I guess we need to add benchmark tools to HORNET in the future to diagnose such "hardware problems" more easily.
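Dedicated tools such as fio or dd are the usual choice here. As a rough, dependency-free alternative, the sketch below measures synchronous write throughput by calling fsync after every block, which loosely resembles the small synced writes a database issues. The block and total sizes are arbitrary assumptions, and fsync_write_throughput is a hypothetical helper. Results on a shared ZFS pool will vary with load from the other nodes.

```python
import os
import tempfile
import time


def fsync_write_throughput(directory: str,
                           total_bytes: int = 64 * 1024 * 1024,
                           block_size: int = 64 * 1024) -> float:
    """Write total_bytes to a temp file in directory, fsyncing after each block.

    Returns the throughput in MiB/s. The temp file is removed afterwards.
    """
    block = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        written = 0
        start = time.perf_counter()
        while written < total_bytes:
            written += os.write(fd, block)
            os.fsync(fd)  # block until this chunk is on stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return written / (1024 * 1024) / elapsed
```

Usage: print(f"{fsync_write_throughput('/data/hornet'):.1f} MiB/s"), run on the filesystem that holds the database path from the config. Compare the result with the same test on a known-good SSD, since the thread gives no threshold.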
Question: do you have a lot of history in that database? How big is your mainnet database folder?
Hi @muXxer, no worries, I don't restart Hornet that often. Actually, I haven't experienced this issue recently, after I removed a few unnecessary processes from the machine, so your hint about I/O makes a lot of sense now. I can't say much about the history size, as I had to remove the data after each failed restart. I'm closing the case; I think it's OK now. Thanks for your help.
Describe the bug
I run Hornet in a Docker container. I use my own image. I cannot stop Hornet in a graceful way.
To Reproduce
Steps to reproduce the behavior:
Then it takes several hours to validate the DB. Dropping the DB with every restart is definitely not an option for me.
Troubleshooting: Confirm that the hornet process is running with PID 1:
I have also tried to add
STOPSIGNAL 2
to the Dockerfile and to specify a long stop-timeout:
While
docker stop hornet
is running, I can look at the logs and I see this:
The ExitCode looks good to me:
Expected behavior
I should be able to use
docker stop
to stop the container without causing a long downtime.
Environment information: