Closed: jerryyip closed this issue 2 years ago.
I don't see anything in these logs to suggest that the miner is crashing, can you clarify what you mean?
Sorry @evanmcc, I have just made the log clearer. You can see that the miner restarts at 2021-08-28 15:46:37.260.
Any update, @evanmcc? Many users on Discord report symptoms that may be caused by this issue.
I assume that this is an OOM issue? Is there a time-correlated OOM message in /var/log/messages?
Otherwise I don't have any clues from these logs. We know that there are some issues around memory use in the state channel stuff. If this is indeed the case, it's a substantial rework, so we'll need to spend some time on it, meaning that it won't happen for a few weeks. A shorter-term solution might be a throttle on packets.
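If /var/log/messages isn't conclusive, something along these lines should surface a time-correlated OOM kill on most hosts (a rough sketch; log locations and tooling vary by distro):

# look for the kernel's OOM killer in the ring buffer and the syslog files
dmesg -T | grep -iE 'out of memory|oom-killer|killed process'
grep -iE 'oom|killed process' /var/log/messages /var/log/syslog 2>/dev/null
# on systemd hosts, the journal keeps kernel messages with timestamps
journalctl -k --since "2021-08-28" | grep -i oom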
@jerryyip: can you check that, please? Since Sensecap does not give owners the login credentials to the miner, I cannot check it myself.
The crash.log from the device:
2021-09-10 07:54:47 =SUPERVISOR REPORT====
Supervisor: {<0.1805.0>,libp2p_simple_sup}
Context: shutdown_error
Reason: noproc
Offender: [{pid,<0.1809.0>},{id,1},{mfargs,{libp2p_yamux_stream,open_stream,undefined}},{restart_type,temporary},{shutdown,1000},{child_type,worker}]
2021-09-10 07:54:53 =SUPERVISOR REPORT====
Supervisor: {<0.1825.0>,libp2p_simple_sup}
Context: shutdown_error
Reason: noproc
Offender: [{pid,<0.1828.0>},{id,1},{mfargs,{libp2p_yamux_stream,open_stream,undefined}},{restart_type,temporary},{shutdown,1000},{child_type,worker}]
2021-09-10 07:56:13 =SUPERVISOR REPORT====
Supervisor: {<0.1929.0>,libp2p_simple_sup}
Context: shutdown_error
Reason: noproc
Offender: [{pid,<0.1932.0>},{id,1},{mfargs,{libp2p_yamux_stream,open_stream,undefined}},{restart_type,temporary},{shutdown,1000},{child_type,worker}]
Note that the device has 4 GB of RAM, so no OOM message appears on the host. Is there a way to check the miner container or the miner's VM? @evanmcc
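For reference, the host-side checks I can run look roughly like this (a sketch, assuming the stock container is named miner; adjust for your setup):

# did Docker itself kill the container for exceeding a memory limit?
docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}} {{.State.FinishedAt}}' miner
# live memory usage of the container
docker stats --no-stream miner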
Those are unrelated.
2021-08-28 15:45:07.320 160 [info] <0.1491.0>@blockchain_worker:start_sync:758 new block sync starting with Pid: <0.4209.0>, Ref: #Ref<0.3948368610.302252033.53793>, Peer: "/p2p/11u9D6Abckexf5pnQ...
============= miner crashes and restarts ===========================
2021-08-28 15:46:37.260 169 [info] <0.1443.0>@blockchain_sup:init:71 blockchain_sup init with [{key,{{ecc_compact,{{'ECPoint',<<4,45,171,167,233,117,136,78,18,127,57,202,65,250,25,152,233,231,...
The 160 and 169 here do indicate a crash, though (those are the OS PIDs). That said, I still have nothing to go on. If the Erlang code itself were crashing, the logs at the point of restart would be noisier. So this is either an OOM kill or a segfault in the VM.
Can you enable core dumps, if there's no evidence of an OOM kill?
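Something like this on the host should get you started (a rough sketch; note that core_pattern is global to the kernel, so it applies inside the container too, but a container may need the ulimit passed at start, e.g. docker run --ulimit core=-1):

# allow unlimited-size core files for new processes
ulimit -c unlimited
# check where the kernel writes cores
cat /proc/sys/kernel/core_pattern
# on systemd-coredump hosts, list captured dumps after a crash
coredumpctl list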
Also interesting here is the full 1.5 minutes that pass between the last message and the restart. What was going on there, I wonder?
@jerryyip: can you please provide the logs or give me the login credentials to my hotspot? Other non-Sensecap hotspots in the vicinity work fine, which may suggest that this is indeed a Sensecap issue.
I don't know who is responsible for the hotspot software, but if Sensecap has configured the VM, this might be something to look at, as @evanmcc mentioned.
Going to close this as stale for now, @jerryyip. Feel free to comment to reopen if this is still an issue and you have more details to provide.
We found a situation on several devices: their pktfwd is receiving too many LoRaWAN packets, almost 1 packet per second. These packets are pushed to the miner and crash it every 30 seconds. Here are the logs from the miner: