threefoldtech / tfgrid-sdk-go

Apache License 2.0
2 stars 4 forks source link

🐞 [Bug]: Unexpected behavior of the Farmer bot #987

Open mahendravarmayadala93 opened 6 months ago

mahendravarmayadala93 commented 6 months ago

What happened?

Email : mahir1505@gmail.com

A client named Mahir, experienced power outage last week and suddenly the next day he started getting notifications "Node didn't wake up within 24 hours" After checking the logfiles.

Here are the observations :

RMB Error for multiple Nodes Multiple occurrence of this error "1:20PM DBG failed to read message error="read tcp 185.172.174.19:57726->178.63.191.177:443: use of closed network connection" Client says that the Issue with Closed network connection fixed automatically but the RMB error causing delay about 15 - 20 minutes which could the reason for "Node didn't wake up with in 24 hours"

image_z8lzd (1)

Log File : https://nc.mahir.be/index.php/s/jorXrHPpipaTA6C

Config File : https://nc.mahir.be/index.php/s/LxqqtsZPXenCdep

Here are the observations from @scottyeager after reviewing the logfiles

I have reviewed the log file, specifically for the case of node 4332. I don't see any reason why this node was not awoken earlier.

Here's the last node report table before the bot woke up the node on the 8th:

2024/04/08
11:25AM DBG Nodes report
│ 4332 │ OFF           │ false  │ true      │ false         │ 0%    │ false                 │               0 │ -                │     23h41m16.753860937s │ 24h17m49.505781287s │                        0s │
...
11:30AM INF Periodic wake up nodeID=4332
11:30AM INF POWER ON nodeID=4332

Farmerbot is already reporting that the last time the node was awake was over 24 hours ago, which doesn't seem correct. On the previous day:

2024/04/07

11:28AM DBG Nodes report
│ 4332 │ ON            │ false  │ true      │ false         │ 0%    │ false                 │               0 │ "10:51AM"        │          21m1.56875582s │     21m1.568755981s │                        0s │
11:38AM INF Resource usage too low. Turning off unused node nodeID=4332 resources usage=0
11:38AM INF POWER OFF nodeID=4332

So the questions is why is the bot waiting until an almost impossibly late time to try to wake up this node? I think we'll need dev input at this point.

As a final note, I don't see any evidence that the power outage or any RMB related errors are the cause of this. Certainly there could be something I'm not seeing, but it could also just be a coincidence.

Recent update : 1 Dated 16/04/2024 from the Farmer in specific to Node 5918.

This Node should have already awakened, but Farmer Bot scheduled it for 2:16 pm, and the power target changed to up at 9:47. Farmer is currently using Farmerbot version v0.14.12. According to the dashboard, the Node is still down. image_xc8po image_gfak7s

Recent update : 2 Dated 16/04/2024

Farmer says: "and yesterday it got the powertarget "up" at 9:47 Also farmer thinks that time between nodes initiated is longer than usual 5-10mins, thats why nodes boot to late.

As per the log file, there is an error read tcp 185.172.174.19:45176->178.63.191.177:443: use of closed network connection. Also rmb with error: context deadline.

  1. �[90m2:37PM�[0m DBG failed to read message �[36merror=�[0m�[31m�[1m"read tcp 185.172.174.19:45176->178.63.191.177:443: use of closed network connection"�[0m�[0m �[90m2:38PM�[0m �[31mERR�[0m �[36merror=�[0m�[31m�[1m"failed to update node 5751 with error: failed to get node 5751 statistics from rmb with error: context deadline exceeded"�[0m�[0m �[90m2:39PM�[0m �[32mINF�[0m �[1mResource usage too low. Turning off unused node�[0m �[36mnodeID=�[0m5811 �[36mresources usage=�[0m0 �[90m2:39PM�[0m DBG failed to read message �[36merror=�[0m�[31m�[1m"read tcp 185.172.174.19:57252->178.63.191.177:443: use of closed network connection"�[0m�[0m �[90m2:46PM�[0m DBG failed to read message �[36merror=�[0m�[31m�[1m"read tcp 185.172.174.19:41074->178.63.191.177:443: use of closed network connection"�[0m�[0m �[90m2:48PM�[0m �[31mERR�[0m �[36merror=�[0m�[31m�[1m"failed to update node 5751 with error: failed to get node 5751 statistics from rmb with error: context deadline exceeded"�[0m�[0m usage too low. Turning off unused node�[0m �[36mnodeID=�[0m5869 �[36mresources usage=�[0m1

Here is the latest log file here

which network/s did you face the problem on?

Main

Twin ID/s

No response

Node ID/s

4332,5918

Farm ID/s

No response

Contract ID/s

No response

Relevant log output

Submitted via links
xmonader commented 6 months ago

To be investigated