threefoldtech / test_feedback

[Bug 🐞]: timeout on brpop - Error while farmerbot is trying to power off a node #413

Closed Randynho closed 5 months ago

Randynho commented 5 months ago

What happened?

One of my nodes did not power off, and the farmerbot logs show the error from the title (timeout on brpop).

The node started at 03:15 am and is still running 8 hours later.

Connection speed is fine; I have a VM on my always-on node to check.

What did you expect?

A node that powers off after half an hour.

Relevant log output

2024-01-15 11:09:25 [INFO ] [POWERMANAGER] Resource usage too low: 7. Turning off unused node 5643
2024-01-15 11:09:25 [DEBUG] Received job with guid 9cb5adfe-224a-4224-a20f-b469d0c8520d
2024-01-15 11:09:25 [DEBUG] Assigned job 9cb5adfe-224a-4224-a20f-b469d0c8520d to jobs.actors.farmerbot.powermanager:
2024-01-15 11:09:25 [DEBUG] jobs.ActionJob{
    guid: '9cb5adfe-224a-4224-a20f-b469d0c8520d'
    twinid: 0
    action: 'farmerbot.powermanager.poweroff'
    args: params.Params{
        params: [params.Param{
            key: 'nodeid'
            value: '5643'
        }]
        args: []
    }
    result: params.Params{
        params: []
        args: []
    }
    state: tostart
    start: 2024-01-15 11:09:25
    end: 1970-01-01 00:00:00
    grace_period: 0
    error: ''
    timeout: 0
    src_twinid: 0
    src_action: ''
    dependencies: []
}
2024-01-15 11:09:25 [INFO ] [POWERMANAGER] Executing job: POWEROFF 5643
2024-01-15 11:09:35 [ERROR] [POWERMANAGER] Job to power off node 5643 failed: timeout on brpop
2024-01-15 11:09:35 [INFO ] Elapsed time for update: 2.2195045666666666
2024-01-15 11:11:26 [DEBUG] Received error response for job with guid 9cb5adfe-224a-4224-a20f-b469d0c8520d
2024-01-15 11:11:26 [DEBUG] Returned job 9cb5adfe-224a-4224-a20f-b469d0c8520d
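
For anyone scanning a long farmerbot log for the same symptom, the failure line above can be picked out with a small script. This is only a sketch: it assumes the `[ERROR] [POWERMANAGER] Job to power off node ... failed: ...` wording shown in this log stays the same, and `find_failed_poweroffs` is a hypothetical helper name, not part of the farmerbot.

```python
import re
from datetime import datetime

# Pattern for the failure line exactly as it appears in the log above.
# Assumption: the farmerbot keeps this wording and timestamp format.
FAILED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[ERROR\] \[POWERMANAGER\] "
    r"Job to power off node (?P<node>\d+) failed: (?P<reason>.+)$"
)


def find_failed_poweroffs(lines):
    """Return (timestamp, node_id, reason) for each failed power-off job."""
    failures = []
    for line in lines:
        match = FAILED.match(line.strip())
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
            failures.append((ts, int(match["node"]), match["reason"]))
    return failures


# Two lines copied from the log output above.
sample = [
    "2024-01-15 11:09:25 [INFO ] [POWERMANAGER] Executing job: POWEROFF 5643",
    "2024-01-15 11:09:35 [ERROR] [POWERMANAGER] Job to power off node 5643 failed: timeout on brpop",
]

for ts, node, reason in find_failed_poweroffs(sample):
    print(f"{ts} node {node}: {reason}")
```

To check a real log, pass the open file object instead of `sample`, since iterating a file yields its lines.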

Randynho commented 5 months ago

After restarting the farmerbot, the node shut down immediately. All the others were woken up.

2024-01-15 11:35:00 [INFO ] [POWERMANAGER] Resource usage too low: 7. Turning off unused node 5643
2024-01-15 11:35:00 [DEBUG] Received job with guid fe0d6b0a-285d-46bc-825e-d2ce145fcb55
2024-01-15 11:35:00 [DEBUG] Assigned job fe0d6b0a-285d-46bc-825e-d2ce145fcb55 to jobs.actors.farmerbot.powermanager:
2024-01-15 11:35:00 [INFO ] [POWERMANAGER] Executing job: POWEROFF 5643
2024-01-15 11:35:00 [DEBUG] jobs.ActionJob{
    guid: 'fe0d6b0a-285d-46bc-825e-d2ce145fcb55'
    twinid: 0
    action: 'farmerbot.powermanager.poweroff'
    args: params.Params{
        params: [params.Param{
            key: 'nodeid'
            value: '5643'
        }]
        args: []
    }
    result: params.Params{
        params: []
        args: []
    }
    state: tostart
    start: 2024-01-15 11:35:00
    end: 1970-01-01 00:00:00
    grace_period: 0
    error: ''
    timeout: 0
    src_twinid: 0
    src_action: ''
    dependencies: []
}

2024-01-15 11:35:06 [DEBUG] Received result for job with guid fe0d6b0a-285d-46bc-825e-d2ce145fcb55
2024-01-15 11:35:06 [DEBUG] Returned job fe0d6b0a-285d-46bc-825e-d2ce145fcb55
2024-01-15 11:35:06 [INFO ] Elapsed time for update: 2.4280008
2024-01-15 11:39:46 [INFO ] [DATAMANAGER] Node 719 is ON.
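
For context on the error text: BRPOP is the Redis command that blocks on a list until an element arrives or a timeout expires, so `timeout on brpop` presumably means no reply arrived in time (in the first log the failure is reported 10 seconds after the job starts). The sketch below illustrates that general pattern with a retry. It uses Python's standard `queue.Queue` as a stand-in for the Redis list, and `pop_with_retry` and the `"poweroff-ack"` payload are hypothetical; it does not show how the farmerbot itself is implemented.

```python
import queue


def pop_with_retry(q, timeout_s, attempts):
    """Block on q for up to timeout_s seconds, retrying up to `attempts` times.

    Returns the popped item, or None if every attempt timed out.
    """
    for _ in range(attempts):
        try:
            return q.get(timeout=timeout_s)
        except queue.Empty:
            # Timed out with nothing to pop; comparable to BRPOP returning nil.
            continue
    return None


replies = queue.Queue()  # stand-in for the reply list (assumption)

# Nothing has been queued, so all three attempts time out.
print(pop_with_retry(replies, timeout_s=0.01, attempts=3))

# Once a reply is queued, the first attempt returns it.
replies.put("poweroff-ack")  # hypothetical reply payload
print(pop_with_retry(replies, timeout_s=0.01, attempts=3))
```

Whether retrying is appropriate depends on whether the power-off request is safe to repeat; here the job succeeded on the next attempt after the farmerbot was restarted.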

TullysInc commented 5 months ago

Just received a similar report from a farmer running Farmerbot on Farm 45.

See errors: Farm ID 45_1 Farm ID 45_2 Farm ID 45_3

At this stage, he has resorted to manually booting the nodes, and some nodes appear to be working fine after the reboot.

ramezsaeed commented 5 months ago

All of this has already changed in the new farmerbot.