guydavis / machinaris

An easy-to-use WebUI for crypto plotting and farming. Offers Bladebit, Gigahorse, MadMax, Chiadog and Plotman in a Docker container. Supports Chia, MMX, Chives, Flax, and HDDCoin among others.
Apache License 2.0

Worker not checking in #219

Closed BasvanH closed 3 years ago

BasvanH commented 3 years ago

Describe the bug: In a multi-worker setup I noticed a stale harvester. I rebooted it, and now it won't check in at the full node. Even after purging it won't show up in the list of workers. IP connectivity etc. is in order.

Expected behavior: At the harvester I get these logs:

Execution of job "update (trigger: interval[0:00:05], next run at: 2021-08-12 11:57:57 UTC)" skipped: maximum number of running instances reached (1)
Execution of job "update (trigger: interval[0:01:00], next run at: 2021-08-12 11:57:57 UTC)" skipped: maximum number of running instances reached (1)
Execution of job "update (trigger: interval[0:00:05], next run at: 2021-08-12 11:58:02 UTC)" skipped: maximum number of running instances reached (1)
[2021-08-12 11:58:05 +0000] [95] [INFO] Failed to load plots farming and send.
[2021-08-12 11:58:05 +0000] [95] [INFO] Traceback (most recent call last):
  File "/chia-blockchain/venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/chia-blockchain/venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/chia-blockchain/venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.9/http/client.py", line 1345, in getresponse
    response.begin()
  File "/usr/lib/python3.9/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.9/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
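
The repeated "maximum number of running instances reached (1)" warnings come from APScheduler: when an interval job's previous run is still in flight, the next trigger is skipped rather than stacked. A minimal sketch that reproduces the same warning (not Machinaris code; the job body and timings are illustrative only):

import time
import logging
from apscheduler.schedulers.background import BackgroundScheduler

logging.basicConfig(level=logging.WARNING)

def update():
    # Simulate a status update that takes longer than its 5-second interval,
    # e.g. because the remote controller is not responding.
    time.sleep(20)

scheduler = BackgroundScheduler()
# max_instances=1 (the default) makes APScheduler skip overlapping runs and
# log "skipped: maximum number of running instances reached (1)" instead.
scheduler.add_job(update, 'interval', seconds=5, max_instances=1)
scheduler.start()

time.sleep(60)
scheduler.shutdown()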

And at the full node/farmer I get these:

WARNING:apscheduler.scheduler:Execution of job "update (trigger: interval[0:00:05], next run at: 2021-08-12 11:54:54 UTC)" skipped: maximum number of running instances reached (1)
WARNING:apscheduler.scheduler:Execution of job "update (trigger: interval[0:00:05], next run at: 2021-08-12 11:54:59 UTC)" skipped: maximum number of running instances reached (1)
[2021-08-12 11:55:03 +0000] [2003] [CRITICAL] WORKER TIMEOUT (pid:49612)
[2021-08-12 11:55:03 +0000] [49612] [INFO] Worker exiting (pid: 49612)
Failed to convert to GiB: Unknown
Traceback (most recent call last):
  File "/machinaris/common/utils/converters.py", line 21, in str_to_gibs
    val,unit = str.split(' ')
ValueError: not enough values to unpack (expected 2, got 1)

[2021-08-12 11:55:03 +0000] [50900] [INFO] Booting worker with pid: 50900
WARNING:apscheduler.executors.default:Run time of job "collect (trigger: cron[minute='*/5'], next run at: 2021-08-12 12:00:00 UTC)" was missed by 0:00:03.609948
WARNING:apscheduler.executors.default:Run time of job "update (trigger: interval[0:01:00], next run at: 2021-08-12 11:56:19 UTC)" was missed by 0:00:02.352730
WARNING:apscheduler.scheduler:Execution of job "update (trigger: interval[0:00:05], next run at: 2021-08-12 11:55:09 UTC)" skipped: maximum number of running instances reached (1)
WARNING:apscheduler.scheduler:Execution of job "update (trigger: interval[0:00:05], next run at: 2021-08-12 11:55:14 UTC)" skipped: maximum number of running instances reached (1)
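
The "Failed to convert to GiB: Unknown" traceback shows str_to_gibs in machinaris/common/utils/converters.py unpacking str.split(' ') into exactly two values, which fails when the farm summary reports a bare "Unknown" instead of something like "12.345 TiB". A hedged sketch of a defensive variant (the unit table and the None fallback are my assumptions, not the actual Machinaris implementation):

import logging

def str_to_gibs(s):
    # Convert a size string such as "12.345 TiB" to GiB.
    # A value without a unit (e.g. "Unknown") previously raised
    # "ValueError: not enough values to unpack (expected 2, got 1)".
    units = {'B': 1 / 1024**3, 'KiB': 1 / 1024**2, 'MiB': 1 / 1024,
             'GiB': 1, 'TiB': 1024, 'PiB': 1024**2}
    try:
        val, unit = s.split(' ')
        return float(val) * units[unit]
    except (ValueError, KeyError):
        logging.info("Failed to convert to GiB: %s", s)
        return None  # assumed fallback; the real converter may behave differently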

System setup:

Kind regards, Bastiaan

BasvanH commented 3 years ago

I think there was a version incompatibility. With everything on 0.5.2 there are no problems anymore.