Closed by coesensbert 8 months ago
This is because zdb crashed yesterday; I restarted it during the night. I found the cause of the crash and have already fixed it.
This time with a different stack trace:
root@Main-Grid-DE-hub-02 ~ # python3 incremental.py
[+] authenticating
[+] master host: hub.grid.tf, port: 9900
[+] slave host: 172.17.0.2, port: 9900
[+] syncing namespace: default
[+] syncing: 125949.10 / 723769.34 MB (17.4 %) [request 492:139666412] Traceback (most recent call last):
  File "/root/incremental.py", line 104, in <module>
    incremental.run()
  File "/root/incremental.py", line 97, in run
    self.sync(master, slave)
  File "/root/incremental.py", line 36, in sync
    raw = self.master.execute_command("DATA", "RAW", slave['dataid'], slave['offset'])
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 536, in execute_command
    return conn.retry.call_with_retry(
  File "/usr/local/lib/python3.10/dist-packages/redis/retry.py", line 49, in call_with_retry
    fail(error)
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 540, in <lambda>
    lambda error: self._disconnect_raise(conn, error),
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 526, in _disconnect_raise
    raise error
  File "/usr/local/lib/python3.10/dist-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 537, in <lambda>
    lambda: self._send_command_parse_response(
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 513, in _send_command_parse_response
    return self.parse_response(conn, command_name, **options)
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 553, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.10/dist-packages/redis/connection.py", line 500, in read_response
    response = self._parser.read_response(disable_decoding=disable_decoding)
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/resp2.py", line 15, in read_response
    result = self._read_response(disable_decoding=disable_decoding)
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/resp2.py", line 25, in _read_response
    raw = self._buffer.readline()
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/socket.py", line 115, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/socket.py", line 68, in _read_from_socket
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.
I updated the binary in production to the latest release.
https://github.com/threefoldtech/0-db/blob/development-v2/tools/incremental-update/incremental.py
While syncing a second hub: https://github.com/threefoldtech/tf_operations/issues/2113#issuecomment-1894143384
The script stopped at around 01:30. Looking at the hub metrics, something seems to have happened around that time: https://mon.grid.tf/d/rYdddlPWk/node-exporter-full?orgId=1&var-DS_PROMETHEUS=default&var-job=node_exporter&var-node=prod-01:9100&var-diskdevices=%5Ba-z%5D%2B%7Cnvme%5B0-9%5D%2Bn%5B0-9%5D%2B&from=now-24h&to=now&refresh=30s
Maybe redis crashed there and the script did not reconnect? After I restarted the script, it continued where it left off.
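If the script is meant to survive a server restart on its own, one option might be to retry the failing call instead of letting the exception end the run. Below is a minimal sketch of that idea, not what incremental.py currently does. The helper name `call_with_reconnect` and its parameters (`attempts`, `delay`, `backoff`, `sleep`) are hypothetical, and it assumes the client opens a fresh connection on the next command after a disconnect, which I have not verified against zdb.

```python
import time


def call_with_reconnect(fn, retry_on, attempts=5, delay=1.0, backoff=2.0,
                        sleep=time.sleep):
    """Call fn(); if it raises one of retry_on, wait and try again.

    Hypothetical helper: the wait grows by `backoff` after each failure,
    and the last failure is re-raised once `attempts` calls have been made.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: surface the original error.
                raise
            sleep(delay)
            delay *= backoff


# Assumed usage inside sync(), with `import redis` at the top of the script:
#
#   raw = call_with_reconnect(
#       lambda: self.master.execute_command(
#           "DATA", "RAW", slave['dataid'], slave['offset']),
#       retry_on=(redis.exceptions.ConnectionError,),
#   )
```

The exception type is passed in explicitly because `redis.exceptions.ConnectionError` is not the built-in `ConnectionError`. Since the sync resumed where it left off after a manual restart, retrying the same `DATA RAW` request should in principle be safe, but that is an assumption based only on the behaviour described above.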