threefoldtech / 0-db

Fast write ahead persistent redis protocol key-value store
Apache License 2.0

incremental.py stopped with (104, 'Connection reset by peer') #164

Closed by coesensbert 8 months ago

coesensbert commented 9 months ago

https://github.com/threefoldtech/0-db/blob/development-v2/tools/incremental-update/incremental.py

While syncing a second hub: https://github.com/threefoldtech/tf_operations/issues/2113#issuecomment-1894143384

root@Main-Grid-DE-hub-02 ~ # python3 incremental.py 
[+] authenticating
[+] master host: hub.grid.tf, port: 9900
[+] slave host: 172.17.0.2, port: 9900
[+] syncing namespace: default
[+] syncing: 42624.02 / 723512.45 MB (5.9 %) [request 166:189459600] Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/redis/connection.py", line 500, in read_response
    response = self._parser.read_response(disable_decoding=disable_decoding)
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/resp2.py", line 15, in read_response
    result = self._read_response(disable_decoding=disable_decoding)
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/resp2.py", line 25, in _read_response
    raw = self._buffer.readline()
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/socket.py", line 115, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/socket.py", line 65, in _read_from_socket
    data = self._sock.recv(socket_read_size)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/incremental.py", line 104, in <module>
    incremental.run()
  File "/root/incremental.py", line 69, in run
    nsmaster = self.master.execute_command("NSINFO", namespace)
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 536, in execute_command
    return conn.retry.call_with_retry(
  File "/usr/local/lib/python3.10/dist-packages/redis/retry.py", line 49, in call_with_retry
    fail(error)
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 540, in <lambda>
    lambda error: self._disconnect_raise(conn, error),
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 526, in _disconnect_raise
    raise error
  File "/usr/local/lib/python3.10/dist-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 537, in <lambda>
    lambda: self._send_command_parse_response(
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 513, in _send_command_parse_response
    return self.parse_response(conn, command_name, **options)
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 553, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.10/dist-packages/redis/connection.py", line 508, in read_response
    raise ConnectionError(
redis.exceptions.ConnectionError: Error while reading from hub.grid.tf:9900 : (104, 'Connection reset by peer')

The script stopped at around 01:30. Looking at the hub metrics, something seems to have happened around that time: https://mon.grid.tf/d/rYdddlPWk/node-exporter-full?orgId=1&var-DS_PROMETHEUS=default&var-job=node_exporter&var-node=prod-01:9100&var-diskdevices=%5Ba-z%5D%2B%7Cnvme%5B0-9%5D%2Bn%5B0-9%5D%2B&from=now-24h&to=now&refresh=30s

Did redis crash there and the script fail to reconnect? After a restart, the script continued where it left off.
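For reference, here is a minimal sketch of how the script could ride out a short server outage instead of stopping. It is not what incremental.py currently does. `call_with_reconnect` and its parameters are hypothetical names introduced for this example, and it assumes that redis-py opens a fresh connection on the next command after a failed one.

```python
import time


def call_with_reconnect(command, retry_on=(ConnectionError,),
                        attempts=5, delay=1.0, sleep=time.sleep):
    """Run command() and retry with exponential backoff on connection errors.

    command  -- zero-argument callable wrapping one request
    retry_on -- exception types treated as a dropped connection
    attempts -- total number of tries before giving up
    delay    -- wait before the first retry, doubled on every later retry
    sleep    -- injectable so the helper can be tested without waiting
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        try:
            return command()
        except retry_on as error:
            last_error = error
            # No point in waiting after the final failed attempt.
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))

    # Every attempt failed: surface the last connection error to the caller.
    raise last_error
```

In the script, the call from the traceback could then be wrapped as `call_with_reconnect(lambda: self.master.execute_command("NSINFO", namespace), retry_on=(redis.exceptions.ConnectionError,))`. The explicit `retry_on` matters because `redis.exceptions.ConnectionError` is not a subclass of Python's built-in `ConnectionError`. Since the sync already resumes from the slave's current position after a manual restart, retrying the failed request should be enough, but that is an assumption worth checking against a real zdb restart.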

maxux commented 9 months ago

This is because zdb crashed yesterday; I restarted it last night. I found the cause of the crash and have already fixed it.

maxux commented 9 months ago

Fixed by https://github.com/threefoldtech/0-db/commit/e16efd22294e6fc79f91b581909013e000f47712

coesensbert commented 9 months ago

It stopped again, this time with a different stack trace:

root@Main-Grid-DE-hub-02 ~ # python3 incremental.py 
[+] authenticating
[+] master host: hub.grid.tf, port: 9900
[+] slave host: 172.17.0.2, port: 9900
[+] syncing namespace: default
[+] syncing: 125949.10 / 723769.34 MB (17.4 %) [request 492:139666412] Traceback (most recent call last):
  File "/root/incremental.py", line 104, in <module>
    incremental.run()
  File "/root/incremental.py", line 97, in run
    self.sync(master, slave)
  File "/root/incremental.py", line 36, in sync
    raw = self.master.execute_command("DATA", "RAW", slave['dataid'], slave['offset'])
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 536, in execute_command
    return conn.retry.call_with_retry(
  File "/usr/local/lib/python3.10/dist-packages/redis/retry.py", line 49, in call_with_retry
    fail(error)
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 540, in <lambda>
    lambda error: self._disconnect_raise(conn, error),
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 526, in _disconnect_raise
    raise error
  File "/usr/local/lib/python3.10/dist-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 537, in <lambda>
    lambda: self._send_command_parse_response(
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 513, in _send_command_parse_response
    return self.parse_response(conn, command_name, **options)
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 553, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.10/dist-packages/redis/connection.py", line 500, in read_response
    response = self._parser.read_response(disable_decoding=disable_decoding)
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/resp2.py", line 15, in read_response
    result = self._read_response(disable_decoding=disable_decoding)
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/resp2.py", line 25, in _read_response
    raw = self._buffer.readline()
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/socket.py", line 115, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/socket.py", line 68, in _read_from_socket
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.

maxux commented 9 months ago

I updated the binary in production to the latest release.