threefoldtech / grid_deployment

Deploy a full Grid backend with docker-compose and snapshots
Apache License 2.0
[hub] incremental.py not working #62

Closed (coesensbert closed 3 months ago)

coesensbert commented 3 months ago

https://github.com/threefoldtech/grid_deployment/tree/development/grid-hub

Once the hub is deployed, we use incremental.py to sync all the data. We used the latest version from here: https://github.com/threefoldtech/0-db/blob/development-v2/tools/incremental-update/incremental.py

We changed line 108: https://github.com/threefoldtech/grid_deployment/blob/development/grid-hub/incremental.py#L108

When deploying, or when trying to sync manually, we get:

root@bt-dc-test3:~/code/grid_deployment/grid-hub# python3 incremental.py
Traceback (most recent call last):
  File "/root/code/grid_deployment/grid-hub/incremental.py", line 108, in <module>
    incremental = ZDBIncremental("hub.grid.tf", 9900, "127.0.0.1", 9900)
  File "/root/code/grid_deployment/grid-hub/incremental.py", line 19, in __init__
    target.set_response_callback("NSINFO", redis._parsers.helpers.parse_info)
AttributeError: module 'redis' has no attribute '_parsers'

docker ps -a:

CONTAINER ID   IMAGE                                              COMMAND                  CREATED          STATUS          PORTS                                                                                         NAMES
e968e0b491ed   caddy:2.8.4                                        "caddy run --config …"   11 minutes ago   Up 11 minutes   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 443/udp, 2019/tcp   caddy
f47830bbb259   ghcr.io/threefoldtech/0-hub:master                 "python3 flist-uploa…"   11 minutes ago   Up 11 minutes                                                                                                 0-hub
ffdee2b4b5e5   ghcr.io/threefoldtech/0-bootstrap:development-v3   "python3 bootstrap.py"   11 minutes ago   Up 11 minutes                                                                                                 0-bootstrap
012c74997739   ghcr.io/threefoldtech/0-db:development-v2          "/bin/zdb-process"       11 minutes ago   Up 11 minutes   0.0.0.0:9900->9900/tcp, :::9900->9900/tcp                                                     0-db

ss -tlnpu

Netid                    State                     Recv-Q                    Send-Q                                        Local Address:Port                                          Peer Address:Port                    Process                                                           
udp                      UNCONN                    0                         0                                             127.0.0.53%lo:53                                                 0.0.0.0:*                        users:(("systemd-resolve",pid=2653034,fd=13))                    
udp                      UNCONN                    0                         0                                                         *:9650                                                     *:*                        users:(("mycelium",pid=1092653,fd=12))                           
udp                      UNCONN                    0                         0                                                         *:9651                                                     *:*                        users:(("mycelium",pid=1092653,fd=9))
tcp                      LISTEN                    0                         128                                                 0.0.0.0:34022                                              0.0.0.0:*                        users:(("sshd",pid=1899347,fd=3))                                
tcp                      LISTEN                    0                         4096                                                0.0.0.0:9900                                               0.0.0.0:*                        users:(("docker-proxy",pid=184768,fd=4))                         
tcp                      LISTEN                    0                         4096                                                0.0.0.0:80                                                 0.0.0.0:*                        users:(("docker-proxy",pid=185034,fd=4))                         
tcp                      LISTEN                    0                         4096                                          127.0.0.53%lo:53                                                 0.0.0.0:*                        users:(("systemd-resolve",pid=2653034,fd=14))                    
tcp                      LISTEN                    0                         4096                                                0.0.0.0:443                                                0.0.0.0:*                        users:(("docker-proxy",pid=185019,fd=4))                         
tcp                      LISTEN                    0                         1024                                              127.0.0.1:8989                                               0.0.0.0:*                        users:(("mycelium",pid=1092653,fd=14))                           
tcp                      LISTEN                    0                         128                                                    [::]:34022                                                 [::]:*                        users:(("sshd",pid=1899347,fd=4))
tcp                      LISTEN                    0                         4096                                                   [::]:9900                                                  [::]:*                        users:(("docker-proxy",pid=184775,fd=4))                         
tcp                      LISTEN                    0                         4096                                                   [::]:80                                                    [::]:*                        users:(("docker-proxy",pid=185041,fd=4))                         
tcp                      LISTEN                    0                         1024                                                      *:9651                                                     *:*                        users:(("mycelium",pid=1092653,fd=10))                           
tcp                      LISTEN                    0                         4096                                                   [::]:443                                                   [::]:*                        users:(("docker-proxy",pid=185027,fd=4))

telnet localhost 9900

Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

maxux commented 3 months ago

AttributeError: module 'redis' has no attribute '_parsers'

This is caused by a version mismatch in the Python redis module. Internally I use a hack to make it work with zdb, but it seems they changed things. Can you show me which version of python3-redis you're using?
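One way to see which version Python itself picks up, regardless of whether it came from apt or pip, is to ask the standard library. This is only a minimal sketch: the helper name installed_version is made up for illustration, and it assumes the module was installed with its packaging metadata intact.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "redis" is the distribution name of the redis-py client on PyPI.
    print("redis:", installed_version("redis"))
```

If this prints None, the interpreter running incremental.py does not see any redis distribution at all.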

Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

To debug a running zdb, don't use telnet, always use netcat. The best way to check it is with redis-cli, to see if it's responding.

$ redis-cli -h hub.grid.tf -p 9910
hub.grid.tf:9910> ping
PONG

If you have neither redis-cli nor netcat available, only telnet, you can still connect to zdb, but you need to send something first: zdb won't send you anything on its own. The easiest thing to send is just $:

# telnet hub.grid.tf 9910
Trying 185.69.166.9...
Connected to hub.grid.tf.
Escape character is '^]'.
$
-Malformed request, array expected
Connection closed by foreign host.

So you know it's responding.
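The same liveness check can be scripted. The sketch below is an illustration, not part of incremental.py: the function name probe is hypothetical, and it simply sends a RESP-encoded PING (the command used in the redis-cli example above) and returns whatever bytes come back.

```python
import socket


def probe(host, port, timeout=5.0):
    """Send a RESP-encoded PING and return the raw reply bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # zdb never speaks first, so the client has to send a request.
        sock.sendall(b"*1\r\n$4\r\nPING\r\n")
        return sock.recv(4096)


# Example call, using the host and port from the redis-cli session above:
# print(probe("hub.grid.tf", 9910))
```

Any reply at all, even an error line starting with "-", shows the server is listening and answering.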

maxux commented 3 months ago

If you really want to dig into the RESP protocol and get a real reply, using netcat you can try:

$ nc -v -C hub.grid.tf 9910
Connection to hub.grid.tf (185.69.166.9) 9910 port [tcp/*] succeeded!
*1
$4
INFO
$742
# server
server_name: 0-db (zdb)
server_revision: v2.0.5-7-g5c7aaa6c
engine_revision: v2.0.5-7-g5c7aaa6c
instance_id: 112345062
boot_time: 1705577553
uptime: 19087122

# clients
clients_lifetime: 1183

# internals
sequential_key_size: 8
data_version: 3
index_version: 4

# stats
commands_executed: 29
commands_failed: 3
commands_unauthorized: 0
index_disk_read_failed: 0
index_disk_write_failed: 0
data_disk_read_failed: 0
data_disk_write_failed: 0
index_disk_read_bytes: 0
index_disk_read_mb: 0.00
index_disk_write_bytes: 3567
index_disk_write_mb: 0.00
data_disk_read_bytes: 0
data_disk_read_mb: 0.00
data_disk_write_bytes: 28
data_disk_write_mb: 0.00
network_rx_bytes: 116127
network_rx_mb: 0.11
network_tx_bytes: 43420
network_tx_mb: 0.04

Sending *1, then $4, then INFO should make the server reply with its info.
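Those three lines are the RESP encoding of a one-element array holding the 4-byte string INFO, with each line ended by CRLF (which is what the -C flag of nc takes care of). As a small sketch, with encode_command being a name chosen here for illustration, the framing can be built like this:

```python
def encode_command(*args):
    """Encode a command and its arguments as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of elements
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        # each element: $<byte length>, CRLF, payload, CRLF
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


if __name__ == "__main__":
    print(encode_command("INFO"))
```

The $742 at the start of the reply above follows the same rule: it announces a bulk string of 742 bytes.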

coesensbert commented 3 months ago

Package: python3-redis
Version: 3.5.3-2
Priority: optional
Section: python
Source: python-redis
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Debian Python Team <team+python@tracker.debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 318 kB
Depends: python3:any
Suggests: python3-hiredis
Homepage: https://github.com/andymccurdy/redis-py
Download-Size: 60.8 kB
APT-Manual-Installed: yes

maxux commented 3 months ago

Okay, that's a really old version. Can you try removing it and installing it via: pip3 install redis-py ?
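For reference, a script can also guard against this kind of internal move by trying more than one import location. The sketch below is an assumption, not what incremental.py does: the helper first_available is hypothetical, and the fallback path redis.client.parse_info is where older redis-py releases are believed to keep that function. Whether the rest of the script works on such an old version is a separate question.

```python
import importlib


def first_available(candidates):
    """Return the first attribute that can be imported from (module, attr) pairs."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module (or a parent package) is missing, try the next one
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError("none of the candidates could be imported: %r" % (candidates,))


# Assumed locations of parse_info, newest layout first.
PARSE_INFO_CANDIDATES = [
    ("redis._parsers.helpers", "parse_info"),
    ("redis.client", "parse_info"),
]

# Usage (requires the redis module to be installed):
# parse_info = first_available(PARSE_INFO_CANDIDATES)
```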

coesensbert commented 3 months ago

Okay, that's a really old version. Can you try removing it and installing it via: pip3 install redis-py ?

It's probably me, but I'm not able to install it like that. I tried a couple of workarounds without result. What I find online is to upgrade or downgrade the Python version?

pip3 install redis-py
ERROR: Could not find a version that satisfies the requirement redis-py (from versions: none)
ERROR: No matching distribution found for redis-py
python3 --version
Python 3.10.12

maxux commented 3 months ago

My bad, try just pip3 install redis. The package name is sometimes different from the repository name x_x

coesensbert commented 3 months ago

My bad, try just pip3 install redis. The package name is sometimes different from the repository name x_x

Right, np, this works indeed! I'll try a full sync. I also added it to the install script: https://github.com/threefoldtech/grid_deployment/commit/1c307ce0bff70d35712dc962ef9051cbc7733011

coesensbert commented 3 months ago

While test-syncing a hub deployment:

# python3 incremental.py
[+] authenticating: master
[+] master host: hub.grid.tf, port: 9900
[+] slave host: 127.0.0.1, port: 9900
[+] syncing namespaces: default -> default
[+] syncing: 87446.99 / 768399.90 MB (11.4 %) [request 341:260170008]
Traceback (most recent call last):
  File "/root/code/grid_deployment/grid-hub/incremental.py", line 111, in <module>
    incremental.run()
  File "/root/code/grid_deployment/grid-hub/incremental.py", line 104, in run
    self.sync(master, slave)
  File "/root/code/grid_deployment/grid-hub/incremental.py", line 52, in sync
    raise e
  File "/root/code/grid_deployment/grid-hub/incremental.py", line 45, in sync
    raw = self.master.execute_command("DATA", "RAW", slave['dataid'], slave['offset'])
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 548, in execute_command
    return conn.retry.call_with_retry(
  File "/usr/local/lib/python3.10/dist-packages/redis/retry.py", line 62, in call_with_retry
    return do()
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 549, in <lambda>
    lambda: self._send_command_parse_response(
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 525, in _send_command_parse_response
    return self.parse_response(conn, command_name, **options)
  File "/usr/local/lib/python3.10/dist-packages/redis/client.py", line 565, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.10/dist-packages/redis/connection.py", line 536, in read_response
    raise response
redis.exceptions.ResponseError: Unexpected Internal Error

coesensbert commented 3 months ago

This error was caused by us uploading an flist to the new instance, which broke the database state. We can only do master-slave replication; more on that later.

coesensbert commented 3 months ago

Success! A full sync:

~/code/grid_deployment/grid-hub# python3 incremental.py
[+] authenticating: master
[+] master host: hub.grid.tf, port: 9900
[+] slave host: 127.0.0.1, port: 9900
[+] syncing namespaces: default -> default
[+] syncing: 768427.56 / 768427.60 MB (100.0 %), waiting changes
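If the sync runs unattended, the progress line can be parsed to tell how far along it is. This is a rough sketch based only on the output format shown in this thread. The names parse_progress and is_caught_up are invented for illustration, and the format may differ in other versions of the script.

```python
import re

# Matches e.g. "[+] syncing: 87446.99 / 768399.90 MB (11.4 %)"
PROGRESS = re.compile(
    r"syncing: (?P<done>\d+(?:\.\d+)?) / (?P<total>\d+(?:\.\d+)?) MB "
    r"\((?P<pct>\d+(?:\.\d+)?) %\)"
)


def parse_progress(line):
    """Return (done_mb, total_mb, percent) from a progress line, or None."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    return float(match["done"]), float(match["total"]), float(match["pct"])


def is_caught_up(line):
    """True when the line reports that the slave is waiting for new changes."""
    return parse_progress(line) is not None and "waiting changes" in line
```

Lines such as "[+] syncing namespaces: default -> default" do not match the pattern and yield None.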

This sync is started automatically when using the hub install script: https://github.com/threefoldtech/grid_deployment/blob/development/grid-hub/install-hub.sh#L63-L65