micro-fan / aiozk

Asyncio client for zookeeper
MIT License

AIOZK connection timeout when uploading data from multiple files longer than 460 lines #80

Open PremyslCerny opened 2 years ago

PremyslCerny commented 2 years ago

Description

Committing a transaction through the AIOZK client fails with the following timeout exception after approx. 30 seconds when the transaction contains multiple ZooKeeper nodes (i.e. files) and at least one of them is longer than approx. 460 lines separated by \n. Setting the session and read timeouts in the AIOZK client has no effect.

Exception

Client:

30-Mar-2022 08:21:52.183376 ERROR aiozk.session Send exception: ('lm1', 12181)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiozk/session.py", line 217, in send
    zxid, response = await self.conn.send(request, xid=self.xid)
aiozk.exc.TimeoutError: ('lm1', 12181)
...
  File "/opt/lmiocmd/commander/library/service.py", line 228, in load_library
    await transaction.commit()
  File "/usr/local/lib/python3.7/site-packages/aiozk/transaction.py", line 117, in commit
    response = await self.client.send(self.request)
  File "/usr/local/lib/python3.7/site-packages/aiozk/client.py", line 101, in send
    response = await self.session.send(request)
  File "/usr/local/lib/python3.7/site-packages/aiozk/session.py", line 233, in send
    raise e
  File "/usr/local/lib/python3.7/site-packages/aiozk/session.py", line 217, in send
    zxid, response = await self.conn.send(request, xid=self.xid)
aiozk.exc.TimeoutError: ('lm1', 12181)

ZooKeeper Server Logs:

[2022-03-30 08:21:39,824] INFO Processing srvr command from /172.22.0.13:53912 (org.apache.zookeeper.server.NIOServerCnxn)
[2022-03-30 08:21:42,521] WARN Exception causing close of session 0x100d77f221515ee: Len error 1093084 (org.apache.zookeeper.server.NIOServerCnxn)
[2022-03-30 08:21:52,155] INFO Processing srvr command from /172.22.0.13:54014 (org.apache.zookeeper.server.NIOServerCnxn)
[2022-03-30 08:21:52,166] INFO Revalidating client: 0x100d77f221515ee (org.apache.zookeeper.server.quorum.Learner)
[2022-03-30 08:21:52,285] WARN Exception causing close of session 0x100d77f221515ee: Connection reset by peer (org.apache.zookeeper.server.NIOServerCnxn)
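
A possible reading of the `Len error 1093084` line: ZooKeeper servers reject incoming packets larger than `jute.maxbuffer`, whose documented default is 0xfffff (1048575) bytes. Whether this cluster actually runs with the default is an assumption; it has not been verified here. Under that assumption, a quick check shows that the reported length is over the limit:

```python
# Assumption: the server runs with ZooKeeper's default jute.maxbuffer.
DEFAULT_JUTE_MAXBUFFER = 0xFFFFF  # 1048575 bytes

# Length taken from the server log line "Len error 1093084".
reported_len = 1093084

# How far the rejected packet exceeds the assumed limit, in bytes.
overshoot = reported_len - DEFAULT_JUTE_MAXBUFFER
print(f"limit={DEFAULT_JUTE_MAXBUFFER} reported={reported_len} overshoot={overshoot}")
```

If this is the cause, the client-side timeout would be a symptom of the server closing the session rather than of a slow commit.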

How to reproduce

Use an aiozk transaction in the following way to upload data read from files (FILE_DATA and FILE_PATH):

zookeeper_path = "/{}".format(FILE_PATH)
transaction = aiozk.transaction.Transaction(client_object)
transaction.create(zookeeper_path, FILE_DATA)
...

There should be at least 660 files added to the transaction, with at least one of them exceeding approx. 460 lines.

Afterwards the transaction is to be committed:

await transaction.commit()

The exception is caused by this await transaction.commit() line.
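
A possible workaround, sketched under the assumption that the failure comes from the whole transaction exceeding the server's request size limit (ZooKeeper's default `jute.maxbuffer` is 0xfffff bytes): split the files into several smaller transactions by payload size. `batch_by_size`, `SIZE_BUDGET`, and `Node` are hypothetical names and are not part of aiozk. The budget is deliberately conservative because the per-operation framing overhead is not computed here.

```python
from typing import Iterable, Iterator, List, Tuple, Union

# Assumption: default jute.maxbuffer (0xfffff bytes). Half of it is used as
# the budget to leave headroom for request framing that is not counted below.
SIZE_BUDGET = 0xFFFFF // 2

Node = Tuple[str, Union[str, bytes]]  # (zookeeper_path, file_data)


def batch_by_size(nodes: Iterable[Node], budget: int = SIZE_BUDGET) -> Iterator[List[Node]]:
    """Group nodes so that each group's path + data bytes stay within budget."""
    batch: List[Node] = []
    used = 0
    for path, data in nodes:
        raw = data.encode("utf-8") if isinstance(data, str) else data
        size = len(path.encode("utf-8")) + len(raw)
        if size > budget:
            # A single node that does not fit cannot be fixed by batching.
            raise ValueError(f"{path} alone needs {size} bytes, budget is {budget}")
        if batch and used + size > budget:
            # Current batch is full: emit it and start a new one.
            yield batch
            batch, used = [], 0
        batch.append((path, data))
        used += size
    if batch:
        yield batch
```

Each yielded batch would then get its own `aiozk.transaction.Transaction(client_object)`, with one `transaction.create(...)` per node followed by `await transaction.commit()`, as in the snippet above. Note that atomicity then holds only within a batch, not across the whole upload.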

Environment

ZooKeeper cluster with 3 nodes, each running in a Docker container:

  zookeeper:
    restart: on-failure:3
    image: confluentinc/cp-zookeeper:5.5.1
    network_mode: host
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 12181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_MAX_CLIENT_CNXNS: 0
      ZOOKEEPER_SERVERS: lm1:12888:13888;lm2:22888:23888;lm3:32888:33888
    volumes:
      - /data/ssd/zookeeper/data:/var/lib/zookeeper/data
      - /data/ssd/zookeeper/logs:/var/lib/zookeeper/log

Memory:

MemTotal:       264009620 kB
MemFree:         4479308 kB
MemAvailable:   37704100 kB

CPU:

64 cores

processor   : 63
vendor_id   : AuthenticAMD
cpu family  : 23
model       : 1
model name  : AMD EPYC 7551P 32-Core Processor
stepping    : 2
microcode   : 0x8001250
cpu MHz     : 2549.852
cache size  : 512 KB

PremyslCerny commented 2 years ago

Note: When the long file is removed, the issue disappears.