python-zk / kazoo

Kazoo is a high-level Python library that makes it easier to use Apache Zookeeper.
https://kazoo.readthedocs.io
Apache License 2.0
1.3k stars 386 forks source link

delete() fails with version greater than 2**31-1 #752

Closed lpummer closed 4 months ago

lpummer commented 4 months ago

Expected Behavior

zk.delete() should accept any version returned from ZnodeStat of an existing node

Actual Behavior

It seems to accept version arguments only within the bounds of signed 32-bit integers. Zxid can be unsigned 64 bit so it should accept at least that range. Since this is in the protocol serialization, it probably impacts any version argument.

-> zk.delete(zparent + '/' + zchild, znode[1].last_modified_transaction_id, recursive=False)
(Pdb) p zparent + '/' + zchild
'/ledgers/00/1807'
(Pdb) p znode[1].last_modified_transaction_id
515414908868
(Pdb) n
Unhandled exception in connection loop
Traceback (most recent call last):
  File "/home/y/lib/python3.8/site-packages/kazoo/protocol/connection.py", line 647, in _connect_attempt
    self._send_request(read_timeout, connect_timeout)
  File "/home/y/lib/python3.8/site-packages/kazoo/protocol/connection.py", line 518, in _send_request
    self._submit(request, connect_timeout, xid)
  File "/home/y/lib/python3.8/site-packages/kazoo/protocol/connection.py", line 329, in _submit
    b += request.serialize()
  File "/home/y/lib/python3.8/site-packages/kazoo/protocol/serialization.py", line 163, in serialize
    b.extend(int_struct.pack(self.version))
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/home/y/lib/python3.8/threading.py", line 932, in _bootstrap_inner
kazoo.exceptions.ConnectionLoss

Snippet to Reproduce the Problem

"""
I don't know how to directly set the mZxid of a node for debug/reproduction purposes, but forcing repeated leader elections will increment the higher bits of of the cluster Zxid and produce a value over 2**31.
[zk: localhost:2181(CONNECTED) 0] stat /ledgers/00/1807
cZxid = 0x78011f5fc4
ctime = Wed Feb 03 17:41:56 UTC 2016
mZxid = 0x78011f5fc4
mtime = Wed Feb 03 17:41:56 UTC 2016
pZxid = 0xaf65833868
cversion = 20000
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0
"""
zparent='/ledgers/00'
zchild='1807'
znode = zk.get(zparent + '/' + zchild)
zk.delete(zparent + '/' + zchild, znode[1].last_modified_transaction_id, recursive=False)

Logs with logging in DEBUG mode

N/A

Specifications

StephenSorriaux commented 4 months ago

Hello,

Thank you for the issue.

As far as I can tell, the protocol is correct, only a signed 32-bits integer can be used for the version of a znode one wants to delete. The zxid value can not be used as a parameter for delete(), instead you should use the version, which is dataVersion in your zkCli output.

lpummer commented 4 months ago

Yes, you're right. The implementation is correct. The ZooKeeper API is bad because you can zk.get version=1, another client can delete and replace the node with new contents now again having version=1, and then your delete on version=1 will not delete what you saw and intended.